freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com
Other
547 stars 150 forks source link

Make ratelimiters async where needed #2972

Open mlissner opened 1 year ago

mlissner commented 1 year ago

We use django-ratelimit to ratelimit our views, but when we made a bunch of our views async, the decorator failed and we we had to remove it temporarily to make things work again (see: https://github.com/freelawproject/courtlistener/pull/2935/). It's important that we make it work again for three reasons:

  1. Some views are scraped horribly, and it's a way of going around our APIs that we try to prevent (like docket views).
  2. Some views are terribly slow, and we need to keep crawlers and bots from hitting them too hard (like sitemaps).
  3. Some views are vulnerable to brute force attacks (like our password forms).

Making these all async is great, but we need to continue protecting them.

I opened a new issue here, asking for async support in django-ratelimit:

https://github.com/jsocol/django-ratelimit/issues/293

It has gotten some activity, but I haven't kept up.

Issue COURTLISTENER-4FT is an example of what currently happens when a ratelimiter is triggered on an async view:

ValueError: The view cl.sitemap.cached_sitemap didn't return an HttpResponse object. It returned an unawaited coroutine instead. You may need to add an 'await' into your view.
  File "asgiref/sync.py", line 486, in thread_handler
    raise exc_info[1]
  File "django/core/handlers/exception.py", line 42, in inner
    response = await get_response(request)
  File "django/core/handlers/base.py", line 265, in _get_response_async
    self.check_response(response, callback)
  File "django/core/handlers/base.py", line 337, in check_response
    raise ValueError(
mlissner commented 1 year ago

Oh, and this ratelimiter seems to support async already, but I don't know how good it is: https://pypi.org/project/django-fast-ratelimit/

If it's as good or better than django-ratelimit (which I kind of doubt?), then we could switch to it completely. It'd be sad though, to have two ratelimiting packages around forever.

I also noticed that one of our ratelimiters is busted because it doesn't get the IP address from the headers properly:

https://github.com/freelawproject/courtlistener/blob/36a93b0955b47b72723a24093f8971351f0323bb/cl/lib/ratelimiter.py#L161

That line should be tweaked to use the CloudFront-Viewer-Address header instead (docs here). This is probably a good helper for that:

https://github.com/freelawproject/courtlistener/blob/36a93b0955b47b72723a24093f8971351f0323bb/cl/lib/ratelimiter.py#L16-L36