freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com

4489 Block V3 API usage for new and anonymous users. #4579

Closed albertisfu closed 3 weeks ago

albertisfu commented 1 month ago

As outlined in #4489, this PR introduces a permission class to block anonymous users and new users from accessing V3 of the API.

Additionally, this PR introduces stats logging in Redis for V4, so once merged, the logs will be independent from V3, using v4 as a prefix. The stats logged will be exactly the same as for V3.

I also noticed that the Stat API Events, which log total API request milestones and API user milestones, are tied to API stats in Redis. As a result, when we start logging V4 stats in Redis, we will have duplicate milestones since V4 is starting from scratch.
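A small sketch of why the milestones would repeat once V4 counts from zero. The thresholds, the function names, and the description wording are invented for illustration and are not taken from the CourtListener code; the only point is that a fresh counter crosses the same thresholds again, so the description has to carry the version.

```python
from typing import List

# Hypothetical thresholds; the real milestone values are not shown in this thread.
MILESTONES = [1_000, 10_000, 100_000]


def milestones_crossed(previous_total: int, new_total: int) -> List[int]:
    """Return the milestones passed when a counter moves from previous_total to new_total."""
    return [m for m in MILESTONES if previous_total < m <= new_total]


def describe(version: str, milestone: int) -> str:
    # Illustrative wording only: what matters is that the version is part of the text.
    return f"API {version} has logged {milestone} total requests."


# A counter that starts from scratch passes 1,000 again, whatever V3 did before.
v3_events = [describe("v3", m) for m in milestones_crossed(0, 1_500)]
v4_events = [describe("v4", m) for m in milestones_crossed(0, 1_500)]
```

Without the version in the description, `v3_events` and `v4_events` would be identical strings.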

To differentiate events between V3 and V4, I tweaked the description so future milestones for V3 are logged as:

For V4:

Does this seem good to you?

  1. If the request is not for the V3 API, or BLOCK_NEW_V3_USERS is disabled, the request is allowed.
  2. If the request is from an AnonymousUser, it is blocked with a 403 status code and this message: "Anonymous users don't have permission to access V3 of the API. Please use V4 instead."
  3. Next, we check whether the user is in the v3-blocked-users-list, a list where "new" users are added so that all their future requests are blocked.
  4. Lastly, only some requests are checked to identify new users, by looking the user up in the v3-users-list. If the user is not found there, the request is blocked and the user is added to the v3-blocked-users-list. The response is a 403 status code with this message: "As a new user, you don't have permission to access V3 of the API. Please use V4 instead."
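The four steps above can be sketched without Django or Redis. This is not the PR's V3APIPermission class: `V3Gate`, its fields, and the use of plain Python sets in place of the two Redis sets are assumptions made so the flow can be run on its own. The message returned for users already in the blocked list is also an assumption, since the thread does not say which message they receive.

```python
import random
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

ANON_MSG = (
    "Anonymous users don't have permission to access V3 of the API. "
    "Please use V4 instead."
)
NEW_USER_MSG = (
    "As a new user, you don't have permission to access V3 of the API. "
    "Please use V4 instead."
)


@dataclass
class V3Gate:
    """Hypothetical in-memory stand-in for the permission check."""

    block_new_v3_users: bool = True
    known_users: Set[int] = field(default_factory=set)    # stands in for v3-users-list
    blocked_users: Set[int] = field(default_factory=set)  # stands in for v3-blocked-users-list
    sample_rate: int = 50  # look up roughly 1 in `sample_rate` requests

    def should_sample(self) -> bool:
        return random.randint(1, self.sample_rate) == 1

    def check(self, version: str, user_id: Optional[int]) -> Tuple[bool, str]:
        """Return (allowed, message); a False result corresponds to a 403."""
        # 1. Non-V3 requests, or a disabled flag, always pass.
        if version != "v3" or not self.block_new_v3_users:
            return True, ""
        # 2. Anonymous users (modelled here as user_id=None) are always blocked.
        if user_id is None:
            return False, ANON_MSG
        # 3. Users flagged earlier stay blocked (message is an assumption).
        if user_id in self.blocked_users:
            return False, NEW_USER_MSG
        # 4. On sampled requests only, flag users missing from the known list.
        if self.should_sample() and user_id not in self.known_users:
            self.blocked_users.add(user_id)
            return False, NEW_USER_MSG
        return True, ""
```

Because step 4 only runs on sampled requests, a new user may get a few V3 responses before being flagged; after that, step 3 blocks every request.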

If we want this to be stateless, we can just take a mod of unixtime, and go with that.

So I implemented this method:

    import time

    def check_request() -> bool:
        # True only during seconds whose Unix timestamp is divisible by 10.
        return int(time.time()) % 10 == 0

It returns True during one second out of every 10 (we can adjust this value as needed), so every request that arrives in that second is checked and requests in the other nine seconds are not. Does this sound good to you? An alternative is to use a random function to check about 1 in 50 requests, but that wouldn't be fully stateless.
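To make the clustering visible, here is the same modulo rule with the clock passed in as an argument so it can be simulated. `check_request_at` and the simulated traffic (one request every 100 ms) are assumptions for illustration only.

```python
def check_request_at(timestamp: float, interval: int = 10) -> bool:
    """Same rule as check_request(), but with the time supplied by the caller."""
    return int(timestamp) % interval == 0


# Simulate one request every 100 ms for 30 seconds, starting on a multiple of 10.
start = 1_700_000_000
checked = [check_request_at(start + i / 10) for i in range(300)]

# Share of requests that hit the check: one second's worth out of every ten.
checked_share = sum(checked) / len(checked)
```

In this simulation the first ten requests are all checked and the next ninety are not, which is the bursty behaviour the time-based approach produces.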

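    from cl.lib.redis_utils import get_redis_interface

    r = get_redis_interface("STATS")
    # Every member (user) of the V3 per-user counts sorted set.
    v3_stats_members = r.zrange("api:v3.user.counts", 0, -1)
    # SADD needs at least one member, so skip the call if nothing was returned.
    if v3_stats_members:
        r.sadd("v3-users-list", *v3_stats_members)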

This script copies every user currently in the api:v3.user.counts sorted set into the v3-users-list set.
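If the sorted set turns out to be large, the copy could be sent in batches rather than as one SADD call. The helper below is a hypothetical variant, not part of the PR: `copy_zset_members_to_set` and the batch size are made up, and it relies only on the `zrange` and `sadd` methods of the client passed in.

```python
from typing import Any, Iterator, List


def chunked(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def copy_zset_members_to_set(
    r: Any, source_key: str, dest_key: str, batch_size: int = 1000
) -> int:
    """Copy all members of a sorted set into a plain set; return how many were new."""
    members = r.zrange(source_key, 0, -1)
    added = 0
    # An empty sorted set yields no batches, so SADD is never called without members.
    for batch in chunked(members, batch_size):
        added += r.sadd(dest_key, *batch)
    return added
```

With the client from the script above, this would be called as `copy_zset_members_to_set(r, "api:v3.user.counts", "v3-users-list")`.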

Finally, I added the new V3APIPermission class to all API endpoints, except for:

I also didn't add the V3APIPermission to the following RECAP endpoints, which currently use the following permission classes:

My thinking is that these endpoints are mostly for internal use with the RECAPExtension and RECAPEmail, so it's probably not worth performing the V3APIPermission checks here.

For all the other API endpoints that already had a permission class, the new V3APIPermission was added at the end of the list so that it prioritizes other permissions as before.
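A rough illustration of why the position in the list matters, assuming permissions are evaluated in order and the first failure ends the check (this is how Django REST Framework's permission loop behaves). The permission names paired with lambdas here are simplified stand-ins, and the request is modelled as a plain dict.

```python
from typing import Callable, List, Optional, Tuple

# A permission is modelled as (name, check), where check(request) returns a bool.
Permission = Tuple[str, Callable[[dict], bool]]


def first_denial(permissions: List[Permission], request: dict) -> Optional[str]:
    """Return the name of the first permission that rejects the request, or None."""
    for name, check in permissions:
        if not check(request):
            return name
    return None


permissions: List[Permission] = [
    # Stand-in for a permission the endpoint already had.
    ("IsAuthenticated", lambda req: req.get("user") is not None),
    # Stand-in for the new check, appended at the end of the list.
    (
        "V3APIPermission",
        lambda req: req.get("version") != "v3" or req.get("known_user", False),
    ),
]
```

With this ordering, a request that fails an existing permission is rejected by that permission, exactly as before, and the V3 check only runs for requests that passed everything else.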

Let me know what you think.

mlissner commented 1 month ago

> An alternative is to use a random function to check about 1 in 50 requests,

Since this will have some performance impact, a random number does seem better. Otherwise, every X seconds all API requests will be impacted.

The rest sounds right to me. Thank you Alberto!

albertisfu commented 1 month ago

> Since this will have some performance impact, a random number does seem better. Otherwise, every X seconds all API requests will be impacted.

Yeah, that's the downside of the time approach. I've changed it to the random approach. So ~1 in 50 are checked.

    import random

    def check_request() -> bool:
        # Check roughly 1 in 50 requests.
        return random.randint(1, 50) == 1
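A quick simulation of the random approach, to confirm that the sampled share lands near 2% and is spread across requests instead of concentrated in one second. The `sampled` helper, the seeded generator, and the request count are choices made for this sketch only.

```python
import random


def sampled(rng: random.Random, one_in: int = 50) -> bool:
    """Same rule as check_request(), with the generator passed in for repeatability."""
    return rng.randint(1, one_in) == 1


rng = random.Random(0)  # fixed seed so the run is repeatable
total = 100_000
checked = sum(sampled(rng) for _ in range(total))
rate = checked / total  # expected to be close to 1 / 50 = 0.02
```

Each request is sampled independently, so the extra lookups are distributed across all traffic.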

ERosendo commented 3 weeks ago

@mlissner Just a heads up: Before we merge this PR, let's run the following script:

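    from cl.lib.redis_utils import get_redis_interface

    r = get_redis_interface("STATS")
    # Every member (user) of the V3 per-user counts sorted set.
    v3_stats_members = r.zrange("api:v3.user.counts", 0, -1)
    # SADD needs at least one member, so skip the call if nothing was returned.
    if v3_stats_members:
        r.sadd("v3-users-list", *v3_stats_members)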

Once that's done, we can safely flip the BLOCK_NEW_V3_USERS flag to true.

mlissner commented 3 weeks ago

Wonderful. Thank you both. I ran the code and set the variable to True before merging, so I think we'll be good to go once deployed.

💀 API V3 💀