freelawproject / courtlistener

A fully-searchable and accessible archive of court data including growing repositories of opinions, oral arguments, judges, judicial financial records, and federal filings.
https://www.courtlistener.com
Other
547 stars 150 forks source link

Block new usages of API v3 #4489

Closed mlissner closed 1 week ago

mlissner commented 1 month ago

Now that v4 is finally launched, we should make it so new people can't use v3.

I can't think of any particularly good ways to do that since ~whatever we did would have to be run on every darned request going forward.~

I take that back! We could run it on 1/50 requests, say, and that'd be fine too. We'd catch new users eventually (but annoyingly).

But I still can't think of a great way to do this. I guess it could come as a throttle class or permission on the API, maybe we pair it with a new boolean in the UserProfile model or something similar in Redis, for speed?

mlissner commented 1 month ago

Alberto, putting this on your backlog for brainstorming only. :)

albertisfu commented 1 month ago

Yeah, I agree with running this check on 1 out of every 50 requests. I believe the best way to implement this is by defining a permission_class that can be added to all API endpoints. We would then perform the has_permission check on 1/50 requests if the request comes from "V3"

It's also a good idea to cache the user status in Redis to make the check faster.

If we detect that the user is new and using V3, we can raise a PermissionDenied exception.

I think we have a few questions to address:

mlissner commented 1 month ago

We would then perform the has_permission check on 1/50 requests if the request comes from "V3"

If we want this to be stateless, we can just take a mod of unixtime, and go with that.

How will a user be defined as "new"?

I think new means "they've never used the API before"?

Should we block anonymous requests entirely for V3?

Seems like a good idea to me. We'll want to update the documentation when we do.

mlissner commented 1 month ago

If we do the above, Alberto, what's your hourly estimate for this?

albertisfu commented 1 month ago

Got it! I estimate that we can do this in one day.

mlissner commented 1 month ago

Great. So to be clear, we'd have two lists of users, right? One that has all the existing users, so we can check against that, and a second that we slowly add users to that are blocked from using v3?

albertisfu commented 1 month ago

Great. So to be clear, we'd have two lists of users, right? One that has all the existing users, so we can check against that, and a second that we slowly add users to that are blocked from using v3?

A question, do we want to keep track of users blocked from V3? I was thinking of using the sorted set api:v3.user.counts to check if a user is a V3 user. If the user is within this set, we allow the request. If they are not, we block the request. Checking for existence within this set would be O(1) since it uses a hash to keep track of keys in the set.

Unless we want to keep track of users who have been blocked from V3, we don’t need an additional list. Also, we’ll need to start logging V4 stats in Redis using the V4 prefix, so we can have separate stats. We can then use api:v3.user.counts to reliably check for existence; otherwise, it would include V4 users as well. We can do that in the same PR or a separate one, but it should come first. I think adding that support would be easy, as we just need to use a different prefix.

mlissner commented 1 month ago

Same PR or separate for doing the v4 work seems fine to me.

If we don't have to, I don't want to keep the list of blocked v3 users, BUT I think we do have to, because we're only going to check one out of 50 requests, right, so what I was thinking was:

Request # Action
1 Allow it
2-49 Allow it
50 Problem. Block it, save the user as blocked.
51-∞ Block that user!

If you don't have the list of blocked users, can you block all of their subsequent requests?

albertisfu commented 1 month ago

Got it. Yeah, that's right. We’ll need a list of blocked users so we can block any future requests once we detect a user should be blocked.

Sounds like we have a plan. Let me know if we should start working on this.