GAM-team / got-your-back

Got Your Back (GYB) is a command line tool for backing up your Gmail messages to your computer using Gmail's API over HTTPS.
https://github.com/GAM-team/got-your-back/wiki
Apache License 2.0
2.61k stars 206 forks source link

Adds rate limiting to comply with Google's API rate limits #308

Closed mhrivnak closed 3 years ago

mhrivnak commented 3 years ago

Pulls tokens from a simple token bucket prior to making an API call or adding a specific call to a batch. Google documents its rate limits as a moving average:

https://developers.google.com/gmail/api/reference/quota https://developers.google.com/admin-sdk/groups-migration/v1/reference/archive/insert

I used this change successfully to backup and restore an account with 26,000 messages. Without this change it had been hitting the rate limit.

fixes #287

jay0lee commented 3 years ago

This is great but I'd like to see the quota checks put into callGAPI() function so we don't have to remember to add them everywhere.

mhrivnak commented 3 years ago

This is great but I'd like to see the quota checks put into callGAPI() function so we don't have to remember to add them everywhere.

Good suggestion. Done.

  • we need to also handle API calls where we don't specifically have a rate limit set for them, they should just execute immediately.

Every call the code makes right now has a specific quota value, and they've been collected in the GOOGLEQUOTAS dict. If additional methods are added to this project in the future, here are two ways to handle it:

  1. allow such requests by default without any rate limiting. The developer must remember to add a quota value for that method, or wait until the google API starts returning rate limit errors (which may not happen until a user files a bug report). I think it'll be easy to forget about the quota aspect.
  2. exit with an error upon requesting quota for an unknown method. This is how it's currently implemented. The developer will immediately recognize the issue and can add a corresponding quota value to the dictionary.

I think option 2 is most robust and will ensure calls don't leak in over time without quota support. But I'm happy to change it to option 1 if you'd prefer that behavior.

jay0lee commented 3 years ago

I think 2 makes the most sense. There are apu calls like drive.about.get that GYB uses today and Drive definitely has quotas but it's a single call per GYB run and highly unlikely to hit Drive quota limits.

In other words let's not worry about a given quota being a problem until it's a problem. Groups migration and Gmail definitely have issues but Drive and the GCP apis are fine. So maybe:

Jay

Jay

On Sat, Mar 20, 2021, 12:41 PM Michael Hrivnak @.***> wrote:

This is great but I'd like to see the quota checks put into callGAPI() function so we don't have to remember to add them everywhere.

Good suggestion. Done.

  • we need to also handle API calls where we don't specifically have a rate limit set for them, they should just execute immediately.

Every call the code makes right now has a specific quota value, and they've been collected in the GOOGLEQUOTAS dict. If additional methods are added to this project in the future, here are two ways to handle it:

  1. allow such requests by default without any rate limiting. The developer must remember to add a quota value for that method, or wait until the google API starts returning rate limit errors (which may not happen until a user files a bug report). I think it'll be easy to forget about the quota aspect.
  2. exit with an error upon requesting quota for an unknown method. This is how it's currently implemented. The developer will immediately recognize the issue and can add a corresponding quota value to the dictionary.

I think option 2 is most robust and will ensure calls don't leak in over time without quota support. But I'm happy to change it to option 1 if you'd prefer that behavior.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jay0lee/got-your-back/pull/308#issuecomment-803401645, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDIZMCFPD3HFLSTPLYPITTTETF33ANCNFSM4ZFYXQXA .

mhrivnak commented 3 years ago

Ah, you point out that drive is currently not handled. I'll fix that.

mhrivnak commented 3 years ago

I added explicit non-rate-limiting for the drive API.

Again my own take is that if we have a default value for unrecognized methods, whether it's default to no limiting or default to highly-conservative limiting, there's a good chance the developer won't notice. As it's implemented right now, if someone adds a new method, they'll get an error the first time they try to run it, and it's real obvious where to find the correct quota value and add it as code. (the URL to the quota document is in the doc string for the quota structure even)

That said, if you think the default value is the better approach, I'll happily add that.

jay0lee commented 3 years ago

I think we can try that approach. However GYB also calls a few Cloud APIs to create the project, service account, enable apis, etc. Are those added?

On Sun, Mar 21, 2021, 12:50 PM Michael Hrivnak @.***> wrote:

I added explicit non-rate-limiting for the drive API.

Again my own take is that if we have a default value for unrecognized methods, whether it's default to no limiting or default to highly-conservative limiting, there's a good chance the developer won't notice. As it's implemented right now, if someone adds a new method, they'll get an error the first time they try to run it, and it's real obvious where to find the correct quota value and add it as code. (the URL to the quota document is in the doc string for the quota structure even)

That said, if you think the default value is the better approach, I'll happily add that.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/jay0lee/got-your-back/pull/308#issuecomment-803620638, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABDIZMFHVBJPNTIXYMHTFH3TEYPW7ANCNFSM4ZFYXQXA .

mhrivnak commented 3 years ago

You've talked me into the middle-ground. :) There are enough other API groups in use that don't have rate limiting documented by Google, that I think it makes sense to only apply our rate-limiting to the gmail and groupsmigration API groups. Within those groups, all calls must have a defined method and quota value. All other groups will not be limited on our end. The implementation being that if a QuotaBucket doesn't exist for an API group, no limiting is applied.

How's that sound to you? I pushed a change just now that implements it as described here.

jay0lee commented 3 years ago

Looks great! Many thanks for the PR.