bnjmnrsh-projs / signal-v-noise

Code kata using the NYT API.
https://bnjmnrsh-projs.github.io/signal-v-noise/

API: 429 too many requests #26

Open bnjmnrsh opened 8 months ago

bnjmnrsh commented 8 months ago

The problem

It's pretty easy to hit the API too often when first exploring the app. Once you visit a route, the local store object is served to you as a cache, which is nice and fast. However, if the Cloudflare cache headers are stale, looking at just a few routes is enough to start hitting 429 errors.

One possible solution:

Short of paying for more access from the NYT, create a CF Worker cron to cache all the routes cyclically, so there is always something 'hot' in its cache. The trick here is to spread out the requests: each run requests a single endpoint, marching through them one at a time.

Practically speaking, each time the CF Worker runs, it would read the index of the last route it called from a KV store, request the next route in the array, and write the new index back.
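A minimal sketch of that round-robin step, to make the idea concrete. Everything named here is an assumption for illustration: `ROUTES`, `CURSOR_KEY`, `KVLike`, and `refreshNextRoute` are hypothetical, and `KVLike` only mirrors the string `get`/`put` calls of a Workers KV namespace so the snippet stands on its own.

```typescript
// Minimal shape of the Workers KV calls used below (string get/put only).
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical route list; the real app tracks 29 NYT endpoints.
const ROUTES: string[] = ["home", "arts", "science"];

// Hypothetical KV key holding the index of the last route refreshed.
const CURSOR_KEY = "last-route-index";

// Index to refresh on this run: one past the last index, wrapping to 0.
function nextIndex(last: number | null, total: number): number {
  if (last === null || Number.isNaN(last) || last < 0) return 0;
  return (last + 1) % total;
}

// One cron run: read the cursor, refresh exactly one route, store the cursor.
async function refreshNextRoute(
  kv: KVLike,
  fetchRoute: (route: string) => Promise<void>,
): Promise<string> {
  const stored = await kv.get(CURSOR_KEY);
  const last = stored === null ? null : parseInt(stored, 10);
  const index = nextIndex(last, ROUTES.length);
  const route = ROUTES[index];
  await fetchRoute(route); // hypothetical: warm the cache for this route
  await kv.put(CURSOR_KEY, String(index));
  return route;
}
```

In a Worker, `refreshNextRoute` would presumably be called from the `scheduled` handler with a KV binding passed in as `kv`. The cursor is written only after the fetch succeeds, so a failed run retries the same route next time.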

I think I would like to run this as a separate worker, but I am unsure whether workers can share cache objects, that is, whether the cron job worker can cache results that our current worker can then return. If not, we could add another URL param flag to our current worker to trigger the cron. It's already quite long, however.

Timeframe comparison

If done sequentially, say every 30 minutes, it would take 14.5 hours for a CF Cron worker to get through all 29 routes (a bit long). If we did it every five minutes, it would take about 2.42 hours (2 h 25 min) to refresh the cache. To do the whole thing in under an hour, we would have to hit the API with a route every two minutes (58 minutes in total).
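The comparison above is just route count times interval. A small sketch that reproduces the three figures (`fullCycleMinutes` is a hypothetical helper name; 29 is the route count from this issue):

```typescript
// Minutes needed to refresh every route once when one route is fetched per run.
function fullCycleMinutes(routeCount: number, intervalMinutes: number): number {
  return routeCount * intervalMinutes;
}

const ROUTE_COUNT = 29; // number of routes, taken from this issue

for (const interval of [30, 5, 2]) {
  const minutes = fullCycleMinutes(ROUTE_COUNT, interval);
  console.log(
    `every ${interval} min: ${minutes} min (${(minutes / 60).toFixed(2)} h)`,
  );
}
```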

NYT API Limits:

500 requests per day / five requests per minute / 12-second delay between requests (or an average of 20.83 requests an hour) https://developer.nytimes.com/faq#a11

So, to stay within limits while tracking 29 endpoints:

500 / 29 endpoints = 17.24 (max times a route can be called daily)
1,440min / 500 = 2.88min (a day divided by the max number of calls in a day)

To be a touch conservative with API use, a call every 3 minutes seems reasonable. That works out to 480 calls a day, with each route refreshed every 87 minutes.
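As a sanity check on the numbers above, here is a rough sketch that tests a candidate interval against the limits quoted from the NYT FAQ. `checkInterval` and `BudgetCheck` are hypothetical names, and it assumes the cron worker is the only consumer of the API key.

```typescript
// Limits as quoted from the NYT FAQ in this issue.
const DAILY_LIMIT = 500;
const MIN_SPACING_SECONDS = 12;
const MINUTES_PER_DAY = 24 * 60;

interface BudgetCheck {
  callsPerDay: number;
  refreshesPerRoutePerDay: number;
  withinLimits: boolean;
}

// Assumes one API call per cron run and no other traffic on the same key.
function checkInterval(intervalMinutes: number, routeCount: number): BudgetCheck {
  const callsPerDay = Math.floor(MINUTES_PER_DAY / intervalMinutes);
  return {
    callsPerDay,
    refreshesPerRoutePerDay: callsPerDay / routeCount,
    withinLimits:
      callsPerDay <= DAILY_LIMIT && intervalMinutes * 60 >= MIN_SPACING_SECONDS,
  };
}

console.log(checkInterval(3, 29)); // 480 calls a day, within limits
console.log(checkInterval(2, 29)); // 720 calls a day, over the daily cap
```

Note that the two-minute interval needed for a full refresh in under an hour would exceed the 500-a-day cap on its own. Any calls made by real visitors on a cache miss also count against the same budget, so the headroom at 3 minutes is only 20 calls a day.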

Cloudflare limits (possible fly in the ointment)

On the free plan, you are allowed up to 5 cron triggers. I believe a trigger is a schedule, not the number of times a cron job runs. So, one schedule that sets up a cron to run every 3 minutes appears to be within the usage of the free tier.

https://developers.cloudflare.com/workers/platform/limits/

Fallback 'cron' strategy

If I'm misunderstanding the usage limits for CF cron, then we could look to replicate the behavior with something like UptimeRobot, which can ping an endpoint as often as every 5 minutes on the free tier. It could be possible to set up multiple 'watchers' to do it more often, but we would have to add more checks to the worker to check how long it had been since an API endpoint was pinged, so that we don't accidentally exceed our usage limits.
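The extra check could look something like the sketch below: the worker records when it last called the API and ignores pings that arrive too soon. All names here (`shouldCallApi`, `onPing`, `TimestampStore`, `LAST_CALL_KEY`) are hypothetical, and the store interface only mirrors string `get`/`put` on a KV namespace.

```typescript
// Minimal key-value shape, mirroring string get/put on a Workers KV namespace.
interface TimestampStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

const LAST_CALL_KEY = "last-api-call-ms"; // hypothetical key name
const MIN_INTERVAL_MS = 3 * 60 * 1000; // the 3-minute spacing proposed above

// True when enough time has passed since the last call, or none was recorded.
function shouldCallApi(
  lastCallMs: number | null,
  nowMs: number,
  minIntervalMs: number = MIN_INTERVAL_MS,
): boolean {
  if (lastCallMs === null) return true;
  return nowMs - lastCallMs >= minIntervalMs;
}

// Handle one external ping: refresh only if the guard allows it.
async function onPing(
  store: TimestampStore,
  refresh: () => Promise<void>,
  nowMs: number = Date.now(),
): Promise<boolean> {
  const raw = await store.get(LAST_CALL_KEY);
  const parsed = raw === null ? NaN : Number(raw);
  const last = Number.isFinite(parsed) ? parsed : null; // treat bad values as "never"
  if (!shouldCallApi(last, nowMs)) return false;
  await store.put(LAST_CALL_KEY, String(nowMs)); // record before calling upstream
  await refresh();
  return true;
}
```

One caveat: Workers KV is eventually consistent, so two pings arriving close together could both pass the guard. With watchers spaced minutes apart that is probably acceptable, but it is not a hard lock.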