Deal with "429 Resource has been exhausted (e.g. check quota)"

sebnapi commented 1 week ago

We might want a mechanism to handle articles whose summaries get rejected by the LLM due to quota limits.

finaldie commented 1 week ago

I guess you might be reaching the Google Gemini free tier's quota?

I'm not sure what kind of result you want to see. Could you expand on it a bit?

If you mean after one LLM reaches the quota, we want a mechanism to continue the summary process:

Currently, we fall back to OpenAI if the main LLM_PROVIDER is set to a different provider, although this requires configuring an OpenAI API key.
Or, we can consider using Ollama to host an open-source model and configuring Ollama as the LLM provider, so there won't be a quota issue.
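
To make the fallback idea concrete, here is a minimal sketch of trying providers in order and moving on only when one hits its quota. The names `QuotaExceeded`, `summarize_with_fallback`, `fake_gemini`, and `fake_ollama` are made up for illustration and are not taken from the auto-news code; real provider clients would need to translate their own 429 errors into something like `QuotaExceeded`.

```python
from typing import Callable, Sequence


class QuotaExceeded(Exception):
    """Hypothetical error a provider wrapper raises on HTTP 429."""


def summarize_with_fallback(text: str,
                            providers: Sequence[Callable[[str], str]]) -> str:
    """Try each provider in order; move on only when one is out of quota."""
    for provider in providers:
        try:
            return provider(text)
        except QuotaExceeded:
            continue  # this provider is exhausted, try the next one
    return ""  # every provider was exhausted, so the summary stays empty


# Stand-in providers for illustration only (not real API clients).
def fake_gemini(text: str) -> str:
    raise QuotaExceeded("429 Resource has been exhausted (e.g. check quota)")


def fake_ollama(text: str) -> str:
    return "summary: " + text[:20]


result = summarize_with_fallback("A long article about rate limits",
                                 [fake_gemini, fake_ollama])
print(result)
```

Putting a locally hosted model last in the list means the chain only ends in an empty summary when that local model also fails.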

Thoughts?

sebnapi commented 1 week ago

Yes, exactly. I just wanted to keep this in mind with an issue ticket. At the moment it will result in an empty summary.

I think ideally we should have some mechanism that would wait and retry at another time.

Some services will return 429 when too many requests are hitting their service; the limit could be 200 concurrent requests, for example, in the case of deepinfra.com.

With free-tier Gemini it is:

15 requests per minute
1 million tokens per minute
1,500 requests per day

Solution A) Maybe we could have some RateLimiting class that all requests get routed through and that is configurable in the settings. I'm just thinking out loud. The ratio of value gained to workload isn't the best tbh 😁.
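
A rough sketch of what such a class could look like, assuming a simple sliding-window limiter that every LLM request calls before going out. The class name `RateLimiter` and its parameters are hypothetical, not an existing auto-news API; the clock and sleep functions are injectable only to make the class easy to test.

```python
import time
from collections import deque
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `max_calls` calls in any sliding window of `period` seconds."""

    def __init__(self, max_calls: int, period: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))


# Example: the free-tier per-minute request limit mentioned above.
gemini_limiter = RateLimiter(max_calls=15, period=60.0)
gemini_limiter.acquire()  # call this right before each LLM request
```

This only covers the requests-per-minute limit; the tokens-per-minute and requests-per-day limits would need their own counters.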

Solution B) We just queue up everything that resulted in a 429 and schedule a new Airflow run, with a limit of X retries.
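
Solution B could be sketched roughly as below, assuming an in-memory queue and a `run_once` function that stands in for one scheduled run. `Task`, `QuotaExceeded`, and `run_once` are hypothetical names for illustration; a real version would persist the queue between runs.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List, Tuple


class QuotaExceeded(Exception):
    """Hypothetical error raised when the LLM answers with HTTP 429."""


@dataclass
class Task:
    article_id: str
    attempts: int = 0  # number of failed attempts so far


def run_once(queue: Deque[Task], summarize: Callable[[str], str],
             max_retries: int) -> Tuple[Dict[str, str], List[Task]]:
    """Process every task currently queued; requeue 429 failures up to max_retries."""
    done: Dict[str, str] = {}
    dropped: List[Task] = []
    for _ in range(len(queue)):  # only the tasks present at the start of the run
        task = queue.popleft()
        try:
            done[task.article_id] = summarize(task.article_id)
        except QuotaExceeded:
            task.attempts += 1
            if task.attempts <= max_retries:
                queue.append(task)  # picked up by the next scheduled run
            else:
                dropped.append(task)  # gave up after max_retries retries
    return done, dropped
```

Each call handles the queue as it stood at the start of the run, so a task that fails is not retried again within the same run.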

finaldie commented 1 week ago

:+1: great ideas.

For solution A: currently, we don't have such a global rate limiter. From experience, Reddit most likely generates most of the posts, so as a workaround, lowering REDDIT_PULLING_COUNT=xxx should put less pressure on the LLM quota.

For solution B: Nice one; we could send the item to the corresponding inbox again if it hits a 429. The issue I can think of for now: assuming the daily workload is roughly the same, if day 1 hits a lot of 429s and we push those items to the queue again, they have to wait until the next day for a new quota, when that day's own workload is competing for it too. This repeats over and over, so tasks will be highly delayed, and the system could be overloaded with more and more queued tasks (a snowball effect).
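
The backlog concern can be illustrated with a few lines of arithmetic. The daily quota of 1,500 is the free-tier figure quoted earlier in this thread; the daily workload of 2,000 is a made-up number chosen only to show the effect.

```python
from typing import List


def simulate_backlog(daily_new: int, daily_quota: int, days: int) -> List[int]:
    """Return the number of tasks still queued at the end of each day."""
    backlog = 0
    history = []
    for _ in range(days):
        pending = backlog + daily_new      # yesterday's leftovers plus today's work
        processed = min(pending, daily_quota)
        backlog = pending - processed      # everything over quota is requeued
        history.append(backlog)
    return history


print(simulate_backlog(daily_new=2000, daily_quota=1500, days=5))
# prints [500, 1000, 1500, 2000, 2500]
```

As long as the daily workload stays above the quota, requeueing alone never catches up: the backlog grows by the difference every day.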

Based on your thoughts, I feel it would be nicer to combine solutions A and B:

Apply a rate limiter to keep the average daily load within the LLM daily quota (dropping some items if needed)
Queue the small number of failed tasks and retry them later
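
One way the combination could look, as a sketch only: plan each day's work so it fits the daily quota, serving yesterday's retries first and dropping whatever does not fit. `plan_day` is a hypothetical helper, and giving retries priority over new items is an assumption made here, not something decided in this thread.

```python
from typing import List, Sequence, Tuple


def plan_day(new_items: Sequence[str], retry_items: Sequence[str],
             daily_quota: int) -> Tuple[List[str], List[str]]:
    """Return (to_process, dropped) so that to_process fits within daily_quota."""
    retries = list(retry_items)[:daily_quota]  # retries go first (assumption)
    room = daily_quota - len(retries)          # quota left for new items
    fresh = list(new_items)[:room]
    dropped = list(new_items)[room:] + list(retry_items)[daily_quota:]
    return retries + fresh, dropped


to_process, dropped = plan_day(["a", "b", "c"], ["r1"], daily_quota=3)
print(to_process, dropped)
```

Because the planned load never exceeds the quota, the retry queue should only have to absorb occasional failures rather than a growing backlog.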

At this moment, tuning REDDIT_PULLING_COUNT=xxx is the quickest way to reduce the load on the system. The default value is 25; for example, try reducing it to 3 and see whether the system can finish all the tasks without triggering the Gemini quota limit.

finaldie / auto-news

Deal with "429 Resource has been exhausted (e.g. check quota)" #104