SciPhi-AI / R2R

The most advanced Retrieval-Augmented Generation (RAG) system, containerized and RESTful
https://r2r-docs.sciphi.ai/
MIT License

How to configure and avoid OpenAI rate limiting when ingesting files? #1601

Open · emahpour opened this issue 4 hours ago

emahpour commented 4 hours ago

Describe the problem
I uploaded a JSON file with around 4000 entries. While monitoring the processes, I noticed OpenAI was enforcing rate limiting and the application became unresponsive because it kept retrying the failed OpenAI calls. What is the recommended way to avoid running into this problem?

To Reproduce
Create a JSON file with a large number of entries (e.g. 4000 rows).

  1. Upload the file as a document
  2. Initiate Graph Creation Process

Expected behavior
Graph generation proceeds while respecting the OpenAI API rate limits.

Screenshots

(screenshot attached)
NolanTrem commented 4 hours ago

Edit: I didn't realize this was a graph process. If you're using the full version and a single job fails, you can retry that job; see the orchestration cookbook in the docs.

Given that this is a JSON file, it might make sense for you to upload the entries as chunks rather than as a single document. The embedding requests are sent in batches with exponential backoff, though, so I suspect this will eventually succeed. If you're using the full version and it fails, you can always retry the job, which is especially helpful when you've broken the file up or have many files.
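
As a rough illustration of that approach, the sketch below splits the JSON into per-entry chunks and ingests them in small, throttled batches with a simple exponential backoff on failures. It is only a sketch: `ingest_chunk_batch` is a hypothetical placeholder for whatever chunk-ingestion call your R2R SDK version exposes (check the SDK reference), and the batch size and pause values are assumptions you would tune against your OpenAI tier.

```python
import json
import time

BATCH_SIZE = 100      # entries per ingestion call (assumption; tune to your OpenAI tier)
PAUSE_SECONDS = 5     # pause between batches to stay under requests-per-minute limits
MAX_RETRIES = 5       # retries per batch, with exponential backoff


def ingest_chunk_batch(chunks):
    """Hypothetical placeholder: swap in the actual chunk-ingestion call
    from your R2R SDK version (see the SDK reference)."""
    print(f"would ingest {len(chunks)} chunks here")


def load_entries(path):
    """Read the JSON file and return one text chunk per entry."""
    with open(path, "r", encoding="utf-8") as f:
        entries = json.load(f)
    return [json.dumps(entry) for entry in entries]


def ingest_with_throttle(path):
    chunks = load_entries(path)
    for start in range(0, len(chunks), BATCH_SIZE):
        batch = chunks[start:start + BATCH_SIZE]
        for attempt in range(MAX_RETRIES):
            try:
                ingest_chunk_batch(batch)
                break
            except Exception:
                # Back off exponentially on failures (e.g. HTTP 429 from OpenAI).
                time.sleep(2 ** attempt)
        else:
            print(f"Giving up on the batch starting at entry {start}")
        time.sleep(PAUSE_SECONDS)  # throttle between batches


if __name__ == "__main__":
    ingest_with_throttle("entries.json")
```

Splitting the file this way also means a failed batch only affects those entries, so retrying (via Hatchet or by rerunning the script) does not redo the whole document.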

emahpour commented 4 hours ago

Even using Hatchet with smaller chunks, it can technically run into the same rate-limit issue, no? Is there any configuration to apply rate limiting in the Hatchet queues?

NolanTrem commented 4 hours ago

I think what you're looking for then is the batch_size parameter in the configuration file. The default is 256. Changing this would only impact future graphs, though.
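
For reference, lowering it would look roughly like this in the config file. The `[embedding]` section name is an assumption on my part; check your own r2r.toml for where batch_size actually lives (it may sit under the graph-creation settings instead), and pick a value that fits your OpenAI tier.

```toml
# r2r.toml (excerpt) -- the section name is an assumption; adjust to your config layout
[embedding]
batch_size = 64   # default is 256; smaller batches mean fewer concurrent OpenAI calls
```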

emahpour commented 2 hours ago

Unfortunately, lowering batch_size did not help; instead, it overloaded the container's CPU with constant retry-and-fail attempts. There should be a better way than brute forcing and hoping it eventually processes everything.

(screenshot attached)
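
One pattern that avoids this brute-force retry loop is to pace requests proactively instead of reacting to 429s. The sketch below is a generic client-side requests-per-minute limiter, not something built into R2R or Hatchet; the limit value is an assumption you would set from your OpenAI tier, and you would wrap it around whatever code issues the embedding or ingestion calls.

```python
import threading
import time


class RateLimiter:
    """Simple client-side pacer: allow at most `max_calls` within any `period` seconds.

    Generic pattern, not part of R2R; wrap it around whatever code issues
    the OpenAI (or ingestion) requests.
    """

    def __init__(self, max_calls, period=60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = []          # timestamps of recent calls
        self.lock = threading.Lock()

    def acquire(self):
        while True:
            with self.lock:
                now = time.monotonic()
                # Drop timestamps that have aged out of the window.
                self.calls = [t for t in self.calls if now - t < self.period]
                if len(self.calls) < self.max_calls:
                    self.calls.append(now)
                    return
                wait = self.period - (now - self.calls[0])
            time.sleep(wait)


# Example: cap embedding-related calls at 500 requests/minute (assumed tier limit).
limiter = RateLimiter(max_calls=500, period=60.0)


def call_with_limit(fn, *args, **kwargs):
    limiter.acquire()
    return fn(*args, **kwargs)
```

A pacer like this trades some throughput for predictability: the worker never generates the burst that triggers the rate limiting (and the retry storm) in the first place.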