Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License

OpenAI embedding token limit - reprocess all documents #111

Open federicocaccialanzaabb opened 10 months ago

federicocaccialanzaabb commented 10 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Take a sample PDF file of about 10 MB (the one I used is https://freetestdata.com/wp-content/uploads/2022/11/Free_Test_Data_10.5MB_PDF.pdf) and upload it.


This results in a success in the Azure Function Batch Push Results (no token limit exceeded).


Then upload more than one copy of the file at the same time, so that combined they surpass the token limitation. Let the data ingestion run, and an error will be displayed in the Batch Push Results.


(The same error occurs if your Azure Blob Storage contains several documents that were fine when uploaded and ingested individually, but combined surpass your token limitation. If, from the admin page, you click "Reprocess all documents", you will hit the same error: the same Azure Function is called, but with the parameter set to process all files in blob storage, not only the ones without embeddings.)

Any log messages given by the failure

Result: Failure Exception: RateLimitError: Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.

Expected/desired behavior

What I would like the app to do: when the token limit is reached (or when it estimates that the next file or chunk will surpass the limit), wait for the current minute window to finish and then continue (if that is not possible at chunk level, then at least at file level). That way it will not fail due to tokens-per-minute limitations.
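The behavior requested above can be sketched as a per-minute token budget that the ingestion loop consults before each embedding call. This is a minimal illustration, not code from the accelerator; `TokenBudget` and its injectable `clock`/`sleep` parameters are hypothetical names, and real token counts would have to be estimated (e.g. with a tokenizer) before calling `acquire`.

```python
import time


class TokenBudget:
    """Tracks tokens used in the current one-minute window and blocks
    until the window resets when a request would exceed the budget.
    Hypothetical sketch; the accelerator does not ship this class."""

    def __init__(self, tokens_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.tokens_per_minute = tokens_per_minute
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.window_start = clock()
        self.used = 0

    def acquire(self, tokens):
        """Wait (if needed) until `tokens` fit in the per-minute budget."""
        now = self.clock()
        if now - self.window_start >= 60:
            # A full minute has passed: start a fresh window.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tokens_per_minute:
            # Wait out the remainder of the current minute, then reset.
            wait = 60 - (now - self.window_start)
            if wait > 0:
                self.sleep(wait)
            self.window_start = self.clock()
            self.used = 0
        self.used += tokens
```

The ingestion loop would then call `budget.acquire(estimated_tokens)` before each file (or chunk) is embedded, which is exactly the "wait until the minute finishes" behavior described above.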

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

deuch commented 6 months ago

Hello, you can modify this method:

https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/blob/f136f4fce79a70eaaff51e6617083648e9eea0c7/code/backend/batch/utilities/helpers/LLMHelper.py#L82

and add the max_retries=20 option to each AzureOpenAIEmbeddings constructor, in both the if and the else branches.

It will use exponential retries when you encounter HTTP 429 errors (related to the call rate per minute).
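For intuition, here is a minimal sketch of what exponential retry on 429 errors looks like. The real behavior lives inside the OpenAI client that `AzureOpenAIEmbeddings` wraps when `max_retries` is set; `RateLimit429` and `with_retries` below are illustrative names, not the SDK's actual internals.

```python
import random
import time


class RateLimit429(Exception):
    """Stand-in for an HTTP 429 (rate limit) response."""


def with_retries(fn, max_retries=20, base_delay=0.5, sleep=time.sleep):
    """Retry `fn` with exponential backoff plus jitter on 429-style errors.

    Hypothetical sketch of the retry loop a max_retries option enables:
    each failed attempt doubles the wait before trying again.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimit429:
            if attempt == max_retries:
                raise  # budget exhausted: surface the error
            # Exponential backoff with up to 10% random jitter.
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            sleep(delay)
```

With a generous retry count, transient tokens-per-minute 429s are absorbed by the backoff instead of failing the whole batch push.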

@hongrunw Thx for the tips 👍

federicocaccialanzaabb commented 6 months ago

Thank you @deuch, I will modify the method

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days.