Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License

OpenAI embedding token limit - reprocess all documents #111

Open federicocaccialanzaabb opened 10 months ago

federicocaccialanzaabb commented 10 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Take a sample PDF file of about 10 MB (the one I used is https://freetestdata.com/wp-content/uploads/2022/11/Free_Test_Data_10.5MB_PDF.pdf) and upload it.


This results in a success in the Azure Function Batch Push Results (no token limit exceeded).


Then upload more than one copy of the file at the same time, so that combined they surpass the token limitation. Let the data ingestion run, and an error will be displayed in the Batch Push Results.


(The same error occurs if your Azure Blob Storage contains several documents that were fine when uploaded and ingested individually, but combined surpass your token limitation. If, from the admin page, you click "Reprocess all documents", you will hit the same error: the same Azure Function is called, but with the parameter set to process all files in blob storage, not only the ones without embeddings.)

Any log messages given by the failure

Result: Failure Exception: RateLimitError: Requests to the Embeddings_Create Operation under Azure OpenAI API version 2023-07-01-preview have exceeded call rate limit of your current OpenAI S0 pricing tier. Please retry after 3 seconds. Please go here: https://aka.ms/oai/quotaincrease if you would like to further increase the default rate limit.

Expected/desired behavior

What I would like the app to do: when the token limit is reached (or when it estimates that the next file or chunk will surpass the limit), wait for the current minute window to finish and then continue (if that is not possible at chunk level, then at least at file level). That way it will not fail due to tokens-per-minute limitations.
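The behavior requested above can be sketched as a per-minute token budget that the ingestion loop consults before each embedding call. This is a minimal illustration, not code from the accelerator; `TokenBudget` and its injectable `clock`/`sleep` parameters are hypothetical names, and real token counts would have to be estimated (e.g. with a tokenizer) before calling `acquire`.

```python
import time


class TokenBudget:
    """Tracks tokens used in the current one-minute window and blocks
    until the window resets when a request would exceed the budget.
    Hypothetical sketch; the accelerator does not ship this class."""

    def __init__(self, tokens_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.tokens_per_minute = tokens_per_minute
        self.clock = clock          # injectable for testing
        self.sleep = sleep          # injectable for testing
        self.window_start = clock()
        self.used = 0

    def acquire(self, tokens):
        """Wait (if needed) until `tokens` fit in the per-minute budget."""
        now = self.clock()
        if now - self.window_start >= 60:
            # A full minute has passed: start a fresh window.
            self.window_start = now
            self.used = 0
        if self.used + tokens > self.tokens_per_minute:
            # Wait out the remainder of the current minute, then reset.
            wait = 60 - (now - self.window_start)
            if wait > 0:
                self.sleep(wait)
            self.window_start = self.clock()
            self.used = 0
        self.used += tokens
```

The ingestion loop would then call `budget.acquire(estimated_tokens)` before each file (or chunk) is embedded, which is exactly the "wait until the minute finishes" behavior described above.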

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

deuch commented 6 months ago

Hello, you can modify this method:

https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/blob/f136f4fce79a70eaaff51e6617083648e9eea0c7/code/backend/batch/utilities/helpers/LLMHelper.py#L82

and add the max_retries=20 option to each AzureOpenAIEmbeddings constructor, in both the if and the else branches.

It will use exponential retries when you encounter HTTP 429 errors (related to the call rate per minute).
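For intuition, here is a minimal sketch of what exponential retry on 429 errors looks like. The real behavior lives inside the OpenAI client that `AzureOpenAIEmbeddings` wraps when `max_retries` is set; `RateLimit429` and `with_retries` below are illustrative names, not the SDK's actual internals.

```python
import random
import time


class RateLimit429(Exception):
    """Stand-in for an HTTP 429 (rate limit) response."""


def with_retries(fn, max_retries=20, base_delay=0.5, sleep=time.sleep):
    """Retry `fn` with exponential backoff plus jitter on 429-style errors.

    Hypothetical sketch of the retry loop a max_retries option enables:
    each failed attempt doubles the wait before trying again.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimit429:
            if attempt == max_retries:
                raise  # budget exhausted: surface the error
            # Exponential backoff with up to 10% random jitter.
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            sleep(delay)
```

With a generous retry count, transient tokens-per-minute 429s are absorbed by the backoff instead of failing the whole batch push.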

@hongrunw Thx for the tips 👍

federicocaccialanzaabb commented 6 months ago

Thank you @deuch, I will modify the method

github-actions[bot] commented 3 weeks ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days.