Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
6.12k stars 4.17k forks

feature: automatic number of documents as expert setting #2066

Open cforce opened 1 week ago

cforce commented 1 week ago

Instead of setting a fixed number of documents to be injected into the prompt, dynamically calculate this based on the user's configuration of "Max Length of a System Response" in the expert settings. Allow users to set the document count to "auto" and prompt them to configure the "Max Length of a System Response," with a default value provided.

The number of documents that can be injected into the prompt should be based on the formula:

```
Max Response Tokens = #Prompt Tokens + #User Message Tokens + #Document Injected Tokens + #Response Message Tokens
```

Given these variables, the process should iterate over the ranked and ordered document list, adding complete documents (or pages) one by one to the prompt until the remaining token budget is exhausted (i.e. until the condition `#Max Response Tokens <= 0` is met).
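A minimal sketch of that greedy fill, assuming a crude 4-characters-per-token estimator; a real implementation would count tokens with the model's actual tokenizer (e.g. tiktoken), and all names here are illustrative, not the repo's API:

```python
# Greedy document selection under a fixed token budget (sketch).
# Assumption: ~4 characters per token; swap in a real tokenizer in practice.

def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)

def select_documents(ranked_docs: list[str],
                     max_total_tokens: int,
                     prompt: str,
                     user_message: str,
                     reserved_response_tokens: int) -> list[str]:
    """Add ranked documents one by one until the remaining budget is exhausted."""
    budget = (max_total_tokens
              - estimate_tokens(prompt)
              - estimate_tokens(user_message)
              - reserved_response_tokens)
    selected: list[str] = []
    for doc in ranked_docs:
        cost = estimate_tokens(doc)
        if cost > budget:
            break  # the next complete document no longer fits
        selected.append(doc)
        budget -= cost
    return selected
```

Because only complete documents (or pages) are added, the loop stops at the first document that would overrun the budget rather than truncating it mid-text.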

pamelafox commented 2 days ago

I've run evaluations on pulling in more documents for the RAG flow, and the results are often not better, due to the increase in irrelevant documents.

Therefore, I think a setting like this should only be used in conjunction with a minimum semantic ranker score threshold, as otherwise you can easily end up sending too many irrelevant documents to the LLM.
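A sketch of that score-threshold guard, assuming each retrieved result carries the semantic reranker score that Azure AI Search exposes as `@search.rerankerScore` (a 0-4 scale); the cutoff value and function name are illustrative:

```python
# Drop low-scoring results before the "auto" document fill runs (sketch).
# Assumption: results are dicts carrying "@search.rerankerScore" (0-4 scale);
# the 2.0 cutoff is an example and should be tuned via evaluation runs.

MIN_RERANKER_SCORE = 2.0

def filter_by_score(results: list[dict],
                    min_score: float = MIN_RERANKER_SCORE) -> list[dict]:
    """Keep only results whose reranker score meets the threshold, so an
    auto-sized document count never pads the prompt with weak matches."""
    return [r for r in results
            if r.get("@search.rerankerScore", 0.0) >= min_score]
```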

Given that, I do think an option like this makes sense, especially given the increasing size of context windows and people's desire to ask questions across many documents.