Open cforce opened 1 week ago
I've run evaluations on pulling in more documents for the RAG flow, and the results are often not better, due to the increase in irrelevant documents.
Therefore, I think a setting like this should only be used in conjunction with a minimum semantic ranker score threshold, as otherwise you can easily end up sending too many irrelevant documents to the LLM.
Given that, I do think an option like this makes sense, especially given the increasing size of context windows and people's desire to ask questions across many documents.
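To make the threshold idea above concrete, here is a minimal sketch of filtering by a minimum semantic ranker score before capping the document count. The names `RankedDoc`, `reranker_score`, and `filter_by_min_score` are hypothetical and chosen for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass


@dataclass
class RankedDoc:
    # Hypothetical container for one search result.
    doc_id: str
    content: str
    reranker_score: float  # assumed: higher means more relevant


def filter_by_min_score(docs: list[RankedDoc], top: int, min_score: float) -> list[RankedDoc]:
    """Drop results below min_score, then keep at most `top` of the best ones."""
    kept = [d for d in docs if d.reranker_score >= min_score]
    kept.sort(key=lambda d: d.reranker_score, reverse=True)
    return kept[:top]
```

With a threshold in place, raising `top` only pulls in more documents when they clear the relevance bar, which is the failure mode the evaluations above point to.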
Instead of setting a fixed number of documents to be injected into the prompt, dynamically calculate this based on the user's configuration of "Max Length of a System Response" in the expert settings. Allow users to set the document count to "auto" and prompt them to configure the "Max Length of a System Response," with a default value provided.
The number of documents that can be injected into the prompt should be based on the formula:
```
Max Response Tokens = #Prompt Tokens + #User Message Tokens + #Document Injected Tokens + #Response Message Tokens
```
Given:
- Response Message Tokens
- Max Response Tokens
- Prompt Tokens

Variables:
- User Message Tokens
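The process should iterate over the ranked and ordered document list, adding complete documents (or pages) one by one to the prompt until the condition
`#Max Response Tokens <= 0`
is met, where Max Response Tokens here means the budget that remains after subtracting the prompt, user message, response message, and already injected document tokens.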
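The iteration described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: `select_documents` and `approx_token_count` are hypothetical names, and the whitespace-based counter is only a stand-in for a real tokenizer. The sketch also stops just before a document would overflow the budget rather than after the budget has gone negative.

```python
from typing import Callable


def approx_token_count(text: str) -> int:
    # Assumption: a crude whitespace split stands in for a real tokenizer.
    return len(text.split())


def select_documents(
    ranked_docs: list[str],
    max_tokens: int,
    prompt_tokens: int,
    user_message_tokens: int,
    response_tokens: int,
    count_tokens: Callable[[str], int] = approx_token_count,
) -> list[str]:
    """Greedily add whole documents, in rank order, while they fit the budget."""
    # Budget left for documents once the fixed parts are reserved.
    budget = max_tokens - prompt_tokens - user_message_tokens - response_tokens
    selected: list[str] = []
    for doc in ranked_docs:
        cost = count_tokens(doc)
        if cost > budget:
            # Stop at the first document that does not fit, so rank order is preserved.
            break
        selected.append(doc)
        budget -= cost
    return selected
```

Breaking at the first document that does not fit keeps the highest-ranked results contiguous; skipping it and continuing would pack the window more tightly at the cost of injecting lower-ranked documents.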