Open bdb-dd opened 1 year ago
First end to end test completed. Initial results look very promising.
Will likely require additional testing and content improvements to deal with issues related to certain topics.
Certain documents should probably be included in context regardless of search terms.
Sent an invitation to a broader group of people who can contribute a varied set of user queries. We are quickly finding examples where the first stage, extracting search terms, is not as selective as it could be. A large number of search terms currently results in a smaller result set, which sometimes includes documents that are ranked highly for no apparent reason.
Have also tested asking GPT-3.5 for feedback on which of the supplied context documents were relevant, with good results. One option would be to "pin" certain source documents so that they are always included in the RAG context. The context length has varied significantly from query to query, sometimes exceeding 16K tokens, which is our current upper limit.
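To make the pinning idea concrete, here is a minimal sketch of how pinned documents and the 16K limit could interact when assembling the RAG context. Everything in it is an assumption for illustration: the `Doc` type, the `build_context` function, and the `approx_tokens` heuristic (roughly 4 characters per token) are hypothetical and not part of our current pipeline. A real tokenizer should replace the heuristic.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Doc:
    """Hypothetical document record; field names are assumptions."""
    doc_id: str
    text: str


def approx_tokens(text: str) -> int:
    # Rough heuristic (about 4 characters per token), NOT a real tokenizer.
    return max(1, len(text) // 4)


def build_context(
    ranked: Iterable[Doc],
    pinned: Iterable[Doc],
    max_tokens: int = 16_000,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[Doc]:
    """Always include pinned docs, then add ranked docs while they fit."""
    selected: List[Doc] = []
    seen = set()
    used = 0

    # Pinned documents go in first, regardless of the search terms.
    for doc in pinned:
        if doc.doc_id in seen:
            continue
        selected.append(doc)
        seen.add(doc.doc_id)
        used += count_tokens(doc.text)

    # Ranked documents are added in rank order; a document that does not
    # fit is skipped so that a smaller, lower-ranked one can still be used.
    for doc in ranked:
        if doc.doc_id in seen:
            continue
        cost = count_tokens(doc.text)
        if used + cost > max_tokens:
            continue
        selected.append(doc)
        seen.add(doc.doc_id)
        used += cost

    return selected
```

Note that in this sketch the pinned documents are included even if they alone exceed the budget, so the set of pinned documents would need to be kept small.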
Description
Updated 26.02.24: Even after having successfully addressed the retrieval ranking issues we had earlier, there are still many opportunities for improving retrieval for specific kinds of queries. As an example, a user may wish to qualify their search by defining specific filters, such as "updated recently", "sorted by version number" or "only open issues".
The "Query understanding" strategy calls for using LLMs to generate the retrieval query itself, based on a combination of knowledge of the underlying search engine, the data schemas involved and potentially some function calling extensions.
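As a rough illustration of what this could look like, the sketch below assumes the LLM is asked to return a JSON object describing the search text plus optional filters and sorting, which is then validated against an allow-list before being sent to the search engine. The JSON shape, the field names (`state`, `updated_at`, `version`) and the `parse_llm_query` function are all hypothetical; the actual schema would depend on our search engine and index.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical schema: which filters and sort fields the index supports.
ALLOWED_FILTER_VALUES = {"state": {"open", "closed"}}
ALLOWED_SORT_FIELDS = {"updated_at", "version"}


@dataclass
class RetrievalQuery:
    text: str
    filters: Dict[str, str] = field(default_factory=dict)
    sort_by: Optional[str] = None
    descending: bool = True


def parse_llm_query(raw: str) -> RetrievalQuery:
    """Validate the (assumed) JSON output of the query-generation prompt."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    text = str(data.get("text", "")).strip()
    if not text:
        raise ValueError("query text is required")

    # Keep only filters that the schema knows about; drop anything else
    # the model may have invented.
    filters: Dict[str, str] = {}
    raw_filters = data.get("filters")
    if isinstance(raw_filters, dict):
        for name, value in raw_filters.items():
            allowed = ALLOWED_FILTER_VALUES.get(name)
            if allowed is not None and isinstance(value, str) and value in allowed:
                filters[name] = value

    sort_by = data.get("sort_by")
    if not (isinstance(sort_by, str) and sort_by in ALLOWED_SORT_FIELDS):
        sort_by = None

    return RetrievalQuery(
        text=text,
        filters=filters,
        sort_by=sort_by,
        descending=bool(data.get("descending", True)),
    )
```

For a user query such as "only open issues, sorted by version number", the model would be expected to produce something like `{"text": "...", "filters": {"state": "open"}, "sort_by": "version"}`. Validating against an allow-list keeps unsupported or hallucinated fields out of the retrieval request.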