bp3000bp opened 1 week ago
I see that you read the doc on improving answer quality already. Have you identified if the incorrect answers are due to the retrieval step or the LLM step?
If they are due to the retrieval step, then you may want to try different chunking strategies, such as concatenating the document title to each chunk. You can also try increasing the retrieval count and/or setting a minimum reranker score. Have you tried running evaluations with https://github.com/Azure-Samples/ai-rag-chat-evaluator to see if any parameters increase answer quality for you?
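To make the title-concatenation idea concrete, here is a minimal sketch of prepending a document's title to each chunk before indexing, so that chunks with overlapping content stay distinguishable at retrieval time. The function name and parameters here are illustrative, not part of any particular library:

```python
def chunk_with_title(title: str, text: str, chunk_size: int = 500, overlap: int = 100) -> list[str]:
    """Split text into overlapping chunks, prefixing each chunk with the document title."""
    chunks = []
    start = 0
    while start < len(text):
        piece = text[start:start + chunk_size]
        # Prepend the title so the embedding/reranker sees which doc this came from
        chunks.append(f"{title}\n\n{piece}")
        start += chunk_size - overlap
    return chunks

chunks = chunk_with_title(
    "Company credit card policy",
    "Employees may request a corporate card for approved travel expenses.",
)
```

The same prefix trick also helps the LLM step, since the cited sources in the prompt then carry their document titles.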
You can also try changing to a more powerful model like gpt-4. That won't help with the retrieval step, but that could help if the problem is that the model isn't distinguishing well between helpful results and irrelevant results in the sources.
As for whether you can teach the bot: I would start by taking the examples it does poorly on and adding them to an evaluation data set. Then you can run evaluations with different parameters and see whether you can improve results. There's only so much context that you can fit into the prompt, so I'm not sure it would work to include hints in the system prompt like "the vacation policy is in doc X". Perhaps it would if your number of documents is small enough. The way to actually teach these models is fine-tuning, which you can certainly try, but the expense may not be worth it. There's a technique called RAFT from Berkeley that is designed to improve results for RAG applications. The retrieval still needs to be good, however.
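Collecting the hard examples into an evaluation set can be as simple as writing them to a JSONL file. The sketch below uses generic question/truth fields; check the ai-rag-chat-evaluator docs for the exact schema it expects, as the field names here are only an assumption:

```python
import json

# Hard examples the bot currently gets wrong, with the expected ground truth
hard_examples = [
    {"question": "What is the company credit card policy?",
     "truth": "Answer should come from the 'Company credit card' doc, not 'Business card'."},
    {"question": "How many vacation days do new employees get?",
     "truth": "Answer should come from the vacation policy document."},
]

# Write one JSON object per line (JSONL), a common format for eval tools
with open("ground_truth.jsonl", "w") as f:
    for example in hard_examples:
        f.write(json.dumps(example) + "\n")
```

Once you have this file, you can re-run the evaluation after each parameter change (chunking, retrieval count, reranker threshold) and compare scores.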
@pamelafox It's with retrieval. I believe it's because there are some documents with overlapping content. For example, another one that the bot often gets wrong is confusing the "Company credit card" doc with the "Business card" doc.
How can I concatenate the document title to each chunk? Is there documentation on this? Or how else could I address the bot getting confused in retrieval with overlapping content? I already added metadata to these PDFs, but that doesn't seem to help.
I don't know how complicated this would be (I'm just a novice IT intern college student), but ideally I'd be able to tell the bot: if the user says certain keywords like "company card", don't even bother with the semantic search reranker step. Just skip it, I'll tell you the exact file to reference, and then you can take that info, send it to the LLM, and complete the process. This might be something I have to figure out on my own, but please let me know if you think there's a way that could work.
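A rough sketch of that keyword-routing idea might look like the following. All names here are hypothetical; the point is just to check the query against a hand-maintained map before falling back to normal retrieval:

```python
from typing import Optional

# Hand-maintained map from trigger keywords to the exact source file to use
KEYWORD_TO_FILE = {
    "company card": "company_credit_card_policy.pdf",
    "vacation policy": "employee_handbook_vacation.pdf",
}

def route_query(query: str) -> Optional[str]:
    """Return a pinned source file if the query contains a mapped keyword, else None."""
    q = query.lower()
    for keyword, filename in KEYWORD_TO_FILE.items():
        if keyword in q:
            return filename
    return None  # no keyword match: fall back to the normal search/rerank step
```

If `route_query` returns a filename, the app could filter the search to that file (or fetch its chunks directly) instead of relying on the reranker to pick between overlapping docs.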
Thank you for the responses and helpful information, it is much appreciated.
Another way I could see this being solved would be having a place to input keywords and the location of the correct information. So if the bot often references the wrong doc when I ask about "vacation policy", for example, I could manually map that keyword to the correct document or answer. That would be another way to fix incorrect responses.
The success of this bot depends on its reliability in providing correct info. I believe it needs a way to correct itself and learn from its mistakes; it is very frustrating when it continues to be wrong and there is no way to correct it. Any thoughts? I need this bot to be reliable for my company, and currently it is missing the mark. I have already viewed all the documentation on improving answer quality.