Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

Address bot mistakes/incorrect information #1751

Open bp3000bp opened 1 week ago

bp3000bp commented 1 week ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

The bot gets wrong answers constantly. I'm trying to find a way to log these and correct the mistakes it often makes.

Any log messages given by the failure

n/a

Expected/desired behavior

I would like to request a feature, or just ask if there's an easy way to do this. I need this bot to provide correct information to my company, but it often gets answers wrong. When it does, I'd like the bot to learn from its mistakes and get smarter over time. I was thinking of creating a log of frequent incorrect answers and feeding that to the bot in its initial query, or in the data folder, but I have not had much success with this so far.

Another way I could see this being solved would be having a place to input keywords and the destination of the correct information. So, if it often references the wrong doc when I ask about "vacation policy" for example, if I could manually map to where or what the correct answer is, that would be another way to solve incorrect responses.

The success of this bot depends on its reliability in providing correct info. I believe it needs a way to correct itself and learn from its mistakes; it is very frustrating when it continues to be wrong and there is no way to correct it. Any thoughts? I need this bot to be reliable for my company, and currently it is missing the mark. I have already viewed all the documentation on improving answer quality.

OS and Version?

Windows 11 Pro

azd version?

azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)

Versions

n/a

Mention any other details that might be useful


Thanks! We'll be in touch soon.

pamelafox commented 1 week ago

I see that you've already read the doc on improving answer quality. Have you identified whether the incorrect answers are due to the retrieval step or the LLM step?

If they are due to the retrieval step, then you may want to try different chunking strategies, such as concatenating the document title to each chunk. You can also try increasing the retrieval count and/or setting a minimum reranker score. Have you tried running evaluations with https://github.com/Azure-Samples/ai-rag-chat-evaluator to see if any parameters increase answer quality for you?
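To illustrate the title-concatenation idea: a minimal sketch of a chunker that prepends the document title to every chunk so the embedding and reranker see the source context. This is not the repo's actual chunking code (that lives in the `prepdocs` ingestion scripts); the function name and parameters here are hypothetical.

```python
def build_chunks(title: str, text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, prepending the document title to each.

    Prepending the title helps retrieval distinguish documents with
    overlapping content (e.g. "Company credit card" vs. "Business card").
    """
    chunks = []
    start = 0
    step = chunk_size - overlap
    while start < len(text):
        piece = text[start:start + chunk_size]
        chunks.append(f"{title}: {piece}")
        start += step
    return chunks


# Example: a 1000-character document yields chunks that all carry the title.
chunks = build_chunks("Company credit card", "x" * 1000)
```

Every chunk now begins with `"Company credit card: "`, so a query mentioning the credit card doc has a lexical and semantic anchor in each candidate chunk.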

You can also try changing to a more powerful model like gpt-4. That won't help with the retrieval step, but that could help if the problem is that the model isn't distinguishing well between helpful results and irrelevant results in the sources.

pamelafox commented 1 week ago

As for whether you can teach the bot: I would start by taking the examples it does poorly on and adding them to an evaluation data set. Then you can run evaluations with different parameters and see whether you can improve results. There's only so much context that you can fit into the prompt, so I'm not sure it would work to include statements in the system prompt like "vacation policy is in doc X" — though perhaps it would if your number of documents is small enough. The way to actually teach these models is fine-tuning, which you can certainly try, but the expense may not be worth it. There's a technique called RAFT from Berkeley which is designed for improving results for RAG applications. The retrieval still needs to be good, however.
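A sketch of building such an evaluation data set: JSONL with one question/ground-truth pair per line, in the shape evaluation tools like ai-rag-chat-evaluator consume. The specific questions and answers here are invented placeholders; check that tool's docs for the exact field names it expects.

```python
import json

# Hypothetical ground-truth examples, drawn from questions the bot
# has answered incorrectly; the "truth" field holds the correct answer.
examples = [
    {
        "question": "What is the vacation policy?",
        "truth": "Employees accrue 15 days of PTO per year, per the vacation policy doc.",
    },
    {
        "question": "How do I request a company credit card?",
        "truth": "Follow the process in the Company Credit Card doc, not the Business Card doc.",
    },
]

with open("ground_truth.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

With a file like this you can re-run evaluations after each parameter change (retrieval count, reranker threshold, chunking) and measure whether the problematic questions actually improve, instead of eyeballing individual chats.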

bp3000bp commented 6 days ago

@pamelafox It's with retrieval. I believe it's because there are some documents with overlapping content. For example, another one that the bot often gets wrong is confusing the "Company credit card" doc with the "Business card" doc.

How can I concatenate the document title to each chunk? Is there documentation on this? Or how else could I address the bot getting confused in retrieval by overlapping content? I've already added metadata to these PDFs, but that doesn't seem to help.

I don't know how complicated this would be (I'm just a novice IT intern college student), but ideally I'd be able to tell the bot: if the user's query contains certain keywords like "company card", don't even bother with the semantic search ranker step; skip it, I'll tell you the exact file to reference, and then you can take that info, send it to the LLM, and complete the process. This might be something I have to figure out on my own, but please let me know if you think there's a way that could work.
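The keyword-override idea sketched above might look something like this: a small mapping checked before search, where a match returns hand-picked source files and anything else falls through to the normal retrieval path. The mapping, function name, and file paths are all hypothetical; in the actual app this logic would live in the chat approach before the search call.

```python
# Hypothetical keyword -> source-document overrides. If a query matches,
# we skip retrieval entirely and hand the mapped file(s) to the LLM step.
OVERRIDES: dict[str, list[str]] = {
    "company card": ["docs/company_credit_card.pdf"],
    "vacation policy": ["docs/vacation_policy.pdf"],
}


def route_query(query: str) -> dict:
    """Return either an override routing (fixed sources) or a search routing."""
    q = query.lower()
    for keyword, sources in OVERRIDES.items():
        if keyword in q:
            return {"mode": "override", "sources": sources}
    return {"mode": "search", "sources": []}
```

One caveat with this approach: exact keyword matching is brittle ("corporate card" would miss the override), so it works best for a small, well-known set of problem queries, with search as the fallback for everything else.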

Thank you for the responses and helpful information, it is much appreciated.