Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License
5.57k stars · 3.74k forks

Too much information on responses #1685

Open bp3000bp opened 3 weeks ago

bp3000bp commented 3 weeks ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Ask the chatbot a question when its data source is a .txt or .pdf file containing several common user FAQs and the answers the bot should give to those questions.

Any log messages given by the failure

N/A

Expected/desired behavior

I am hoping to find a way for the bot to not give information about printers when I'm asking about wifi. Sometimes it will give the correct answer to the original question but then ramble on about unrelated stuff in the doc it's referencing. Would creating separate .txt or .pdf files for each FAQ make it less likely for the bot to volunteer unnecessary information? Or is there direction I can give it to prevent this?

OS and Version?

Windows 11

azd version?

azd version 1.9.3 (commit e1624330dcc7dde440ecc1eda06aac40e68aa0a3)

Versions

Mention any other details that might be useful


Thanks! We'll be in touch soon.

pamelafox commented 2 weeks ago

I have seen high variability in answer length based on different prompt tweaks. I would suggest starting by tweaking the prompt, and ideally running evaluations across a number of questions to check answer length, using the evaluator tools: https://github.com/Azure-Samples/ai-rag-chat-evaluator
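As a lightweight first check before a full evaluation run, you could loop over a set of FAQ questions and record answer lengths. This is only a sketch: `ask_chatbot` is a hypothetical stand-in for however you call the deployed app, and the canned answers below substitute for real responses.

```python
# Sketch: measure answer length across several FAQ questions.
# `ask_chatbot` is a hypothetical placeholder for a call to the
# running app; it is not part of this repo's API.

def answer_length_stats(questions, ask_chatbot):
    """Return (min, mean, max) word counts over the bot's answers."""
    lengths = [len(ask_chatbot(q).split()) for q in questions]
    return min(lengths), sum(lengths) / len(lengths), max(lengths)

# Canned answers standing in for real responses:
canned = {
    "How do I reset my wifi?": "Power-cycle the router and wait 30 seconds.",
    "How do I add a printer?": "Open Settings, choose Devices, then Add printer.",
}
stats = answer_length_stats(list(canned), canned.get)
```

A sudden jump in the max word count after a prompt change is a quick signal that the bot has started volunteering extra information again.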

From my experiments, our baseline prompt usually results in relatively short answers, but you could try adding a more explicit directive about how long the response should be.

Another parameter you can experiment with is the semantic ranker score threshold - perhaps it's returning documents that aren't relevant at all and have a semantic score less than 2 or 1.5. You could then set the threshold to filter out those results. Once again, you'd want to evaluate across multiple questions to ensure no degradation in answer quality elsewhere.
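The thresholding idea above could look like this sketch; the 0-4 scale follows Azure AI Search's semantic ranker, but the result objects are simplified dicts rather than real SDK results:

```python
# Sketch: drop search hits whose semantic reranker score falls below
# a threshold. Azure AI Search reports reranker scores on a 0-4
# scale; results here are simplified (content, score) dicts, not
# actual azure-search SDK result objects.

def filter_by_reranker_score(results, minimum_score=2.0):
    return [r for r in results if r["reranker_score"] >= minimum_score]

hits = [
    {"content": "Wifi setup steps...", "reranker_score": 3.1},
    {"content": "Printer driver install...", "reranker_score": 1.4},
]
kept = filter_by_reranker_score(hits, minimum_score=2.0)
# Only the wifi document clears the cutoff, so the printer text
# never reaches the prompt.
```

Filtering before prompt construction means the model never sees the marginal documents, which directly addresses answers that ramble into unrelated source material.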