Azure-Samples / chat-with-your-data-solution-accelerator

A Solution Accelerator for the RAG pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences. This includes most common requirements and best practices.
https://azure.microsoft.com/products/search
MIT License

Index error when user queries are complex or in quick session #114

Closed arun13go closed 4 months ago

arun13go commented 6 months ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

After successfully uploading a large document, the user asks queries about it, either a sequence of queries one after another or a single complex query. The UI then throws the error "An error occurred. Please try again. If the problem persist, please contact the site administrator". This seems to be an index out of range error in OutputParserTool.py, so it looks like an issue in the conversation chain cache.

Any log messages given by the failure

see attached (redacted)

Expected/desired behavior

OS and Version?

Microsoft Windows 10 Enterprise, Version 10.0.19044, Build 19044

Versions

Mention any other details that might be useful

redacted_error_userquery.log


Thanks! We'll be in touch soon.

arun13go commented 5 months ago

Hey team, is there an update? The customer is waiting for a fix. Can you please prioritise this bug if possible?

adamdougal commented 5 months ago

I'm still working to figure out why this is happening, but it happens because the call to OpenAI here:

https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/blob/main/code/utilities/orchestrator/OpenAIFunctions.py#L78

is returning a stop finish reason. That causes the code to enter the else block, which does not populate the source_documents data:

https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/blob/main/code/utilities/orchestrator/OpenAIFunctions.py#L104

However, the returned message still includes a reference to a document, e.g. .... [doc1]. We then look up source document 1 in the empty list and an IndexError is thrown.
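
The failure mode described above can be illustrated with a small sketch. The function name and the regular expression are hypothetical and are not the accelerator's actual code; the sketch only assumes that citations look like [docN] and that N is a 1-based position in the source documents list.

```python
import re

# Assumption: citations in the answer look like [doc1], [doc2], ...
DOC_REF = re.compile(r"\[doc(\d+)\]")


def lookup_cited_documents(answer, source_documents):
    """Return the source document for each [docN] marker found in the answer."""
    cited = []
    for match in DOC_REF.finditer(answer):
        index = int(match.group(1)) - 1  # [doc1] maps to position 0
        # Raises IndexError when the list is shorter than the cited number.
        cited.append(source_documents[index])
    return cited


answer = "The policy covers remote work [doc1]."
try:
    # On the "stop" path no documents were retrieved, so the list is empty.
    lookup_cited_documents(answer, [])
except IndexError as exc:
    print(f"IndexError: {exc}")
```

With an empty list, any [docN] marker in the answer is enough to trigger the exception, regardless of N.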

adamdougal commented 5 months ago

I think this is caused by the previous citation references, e.g. .... [doc1], being included in the previous messages we send to the ChatGPT model here:

https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/blob/main/code/utilities/orchestrator/OpenAIFunctions.py#L72-L76

The code path it takes does not interact with Azure AI Search or our blob storage, so the only information it has access to is in the model itself and the previous messages. I think it is then "accidentally" including previous references in its response.

I believe I have proven this theory by re-running an example that exercised this bug, with and without past messages included in the chat completion. Without the past messages, this error does not occur.

To fix this I think there are a few options:

  1. Ignore any [docN] references if there are no source documents
  2. Remove any [docN] references from past messages
  3. Tell the model not to rely only on previous responses
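
Options 1 and 2 could look roughly like the sketch below. All names here are hypothetical, and it assumes that chat messages are dicts with "role" and "content" keys and that citations always have the form [docN]; it is not necessarily what the eventual fix does.

```python
import re

# Assumption: citations look like [doc1]; also swallow the whitespace before them.
DOC_REF = re.compile(r"\s*\[doc\d+\]")


def strip_doc_references(text):
    """Remove every [docN] marker from the text."""
    return DOC_REF.sub("", text)


def clean_history(messages):
    """Option 2: return a copy of the history without citations in assistant messages."""
    return [
        {**m, "content": strip_doc_references(m["content"])}
        if m.get("role") == "assistant"
        else m
        for m in messages
    ]


def finalize_answer(answer, source_documents):
    """Option 1: drop citations from the answer when there is nothing to cite."""
    if not source_documents:
        return strip_doc_references(answer)
    return answer
```

Option 2 removes the markers before the model sees them, while option 1 tolerates them afterwards. The two can be combined.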

adamdougal commented 5 months ago

I've tried various different system messages but have been unable to force the chat to always call the function.

That leaves solutions 1 or 2, which are very similar.

ross-p-smith commented 5 months ago

@ruoccofabrizio - do you have a view on whether we should go with option 1 or 2?

adamdougal commented 5 months ago

I have raised a change to fix the issue described above: https://github.com/Azure-Samples/chat-with-your-data-solution-accelerator/pull/193

In my testing I have also noticed that it is possible to get an IndexError in a different location:

ERROR:root:Exception in /api/conversation/custom | list index out of range
Traceback (most recent call last):
  File "/workspaces/chat-with-your-data-solution-accelerator/code/app/app.py", line 283, in conversation_custom
    chat_history.append((user_assistant_messages[i]['content'],user_assistant_messages[i+1]['content']))

This happens after there has been another error and no response has been returned from the assistant. Our code, however, expects there to always be an assistant message after a user message, i.e. one message each way.

This will also need fixing.
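
One way to make the pairing tolerant of a missing assistant reply is sketched below. The function name is hypothetical, and it assumes each message is a dict with "role" and "content" keys (the traceback only confirms the "content" key); it is not necessarily the fix that was applied.

```python
def build_chat_history(messages):
    """Pair each user message with the assistant message that directly follows it.

    User messages with no assistant reply (for example after a failed request)
    are skipped instead of indexing past the end of the list.
    """
    history = []
    i = 0
    while i < len(messages) - 1:
        current, following = messages[i], messages[i + 1]
        if current["role"] == "user" and following["role"] == "assistant":
            history.append((current["content"], following["content"]))
            i += 2  # consume the whole user/assistant pair
        else:
            i += 1  # unpaired message: move on without pairing it
    return history
```

A trailing user message, or a user message followed by another user message, is simply left out of the history rather than raising an IndexError.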

adamdougal commented 4 months ago

Both issues should now be fixed.