Azure-Samples / azure-search-openai-demo

A sample app for the Retrieval-Augmented Generation pattern running in Azure, using Azure AI Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
https://azure.microsoft.com/products/search
MIT License

Wrong Citation #159

Open realdevmann opened 1 year ago

realdevmann commented 1 year ago

Please provide us with the following information:

This issue is for a: (mark with an x)

- [ ] bug report -> please search issues before submitting
- [x] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)

Minimal steps to reproduce

Any log messages given by the failure

Expected/desired behavior

Answers that are based on the uploaded files.

OS and Version?

Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)

Windows 11

Versions

Mention any other details that might be useful

I uploaded my data to Azure Storage, and the Azure chat app runs correctly. However, I am disappointed by the chatbot's wrong answers. My data files are written in Korean and English. The chatbot's answers are not consistent between runs of the same question: sometimes it answers correctly, sometimes not.

[ Open Source Modification ]

  1. create_search_index() in prepdocs.py
    • Before: SearchableField(name="content", type="Edm.String", analyzer_name="en_microsoft"),
  2. query_prompt_template in chatreadretrieveread.py
    • Before: If the question is not in English, translate the question to English before generating the search query.
    • After: Please search in the language of the original input of the question, never try to translate it into English.

[ Chatbot provided wrong citation ]

image

[ Azure Storage Explorer ] SACC is defined in a different file (*Lecture-7.pdf)

image

How can I improve the accuracy of the Azure Search Service results?


Thanks! We'll be in touch soon.

ericthomas1 commented 1 year ago

I'm seeing the same issue using the sample data.

image

image

thiago-acn commented 1 year ago

I also have the same issue. Has anyone tried to fix the problem?

ericthomas1 commented 1 year ago

I was unable to fix the issue. It used to work for me, in past deployments, but in the latest deployment, this issue popped up.

dbae1145 commented 1 year ago

@ericthomas1 which of the previous deployments?

ericthomas1 commented 1 year ago

@dbae1145 I've deployed the example multiple times in various Azure environments. In my early deployments there were no issues. In my later deployments, this issue started popping up.

I'd say ~4-5 weeks ago.

villepuntanen commented 1 year ago

Similar issue reported in my case

itmilos commented 1 year ago

We should work on the prepdocs functionality...

kruselegal commented 1 year ago

I think it's an off-by-one error in the PDF splitting. The page index begins at page 0, not page 1.
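To illustrate the kind of off-by-one error described here, below is a minimal sketch and not the repository's actual code. The helper `citation_label`, the file name, and the sample page texts are all hypothetical. It assumes the splitter enumerates pages from 0, as PDF libraries commonly do, while readers and `#page=N` links count from 1.

```python
def citation_label(filename: str, page_index: int) -> str:
    """Build a citation label from a 0-based page index (hypothetical helper)."""
    if page_index < 0:
        raise ValueError("page_index must be non-negative")
    # The index from the splitter is 0-based, but the page number shown to
    # the reader is 1-based, so add 1 when building the label.
    return f"{filename}#page={page_index + 1}"


# Illustrative placeholder page texts, not real data.
pages = ["text of first page", "text of second page", "text of third page"]
labels = [citation_label("example.pdf", i) for i, _ in enumerate(pages)]
print(labels)
```

If the `+ 1` were missing, every label would point one page (or, with per-page files, one file) before the content that was actually indexed, which would match the "previous page is cited" symptom.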

ArunEPRO commented 1 year ago

I am facing a similar issue. The previous file is always shown as the citation.

nickroseth commented 1 year ago

Getting same thing. Any updates on this?

jomieljaniuk commented 1 year ago

Some cases may be caused by the bug I described in issue #370.

nickroseth commented 1 year ago

Any update on this? Is #370 the proper way to address it? If so, is it because of how the chunking process works?

WVSdigitaldojo commented 1 year ago

Still an off-by-one issue atm. Can we expect a fix?

github-actions[bot] commented 10 months ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this issue will be closed.

ericthomas1 commented 10 months ago

Still waiting for a fix! The issue of incorrect references still exists. It would be nice to know how to address it, whether it's with some sort of indexing change or a Cognitive Search setting, etc.

oizidbih commented 8 months ago

Hey there, I'm facing the same issue, and it is causing a serious problem with our customers: the bot has started showing wrong citations. Are there any solutions for this?

pamelafox commented 8 months ago

@oizidbih When the wrong citations happen, have you dug into the thought process to see whether the search results have the correct filenames? Are the citations entirely made up, or syntactically incorrect? My general recommendation is to run an evaluation and change parameters to improve results. This repo is designed for evaluating this chat app: https://github.com/Azure-Samples/ai-rag-chat-evaluator
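As a rough illustration of the check suggested above (comparing the citations in an answer against the filenames in the search results), here is a minimal sketch. It is not part of the app or the evaluator repo. It assumes citations appear in square brackets such as `[file.pdf#page=2]`, and the function name `check_citations` and the sample strings are hypothetical.

```python
import re

# Assumption: a citation is any text wrapped in square brackets.
CITATION_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def check_citations(answer: str, retrieved_sources: list[str]) -> dict[str, list[str]]:
    """Split the citations found in an answer into grounded and made-up ones."""
    cited = CITATION_PATTERN.findall(answer)
    known = set(retrieved_sources)
    return {
        "grounded": [c for c in cited if c in known],
        "made_up": [c for c in cited if c not in known],
    }


# Illustrative values only; copy the real ones from the thought process view.
answer = "The term is defined in the notes [notes-a.pdf#page=3]."
retrieved = ["notes-b.pdf#page=2", "notes-b.pdf#page=3"]
report = check_citations(answer, retrieved)
print(report)
```

A non-empty `made_up` list means the model cited a source that was never retrieved. If every citation is grounded but still looks wrong, the mismatch is more likely in how the documents were split and named at indexing time.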

oizidbih commented 8 months ago

@pamelafox Thanks for your response. I have a website, a few adequately named documents, and an FAQ document as references. The citations are made up and happen more frequently now, and I'm still trying to figure out why. Interestingly, the answers are mostly correct, but why is the reference wrong? I also considered deactivating citations for now, but I still need to figure out how. I'll try the evaluation and let you know.