deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

RAG tutorial 7, how to detect wrong answer? #658

Closed demarant closed 3 years ago

demarant commented 3 years ago

Question: I just ran the latest Tutorial 7 - Generative QA via "Retrieval-Augmented Generation" on Colab.

It looks promising.

I realised, though, that judging from the examples, the model will still return a wrong answer if the question is not covered by our document store.

For example, we get: Generated answer is ' india' for the question = 'panda is a national animal of which country'

We know that the answer should be China, not India. It looks to me like a "reverse hallucination" by the model: there is nothing about the panda in the document store.

How can we detect that the questions are not covered by our document store? Do we have any probabilities we can use as a threshold, similar to what the extractive QA model provides when we get the answers? Should we use the doc_scores as an indicator, e.g. discard any document whose doc_score is below 80?

Is there a plan to add "No answer" as a valid answer?

Thanks

lalitpagaria commented 3 years ago

I might be wrong due to limited knowledge; @tholor please correct me.

How can we detect that the questions are not covered by our document store? Do we have any probabilities we can use as a threshold, similar to what the extractive QA model provides when we get the answers? Should we use the doc_scores as an indicator, e.g. discard any document whose doc_score is below 80?

You can use the documents returned by the retriever for this purpose. The generator will always try to generate an answer from the documents the retriever provides. Hence you can use the doc_scores returned by the retriever and filter out documents below a certain threshold before feeding them to the generator.

Is there a plan to add "No answer" as a valid answer?

I think the same applies here: if the retriever does not return any docs, the generator will not produce any answer.

As this is Retrieval-Augmented Generation, the retriever plays a big role here. Fine-tuning the question encoder and the generator will produce good results, as stated in the original paper. But I feel that one can also achieve good results by filtering the documents the retriever returns.

Again, I might be wrong; I answered based on my understanding of RAG from when I was integrating it into Haystack.

tholor commented 3 years ago

Hey @demarant ,

Very relevant question! I think this would be quite a helpful feature for practical use of RAG. However, I don't see a quick solution. I believe the original paper / implementation does not cover a "no answer" option.

@lalitpagaria From my perspective, using the retriever scores is not really a good proxy here. The scores from DPR reflect a rather "higher-level" match between doc and query and will probably be too "blurry" for determining a "no answer". Secondly, RAG can still give correct answers even when the answer is not contained in the retrieved docs (e.g. when the answer is "remembered" from previous training).

I could see two broad directions that might be worth exploring:

1) Using the likelihood of the generated sequence (e.g. from beam search) or a measure like perplexity (see the sketch below this list)

2) Training a separate, lightweight prediction head that determines the confidence level of the generated sequence

My gut feeling is that 2) would give better calibrated scores and allow setting better thresholds for returning no answer. We will work on model confidence (mostly for reader models) in Q1/Q2 next year and can try to incorporate RAG there as well. If you want to explore this topic yourself in the meantime, we will of course try to give some lightweight support.

tholor commented 3 years ago

@Timoeller @julian-risch please give this some quick thought when working on #739. If it cannot be tackled together with reader confidence, it would be great to at least sketch a rough concept for generative models.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 21 days if no further activity occurs.