deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Question about generalization of QA models to out of domain data #744

Closed SasikiranJ closed 3 years ago

SasikiranJ commented 3 years ago

Question Hi, I am a little unclear about how the reader produces an answer from the given documents. I am using a model that was trained on the SQuAD 2.0 dataset. I want to know how the model can produce answers from custom documents for different questions without ever being trained on my data. I am just passing my own documents and a question to the model. Please let me know how transfer learning is helping me here. Thank you in advance.


Timoeller commented 3 years ago

Hey @SasikiranJ, that is indeed a good question (I reformulated the issue title accordingly). The ability of question answering models to generalize to out-of-domain data (your custom documents) is an ongoing research topic. For an up-to-date reference, please have a look at the EMNLP 2020 paper.

To give you some condensed guidance:

To find out how well the model generalizes to your custom documents, it is best to annotate a couple of question-answer pairs and see how the model performs on them.
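A minimal sketch of such a check, assuming SQuAD-style exact match and token-level F1 as the metrics. The `evaluate` helper, the `answer_fn` callable, and the dictionary keys are hypothetical names for illustration, not part of Haystack; `answer_fn` stands in for whatever wraps your reader and returns its top answer string.

```python
import re
import string
from collections import Counter
from typing import Callable, Dict, List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split into tokens."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token precision and recall against the gold answer."""
    pred_tokens, gold_tokens = normalize(prediction), normalize(gold)
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match (e.g. "no answer" cases).
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(
    examples: List[Dict[str, str]],
    answer_fn: Callable[[str, str], str],
) -> Dict[str, float]:
    """Average both metrics over hand-annotated examples.

    Each example is assumed to have "question", "context" and "answer" keys.
    answer_fn(question, context) is a placeholder for your own reader call.
    """
    if not examples:
        raise ValueError("need at least one annotated example")
    em_total = f1_total = 0.0
    for ex in examples:
        prediction = answer_fn(ex["question"], ex["context"])
        em_total += exact_match(prediction, ex["answer"])
        f1_total += token_f1(prediction, ex["answer"])
    n = len(examples)
    return {"exact_match": em_total / n, "f1": f1_total / n}
```

Even a few dozen annotated pairs from your own documents give a rough signal: if the scores are far below what the model reaches on its training domain, that suggests annotating more data and fine-tuning.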

Timoeller commented 3 years ago

Question seems answered, closing now. Feel free to reopen if there are more questions coming up.