deepset-ai / COVID-QA

API & Webapp to answer questions about COVID-19. Using NLP (Question Answering) and trusted data sources.
Apache License 2.0

Document Retrieval for extractive QA with COVID-QA #108

Closed aaronbriel closed 3 years ago

aaronbriel commented 3 years ago

Thank you so much for sharing your data and tools! I am working with the question-answering dataset for an experiment of my own.

@Timoeller mentioned in #103 that the documents used in the annotation tool to create the COVID-QA.json dataset "are a subset of CORD-19 papers that annotators deemed related to Covid." I was wondering if these are the same documents as listed in faq_covidbert.csv.

The reason I ask is that, as a workaround, I've created my own retrieval txt file(s) by extracting the answers from COVID-QA.json, but the results are hit or miss. They are particularly off if I break the file up into chunks to improve performance, for instance into a separate txt file for each answer; I'm assuming this is due to lost context. I'm wondering if I should simply be using faq_covidbert as illustrated here, even though I am using extractive QA.

The reason I took that approach is that I was trying to follow the extractive QA tutorial as closely as possible.

My ultimate objective is to compare the experience of using extractive QA vs FAQ-style QA, so I presumed that it would be apropos to have a bit of separation in the doc storage dataset.

Thank you!

Timoeller commented 3 years ago

Hey @aaronbriel cool that you like the dataset. We also have an accompanying ACL workshop paper.

index docs (and labels) into haystack

So you want to index the documents into Haystack, and you have used the "context" of each "paragraph" in the COVID-QA.json file. This is correct, since these texts are exactly the associated papers.

You can directly use the COVID-QA.json file in Haystack's Tutorial5_Evaluation.py once PR deepset-ai/haystack/pull/494 is merged.
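
In the meantime, indexing the contexts yourself only takes a few lines. This is a rough sketch assuming the Haystack 0.x dict-based write_documents format (the exact import paths have moved between releases), with InMemoryDocumentStore standing in for whatever store you use:

```python
# Sketch: index every "context" from COVID-QA.json as one document.
# Assumes the Haystack 0.x dict-based document format; adapt the store and paths as needed.
import json

from haystack.document_store.memory import InMemoryDocumentStore

with open("COVID-QA.json") as f:
    squad_data = json.load(f)["data"]

docs = []
for article in squad_data:
    for paragraph in article["paragraphs"]:
        # each "context" is the full text of one annotated CORD-19 paper
        docs.append({"text": paragraph["context"],
                     "meta": {"name": article.get("title", "")}})

document_store = InMemoryDocumentStore()
document_store.write_documents(docs)
```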

use the finder for asking single questions on the docs

If you have indexed the JSON file in Haystack's document store, you can then use the finder with any question, like:

```python
prediction = finder.get_answers(question="Is this really working?", top_k_retriever=10, top_k_reader=5, index=doc_index)
```
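
For context, here is a rough sketch of the surrounding setup, again assuming the Haystack 0.x API; TfidfRetriever is just one example retriever, and the reader model is whichever checkpoint you use:

```python
# Sketch of wiring a retriever and reader into a Finder (Haystack 0.x API assumed;
# import paths and the result dict format differ slightly between releases).
from haystack import Finder
from haystack.reader.farm import FARMReader
from haystack.retriever.sparse import TfidfRetriever

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")
finder = Finder(reader=reader, retriever=retriever)

prediction = finder.get_answers(question="Where was COVID-19 first discovered?",
                                top_k_retriever=10, top_k_reader=5)
for answer in prediction["answers"]:
    print(answer["answer"], answer["probability"])
```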

So you said you want to compare extractive QA with FAQ-based QA. That's pretty cool. I presume that FAQ-based QA is rather simple, since you only match the incoming question against the questions in your database. Looking forward to seeing your results there!
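
For the FAQ side, the matching really can be as simple as embedding the stored questions and picking the nearest one. A minimal, library-agnostic sketch follows; the model name and CSV columns are placeholders, not necessarily what faq_covidbert.csv actually uses:

```python
# Sketch of FAQ-style QA: embed all stored questions once, then answer a new
# question with the answer of the most similar stored question.
# Model name and column names are illustrative placeholders.
import pandas as pd
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

faq = pd.read_csv("faq_covidbert.csv")  # assumed columns: "question", "answer"
faq_embeddings = model.encode(faq["question"].tolist(), convert_to_tensor=True)

def answer(query: str) -> str:
    query_embedding = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_embedding, faq_embeddings)[0]
    return faq["answer"][int(scores.argmax())]
```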

aaronbriel commented 3 years ago

@Timoeller Ah, I'm embarrassed - I forgot about the context entries! I've apparently been doing too many things at once. Thank you so much for the quick response!

aaronbriel commented 3 years ago

@Timoeller I'm seeing less-than-stellar results (at least in the few tests I've done) when I break the contexts up into separate files, as opposed to my prior approach of concatenating all questions and answers sequentially into a single document. For example, with the question "Where was COVID-19 first discovered?" (a question pulled directly from COVID-QA.json), the latter correctly returns "Wuhan City, Hubei Province, China" as the result with the highest probability (0.78). However, with the separate-contexts approach the highest-probability result is "1971 by Theodor Diener" (0.57).

This is using deepset/roberta-base-squad2 fine-tuned with the COVID-QA.json dataset. I would have expected this to increase performance, so it would seem that retrieval is the weak link here. Am I missing something?

aaronbriel commented 3 years ago

After further investigation into open haystack issues, I'm wondering if I'm running into the problem with roberta-base-squad2 described here and here. Interestingly, however, the question above is towards the end of the document rather than the beginning. Either way, this is outside the scope of the original issue, so I will close it and investigate further in the context of FARMReader and the model. Thanks!

Timoeller commented 3 years ago

Exactly, the bugs you linked are just minor things that shouldn't affect your use case.

By the way, it is this model: https://huggingface.co/deepset/roberta-base-squad2-covid that was fine-tuned on COVID-QA, not the plain "deepset/roberta-base-squad2" (which is missing the "-covid" suffix).
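
For reference, loading that checkpoint in a FARMReader looks roughly like this (a sketch; swap in your own fine-tuned model if you have one):

```python
# Sketch: point FARMReader at the COVID-fine-tuned checkpoint linked above.
from haystack.reader.farm import FARMReader

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2-covid")
```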

aaronbriel commented 3 years ago

Right - I actually fine-tuned my own model with that same COVID-QA data, as I am creating different models trained on different sets of COVID data for experimental purposes.