deepset-ai / COVID-QA

API & Webapp to answer questions about COVID-19 using NLP (Question Answering) and trusted data sources.
Apache License 2.0

Where do I get the subset of CORD-19 documents used for COVID-QA? #182

Open: jdpsen opened this issue 1 year ago

jdpsen commented 1 year ago

The paper mentions: "We selected 147 scientific articles mostly related to COVID-19 from the CORD-19". How can I get this subset of documents to create an index?

Timoeller commented 1 year ago

You can convert the QA dataset back into the documents that were used. You can find the QA dataset here: https://github.com/deepset-ai/COVID-QA/blob/master/data/question-answering/200423_covidQA.json

In this JSON, the fields called "context" contain the document texts.
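A minimal sketch of that conversion, assuming the file follows the SQuAD-style layout `{"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}`. The function names `extract_documents` and `load_documents` are hypothetical helpers, not part of the repository. Because the same context can appear in more than one entry, the sketch keeps only the first copy of each text.

```python
import json
from typing import Any, Dict, List


def extract_documents(squad: Dict[str, Any]) -> List[str]:
    """Collect the unique "context" strings from a SQuAD-style dict.

    Assumed layout (hypothetical helper, not a COVID-QA API):
    {"data": [{"paragraphs": [{"context": ..., "qas": [...]}]}]}
    """
    seen = set()
    documents: List[str] = []
    for article in squad.get("data", []):
        for paragraph in article.get("paragraphs", []):
            context = paragraph.get("context")
            # Skip missing or empty contexts and ones we have already collected.
            if context and context not in seen:
                seen.add(context)
                documents.append(context)
    return documents


def load_documents(path: str) -> List[str]:
    """Read the QA JSON file from disk and return its unique document texts."""
    with open(path, encoding="utf-8") as f:
        return extract_documents(json.load(f))
```

After downloading `200423_covidQA.json`, calling `load_documents("200423_covidQA.json")` returns the list of texts. You can compare its length against the 147 articles mentioned in the paper as a sanity check.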

What do you want to create the index for? Are you using Haystack to create a searchable index?
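If you want a quick, dependency-free way to search the document texts before setting up a proper Haystack document store, a small keyword index can help. The sketch below is a hand-rolled BM25 ranker, not Haystack's API. The class name `BM25Index`, the regex tokenizer, and the defaults `k1=1.5` and `b=0.75` are illustrative assumptions.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Deliberately simple tokenizer: lowercase, keep runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25Index:
    """Minimal in-memory BM25 index over a list of document strings (illustrative only)."""

    def __init__(self, documents: List[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.documents = documents
        self.k1 = k1
        self.b = b
        # Term counts per document and each document's length in tokens.
        self.doc_tokens = [Counter(tokenize(d)) for d in documents]
        self.doc_lens = [sum(c.values()) for c in self.doc_tokens]
        self.avg_len = sum(self.doc_lens) / len(documents) if documents else 0.0
        # Document frequency: in how many documents each term occurs.
        self.df: Counter = Counter()
        for counts in self.doc_tokens:
            self.df.update(counts.keys())

    def idf(self, term: str) -> float:
        n = len(self.documents)
        df = self.df.get(term, 0)
        # The +1 inside the log keeps the weight non-negative for very common terms.
        return math.log((n - df + 0.5) / (df + 0.5) + 1.0)

    def search(self, query: str, top_k: int = 3) -> List[Tuple[int, float]]:
        """Return (document index, score) pairs, best match first."""
        query_terms = tokenize(query)
        scores = []
        for i, counts in enumerate(self.doc_tokens):
            score = 0.0
            for term in query_terms:
                tf = counts.get(term, 0)
                if tf == 0:
                    continue
                # Length normalisation: longer documents need more hits for the same score.
                norm = self.k1 * (1 - self.b + self.b * self.doc_lens[i] / self.avg_len)
                score += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
            scores.append((i, score))
        scores.sort(key=lambda pair: pair[1], reverse=True)
        return scores[:top_k]
```

You would build it from the list of document texts (for example the "context" strings from the QA JSON) and call `search("ACE2 receptor")` to get the best-matching indices. For anything beyond a quick look, a real retriever in Haystack is the better choice.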