Annotation methodology of QA resources

lintool commented 4 years ago

Hi there, thanks for sharing your QA resource! https://github.com/deepset-ai/COVID-QA/tree/master/data/question-answering

I was wondering if you have a write-up of the annotation methodology? For example, how were the documents selected, how were the questions generated, guidelines for marking the extent of the spans, etc.

Thanks in advance!

Timoeller commented 4 years ago

Hey @lintool, thanks for looking into the annotations we open-sourced. We really liked your work on BERTserini and how Dirk used OSS frameworks for a CORD-19 semantic search. We are currently also working on better retrievers in our semantic search framework, Haystack.

About your question:

We have been using our own SQuAD-style annotation tool, in which annotators read a document, formulate questions about its content and highlight the corresponding answers. Here you can find an introductory video on the labeling tool and the annotation process.
Annotations are done on a volunteer basis by medical experts (MSc or higher), and we are especially grateful to Anthony Reina for onboarding new annotators and supervising the process.
The documents are a subset of CORD-19 papers that annotators deemed related to COVID-19. (Hopefully Tony can give more insight into the process.)

Can we assist you in any way with using these labels?

lintool commented 4 years ago

Hi @Timoeller, thanks for your response. We've also been building test collections, but via a slightly different approach: https://arxiv.org/abs/2004.11339

I was wondering if you'd be interested in coordinating efforts more closely? If so, shall we connect directly over email?

tonyreina commented 4 years ago

Yes. We'd love to coordinate our efforts. Please reach out directly to either me (Tony) or Timo. Thanks so much.

lintool commented 4 years ago

What's your email? Or you can find mine on my website: https://cs.uwaterloo.ca/~jimmylin/index.html

Annotation methodology of QA resources #103