alontalmor / MultiQA


HotpotQA Performance #7

Closed bernhard2202 closed 5 years ago

bernhard2202 commented 5 years ago

Hi,

I am sorry if this is a stupid question. I have read the paper carefully, and the performance it reports on the HotpotQA dataset is usually around 20 percent, or less when the model is trained on other datasets. However, ./models/README.md in this repository states that BERT achieves 53 percent exact match, and indeed, downloading the HotpotQA data linked in this repository, converting it to SQuAD format, and training BERT gives similar performance. Does the development set linked in this repository differ from the one used in the paper?

Thanks for your answer in advance.

alontalmor commented 5 years ago

Hi, that’s a good question. HotpotQA has two settings: distractor and full-wiki. The results in our paper are on the full-wiki setting, and the results here are on the distractor setting. I will update both sets of results here soon. Hope this helps, Alon
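For readers comparing the two numbers, it may help to see how an exact match score is usually computed. The following is a minimal sketch of SQuAD-style exact match, assuming predictions and gold answers are plain strings keyed by question id. The function names and the dictionary layout are illustrative assumptions, not code taken from this repository, and the repository's own evaluation may normalize answers differently.

```python
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> bool:
    """True when the two answers are identical after normalization."""
    return normalize_answer(prediction) == normalize_answer(gold)


def exact_match_score(predictions: dict, references: dict) -> float:
    """Percentage of reference questions answered exactly.

    Both arguments map a question id to an answer string (an assumed,
    hypothetical layout). A question with no prediction counts as wrong.
    """
    if not references:
        return 0.0
    hits = sum(
        exact_match(predictions.get(qid, ""), gold)
        for qid, gold in references.items()
    )
    return 100.0 * hits / len(references)
```

The metric itself is the same in both settings. Only the input changes: in the distractor setting the model reads a small set of provided paragraphs, while in the full-wiki setting it must first retrieve paragraphs from Wikipedia, which is why scores on the two settings are not directly comparable.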