HotpotQA Performance

Hi,

I am sorry if this is a stupid question. I have read the paper carefully, and the performance reported on the HotpotQA dataset is usually around 20 percent, or less when the model is trained on other datasets. However, ./models/README.md in this repository states that BERT achieves 53 percent exact match, and indeed, downloading the HotpotQA data linked in this repository, converting it to SQuAD format, and training BERT on it gives similar performance. Does the development set linked in this repository differ from the one used in the paper?

Thanks for your answer in advance.

alontalmor / MultiQA

HotpotQA Performance #7