Maluuba / newsqa

Tools for using Maluuba's NewsQA Dataset (public version)
https://www.microsoft.com/en-us/research/project/newsqa-dataset/

Qualifying for leaderboard entry #26

Closed: alontalmor closed this issue 5 years ago

alontalmor commented 5 years ago

Hi,

Thanks again for providing the new JSON-formatted output from this code.

How do we qualify for a leaderboard entry? Did the current leaderboard entries use examples in which the "consensus" field was 'noAnswer' or 'badQuestion'?

(Usually an evaluation script is provided along with predefined dev and test JSON files.)

Thanks Alon

juharris commented 5 years ago

Good question. Indeed we should provide an official evaluation script.

I just want to give you a quick answer now and I'll try to get confirmation. I'm pretty sure that only questions with consensus answers were used. So just to be clear, I'm pretty sure it is okay to ignore when "consensus" is {"badQuestion": true} or {"noAnswer": true}.

xingdi-eric-yuan commented 5 years ago

Hi @alontalmor , as @juharris said, only questions with consensus answers were used.
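Since both maintainers say questions without a consensus answer span should be ignored, the filtering step can be sketched as below. This is a minimal sketch, not an official script; the field names (`data`, `questions`, `consensus`, and the `badQuestion`/`noAnswer` flags) are assumed to match the public combined JSON output, so adjust them if your dump differs.

```python
# Sketch: drop NewsQA questions whose "consensus" is {"badQuestion": true}
# or {"noAnswer": true}, keeping only questions with a consensus answer span.
sample = {
    "data": [
        {
            "storyId": "story-1",  # hypothetical story for illustration
            "questions": [
                {"q": "Who won?", "consensus": {"s": 10, "e": 14}},
                {"q": "Gibberish?", "consensus": {"badQuestion": True}},
                {"q": "Unanswerable?", "consensus": {"noAnswer": True}},
            ],
        }
    ]
}

def has_consensus_span(question):
    """True only when the consensus is an actual answer span."""
    consensus = question.get("consensus", {})
    return not (consensus.get("badQuestion") or consensus.get("noAnswer"))

for story in sample["data"]:
    story["questions"] = [q for q in story["questions"] if has_consensus_span(q)]

print(sum(len(s["questions"]) for s in sample["data"]))  # 1 question kept
```

The same filter would be applied to each split (train/dev/test) before training or evaluation.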

alontalmor commented 5 years ago

Hi, thanks for the previous answer. In the dev/test set, are all the given answers (other than "badQuestion" or "noAnswer") considered correct, or just the consensus span? (Does the model get credit for hitting one of the other answer spans?)

We are considering adding this dataset to a shared task. An official evaluation script would really help...

Thanks again, Alon

juharris commented 5 years ago

I'm pretty sure it's just the consensus span that is considered correct.
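Given that answer, a scorer would compare predictions against the consensus span only, ignoring the other annotators' spans. A minimal sketch of that policy, using assumed `(s, e)` character-offset fields from the combined JSON (this is not the official evaluation script, which the thread notes did not yet exist):

```python
# Sketch: exact-span scoring against the consensus span only.
# A prediction matching a non-consensus annotator span gets no credit.
def score_prediction(predicted_span, question):
    """Return 1 if predicted_span == the consensus (s, e) span, else 0."""
    consensus = question.get("consensus", {})
    if consensus.get("badQuestion") or consensus.get("noAnswer"):
        return 0  # such questions should already be filtered out
    return int(predicted_span == (consensus.get("s"), consensus.get("e")))

question = {
    "q": "Who won?",  # hypothetical question for illustration
    "consensus": {"s": 10, "e": 14},
    "answers": [{"s": 10, "e": 14}, {"s": 22, "e": 30}],  # second span ignored
}

print(score_prediction((10, 14), question))  # 1
print(score_prediction((22, 30), question))  # 0
```

A shared-task script would typically aggregate this per-question score (or a softer token-level F1) over the whole dev/test split.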