Good question. Indeed we should provide an official evaluation script.
I just want to give you a quick answer now and I'll try to get confirmation. I'm pretty sure that only questions with consensus answers were used. So, to be clear, it should be okay to ignore questions where "consensus" is `{"badQuestion": true}` or `{"noAnswer": true}`.
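If it helps, here is a minimal, unofficial sketch of how one might skip those questions when loading the combined JSON. It assumes the layout with a top-level "data" list of stories, each carrying a "questions" list whose items hold a "consensus" dict; adjust the field names to match your copy of the data.

```python
import json

def iter_valid_questions(path):
    """Yield (story, question) pairs whose consensus is a real answer span.

    Assumes the combined NewsQA JSON layout: a top-level "data" list of
    stories, each with a "questions" list whose items carry a "consensus"
    dict. Questions whose consensus is {"badQuestion": true} or
    {"noAnswer": true} are skipped, per the discussion above.
    """
    with open(path, encoding="utf-8") as f:
        dataset = json.load(f)
    for story in dataset["data"]:
        for question in story["questions"]:
            consensus = question.get("consensus", {})
            if consensus.get("badQuestion") or consensus.get("noAnswer"):
                continue  # no agreed-upon answer span; ignore this question
            yield story, question
```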
Hi @alontalmor, as @juharris said, only questions with consensus answers were used.
Hi, thanks for the previous answer. In the dev/test set, are all of the given answers (those that are not "badQuestion" or "noAnswer") considered correct, or just the consensus span? (Does the model get credit for hitting one of the other answer spans?)
We are considering adding this dataset to a shared task. An official evaluation script would really help...
Thanks again, Alon
I'm pretty sure it's just the consensus span that is considered correct.
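For what it's worth, here is a rough sketch of what scoring only against the consensus span could look like. This is not the official evaluation script requested above; it assumes the consensus carries "s"/"e" character offsets as in the combined JSON, and it deliberately ignores the other annotated spans.

```python
def exact_match(pred_start, pred_end, consensus):
    """Exact span match against the consensus offsets only.

    Assumes consensus is {"s": start_char, "e": end_char}. Other annotated
    spans are ignored, so a prediction that matches one of them but not the
    consensus scores 0. Illustrative only, not the official metric.
    """
    return int(pred_start == consensus["s"] and pred_end == consensus["e"])


def token_f1(pred_text, consensus_text):
    """SQuAD-style token-overlap F1 between prediction and consensus text."""
    pred_tokens = pred_text.lower().split()
    gold_tokens = consensus_text.lower().split()
    # Size of the multiset intersection of the two token lists.
    common = sum(min(pred_tokens.count(t), gold_tokens.count(t))
                 for t in set(pred_tokens))
    if not common:
        return 0.0
    precision = common / len(pred_tokens)
    recall = common / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```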
Hi,
Thanks again for providing the new JSON-formatted output for this code.
How do we qualify for a leaderboard entry? Did the current leaderboard entries use examples in which the "consensus" field was 'noAnswer' or 'badQuestion'?
(Usually an evaluation script is provided along with predefined dev/test JSON files.)
Thanks, Alon