Closed by PlusRoss 2 years ago
There are some differences between the evaluation in this repo and the one on the leaderboard. On the leaderboard, we submit the results as text. In this repo, however, we only compare the predicted IDs with the gold answers. When the text answers are submitted, the evaluation may show better results. Our test set answers are taken from https://github.com/lanyunshi/KBQA-GST rather than from the gold answers (the dataset authors don't release gold answers for the test set).
I see. Does this mean that even if the predicted ID does not match the gold answer, the corresponding text may still match, because different IDs can yield the same entity name? Or are your test set answers just a subset of the gold answers?
I guess the former.
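To make the former case concrete, here is a toy sketch of how ID-based and text-based scoring can diverge when two different IDs share one entity name. This is not the scoring script of this repo or of the leaderboard: the entity IDs, names, questions, and the `hits_at_1` helper are all hypothetical, and the choice of a Hits@1-style metric is an assumption made for illustration.

```python
def hits_at_1(predictions, gold):
    """Fraction of questions whose top prediction is in the gold answer set."""
    hits = sum(1 for qid, pred in predictions.items() if pred in gold[qid])
    return hits / len(predictions)


# Hypothetical entity IDs and names: two distinct IDs share the same name.
id_to_name = {
    "m.001": "Springfield",
    "m.002": "Springfield",
    "m.003": "Shelbyville",
}

# Hypothetical gold answers and top-1 predictions, keyed by question ID.
gold_ids = {"q1": {"m.001"}, "q2": {"m.003"}}
pred_ids = {"q1": "m.002", "q2": "m.003"}

# Text view of the same data: map every ID to its entity name.
gold_text = {q: {id_to_name[i] for i in ids} for q, ids in gold_ids.items()}
pred_text = {q: id_to_name[i] for q, i in pred_ids.items()}

id_score = hits_at_1(pred_ids, gold_ids)      # q1 misses: m.002 != m.001
text_score = hits_at_1(pred_text, gold_text)  # q1 hits: both are "Springfield"

print(f"ID-based: {id_score:.2f}, text-based: {text_score:.2f}")
```

In this toy data the prediction for `q1` counts as wrong by ID but right by text, so the text-based score comes out higher even though the predictions are the same.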
I see. Thanks for your reply!
Hi,
Great work! May I ask why the CWQ result in the paper is 48.8 while it is 53.9 on the leaderboard?
Thanks.