RichardHGL / WSDM2021_NSM

Improving Multi-hop Knowledge Base Question Answering by Learning Intermediate Supervision Signals. WSDM 2021.

CWQ result difference between leaderboard and paper #22

Closed. PlusRoss closed this issue 2 years ago.

PlusRoss commented 2 years ago

Hi,

Great work! May I ask why the CWQ result in the paper is 48.8 while it is 53.9 on the leaderboard?

Thanks.

RichardHGL commented 2 years ago

There are some differences between the evaluation in this repo and the leaderboard's. For the leaderboard, we submit our predictions as text. In this repo, however, we only compare the predicted entity IDs with the gold answers. When text answers are submitted, the evaluation may show better results. Also, the test set answers we use come from https://github.com/lanyunshi/KBQA-GST rather than the official gold answers (the dataset authors don't release gold answers for the test set).

PlusRoss commented 2 years ago

I see. Does this mean that even if the predicted ID does not match the gold answer, the corresponding text may still match, because different IDs can map to the same entity name? Or is your test set answer just a subset of the gold answers?

RichardHGL commented 2 years ago

I guess the former.
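To make the "former" explanation concrete, here is a minimal Python sketch (not code from this repo or from the leaderboard) of how ID-level and text-level matching can diverge. The entity IDs, the `ID_TO_NAME` mapping, and the lowercase/strip normalization are hypothetical assumptions for illustration; the thread does not say how the leaderboard normalizes text.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data: IDs and names are made up for illustration,
# not taken from CWQ, Freebase, or the NSM repo.
ID_TO_NAME: Dict[str, str] = {
    "m.0001": "Paris",   # one entity
    "m.0002": "Paris",   # a different entity that shares the same name
    "m.0003": "London",
}


def hit_by_id(pred_id: str, gold_ids: List[str]) -> bool:
    """ID-level check: the top prediction must be one of the gold entity IDs."""
    return pred_id in gold_ids


def hit_by_text(pred_id: str, gold_ids: List[str], id_to_name: Dict[str, str]) -> bool:
    """Text-level check: compare entity names instead of IDs.

    Assumption: names are compared after lowercasing and stripping whitespace.
    """
    pred_name = id_to_name.get(pred_id, pred_id).strip().lower()
    gold_names = {id_to_name.get(g, g).strip().lower() for g in gold_ids}
    return pred_name in gold_names


def accuracy(examples: List[Tuple[str, List[str]]],
             check: Callable[[str, List[str]], bool]) -> float:
    """Fraction of questions whose top prediction passes `check`."""
    hits = sum(check(pred, gold) for pred, gold in examples)
    return hits / len(examples)


# (predicted ID, gold IDs) for three hypothetical questions
examples = [
    ("m.0001", ["m.0001"]),  # correct at both levels
    ("m.0002", ["m.0001"]),  # wrong ID, but the same name "Paris"
    ("m.0003", ["m.0001"]),  # wrong at both levels
]

id_acc = accuracy(examples, hit_by_id)
text_acc = accuracy(examples, lambda p, g: hit_by_text(p, g, ID_TO_NAME))
print(f"ID-level: {id_acc:.3f}, text-level: {text_acc:.3f}")
# prints "ID-level: 0.333, text-level: 0.667"
```

In this sketch, any ID-level hit is also a text-level hit, while a wrong ID that shares a name with a gold answer counts only under text matching, so the text-level score can never be lower. That is consistent with the leaderboard number being higher than the one in the paper.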

PlusRoss commented 2 years ago

I see. Thanks for your reply!