Open YaooXu opened 7 months ago
Thanks for sharing the code. But I have some questions about the calculation of Hit@1. In `evaluate_for_webqsp.py`, the code is as follows:

```python
for ans in answers:
    ans = ans.lower()
    if ans in pred:
        hit_flag.append(1)
```
It seems that you don't take only the first answer from the LLM's response, so the score may be inflated when the LLM predicts many answers?
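For comparison, a minimal sketch of a stricter Hit@1 that checks only the first predicted answer. This is a hypothetical helper, not code from the repo, and it assumes the LLM's predictions are comma-separated:

```python
def hit_at_1(pred: str, answers: list[str]) -> int:
    """Return 1 if the FIRST predicted answer matches a gold answer, else 0.

    Assumes `pred` is a comma-separated string of predicted answers
    (this delimiter is an assumption, not taken from the repo).
    """
    first = pred.split(",")[0].strip().lower()
    gold = {a.lower() for a in answers}
    return int(first in gold)

print(hit_at_1("Barack Obama, Michelle Obama", ["barack obama"]))  # 1
print(hit_at_1("Michelle Obama, Barack Obama", ["barack obama"]))  # 0
```

Under the repo's current loop, both cases above would count as a hit, since any gold answer appearing anywhere in `pred` appends a 1 to `hit_flag`.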