Open YaooXu opened 7 months ago
Thanks for sharing the code. But I have some questions about the calculation of Hit@1. In `evaluate_for_webqsp.py`, the code is as follows:

```python
for ans in answers:
    ans = ans.lower()
    if ans in pred:
        hit_flag.append(1)
```
It seems that you don't take only the first answer from the LLM's response, so the score may be inflated when the LLM predicts many answers?
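For comparison, a minimal sketch of a stricter Hit@1 that checks only the first predicted answer. This is a hypothetical helper, not code from the repo, and it assumes the LLM's predictions are comma-separated:

```python
def hit_at_1(pred: str, answers: list[str]) -> int:
    """Return 1 if the FIRST predicted answer matches a gold answer, else 0.

    Assumes `pred` is a comma-separated string of predicted answers
    (this delimiter is an assumption, not taken from the repo).
    """
    first = pred.split(",")[0].strip().lower()
    gold = {a.lower() for a in answers}
    return int(first in gold)

print(hit_at_1("Barack Obama, Michelle Obama", ["barack obama"]))  # 1
print(hit_at_1("Michelle Obama, Barack Obama", ["barack obama"]))  # 0
```

Under the repo's current loop, both cases above would count as a hit, since any gold answer appearing anywhere in `pred` appends a 1 to `hit_flag`.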