IDEA-FinAI / ToG

This is the official GitHub repo of Think-on-Graph. If you are interested in our work or would like to join our research team in Shenzhen, please feel free to contact us by email (xuchengjin@idea.edu.cn).

Low output score - reasoning output written to final output #29

Open · devishree23 opened this issue 1 week ago

devishree23 commented 1 week ago

I am trying to reproduce the results from the paper. I am using the Llama3 70B GPTQ model on the WebQSP dataset with the Freebase KG. However, I am getting much lower results than the ones reported in the paper: an exact match score of just 0.189.
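
For reference, this is roughly how we compute exact match in our own error analysis (our own helper, not the repo's evaluation script): a prediction counts as a hit if any gold answer string appears in it after light normalization.

```python
import re
import string


def normalize(text: str) -> str:
    """Lowercase, drop articles and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """Hit if any normalized gold answer is contained in the normalized prediction."""
    pred = normalize(prediction)
    return any(normalize(gold) in pred for gold in gold_answers)


# A prediction of just "yes" can never match the gold entity, which is
# consistent with the low score we observed.
print(exact_match("yes", ["Jamaican English"]))                               # False
print(exact_match("The answer is Jamaican English.", ["Jamaican English"]))   # True
```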

One reason for the gap could be the LLM we used, but based on the error analysis we performed, it also looks like the raw reasoning output of the LLM is being written to the final answer output. Is this by design, or is it a bug? Most of the reasoning output is just "yes" or "no" and does not contain the answer to the question, yet in the reasoning chains we can see the required answer being derived from the KG. A sketch of the kind of guard we had expected is below.
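
To illustrate what we mean, the sketch below shows the check we would expect before the final answer is written out. The function and field names are hypothetical and not taken from the ToG code; this is just how we understand the intended behavior.

```python
def select_final_answer(reasoning_output: str, reasoning_chain: list[str]) -> str:
    """Hypothetical guard: avoid writing a bare yes/no verdict as the final answer.

    `reasoning_output` is what the LLM returned in the final reasoning step;
    `reasoning_chain` is the list of KG triples/paths explored earlier.
    """
    verdict = reasoning_output.strip().lower().rstrip(".")
    if verdict in {"yes", "no"}:
        # The verdict only says whether the chain is sufficient; the actual
        # answer entity has to come from the reasoning chain instead.
        return extract_answer_from_chain(reasoning_chain)
    return reasoning_output


def extract_answer_from_chain(reasoning_chain: list[str]) -> str:
    """Placeholder extraction: in our error analysis the answer entity appears
    in the tail of the last triple, e.g. "(Jamaica, language_spoken, Jamaican English)"."""
    if not reasoning_chain:
        return ""
    last_triple = reasoning_chain[-1]
    return last_triple.rsplit(",", 1)[-1].strip(" ()")
```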

Please let us know your thoughts. Any help would be appreciated. Thank you!