Request code on evaluation

Teddy-XiongGZ / MedRAG

Code for the MedRAG toolkit

https://teddy-xionggz.github.io/benchmark-medical-rag/

Request code on evaluation #15

Closed (PeterGriffinJin closed this issue 1 month ago)

PeterGriffinJin commented 1 month ago

Hi Guangzhi,

Thank you for your great work!

Could you share the code you use to calculate generation accuracy? Do you use substring matching to check whether the ground-truth answer occurs in the generated answer, or do you adopt another evaluation method, e.g., an NLI model, a classifier, or an LLM evaluator?

Best, Bowen

Teddy-XiongGZ commented 1 month ago

Hi Bowen,

We did use substring matching to find the predicted answer. You can find the relevant code here:

https://github.com/Teddy-XiongGZ/MIRAGE/blob/main/src/evaluate.py#L26
https://github.com/Teddy-XiongGZ/MIRAGE/blob/main/src/utils.py#L27

Best, Guangzhi
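
For readers who want a feel for the approach before opening the linked files, here is a minimal sketch of substring-style answer matching for multiple-choice questions. It is not the MIRAGE code: the regular expression, the function names `locate_choice` and `accuracy`, and the A-D option range are all assumptions made for illustration, and the linked `evaluate.py` and `utils.py` remain the authoritative reference.

```python
import re
from typing import List, Optional

# Hypothetical pattern (an assumption, not taken from MIRAGE): looks for a cue
# such as "answer is B" or "Answer: (C)" and captures the uppercase option letter.
ANSWER_PATTERN = re.compile(r"(?i:answer(?:\s+is|\s*:))\s*\(?([A-D])\b")


def locate_choice(generated: str) -> Optional[str]:
    """Return the first option letter that follows an 'answer' cue, or None."""
    match = ANSWER_PATTERN.search(generated)
    return match.group(1) if match else None


def accuracy(generations: List[str], gold: List[str]) -> float:
    """Fraction of generations whose located choice equals the gold letter.

    A generation with no locatable choice counts as incorrect.
    """
    if not gold:
        return 0.0
    correct = sum(locate_choice(g) == a for g, a in zip(generations, gold))
    return correct / len(gold)


if __name__ == "__main__":
    generations = ["The answer is B.", "Answer: (C)", "I am not sure."]
    gold = ["B", "C", "A"]
    print(f"accuracy = {accuracy(generations, gold):.3f}")
```

Matching on a cue phrase rather than on any bare letter avoids counting stray capitals in the explanation, at the cost of missing answers phrased in an unexpected way; a real evaluation would likely need several patterns.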