Yes, the result_files are the outputs generated in Step 3. Specifically, they are the answers produced by the different RAG systems for the queries generated in Step 2. As for question 2, please refer to Section 4.1 of the paper for more details.
Got it. So, to confirm: rather than comparing against ground truth, the evaluation compares the answers from different systems across multiple dimensions and computes a win rate, using only the document content from the UltraDomain Benchmark dataset and none of its (query, answer) pairs?
You're correct. Our approach evaluates model performance across multiple dimensions rather than against ground truth; this directly follows the evaluation protocol of GraphRAG.
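For anyone else reading this: since no ground truth is used, the comparison is pairwise. An LLM judge reads the two answers to each query, picks a winner along each dimension, and the win rate is the fraction of queries a system wins. Below is a minimal sketch of how such verdicts could be tallied into win rates; the dimension names and the `judgments` format are assumptions for illustration, not the actual output of example/batch_eval.py.

```python
# Hedged sketch: tally pairwise LLM-judge verdicts into per-dimension win rates.
# The dimension names follow the GraphRAG-style protocol; the verdict format
# ("Answer 1" / "Answer 2" per dimension) is an assumption for illustration.
from collections import Counter

DIMENSIONS = ["Comprehensiveness", "Diversity", "Empowerment", "Overall"]

def win_rates(judgments, system_a="Answer 1"):
    """Return the fraction of queries system_a wins on each dimension.

    judgments: one dict per query, mapping dimension -> winning answer label.
    Anything other than system_a (including ties) counts as a non-win.
    """
    wins = Counter()
    for verdicts in judgments:
        for dim in DIMENSIONS:
            if verdicts.get(dim) == system_a:
                wins[dim] += 1
    total = len(judgments)
    return {dim: wins[dim] / total for dim in DIMENSIONS} if total else {}

# Example with two queries: Answer 1 wins most dimensions on the first query.
example = [
    {"Comprehensiveness": "Answer 1", "Diversity": "Answer 1",
     "Empowerment": "Answer 2", "Overall": "Answer 1"},
    {"Comprehensiveness": "Answer 2", "Diversity": "Answer 1",
     "Empowerment": "Answer 2", "Overall": "Answer 2"},
]
print(win_rates(example))
# {'Comprehensiveness': 0.5, 'Diversity': 1.0, 'Empowerment': 0.0, 'Overall': 0.5}
```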
How can I get the scores reported in the Overall Performance table by running example/batch_eval.py?