Yes, the result_files are the outputs generated in Step 3. Specifically, they are the answers produced by the different RAG systems for the queries generated in Step 2. As for question 2, please refer to Section 4.1 of the paper for more details.
Got it. So, to confirm: rather than comparing against ground truth, the evaluation compares the answers from different systems across multiple dimensions and computes a win rate, using only the document content from the UltraDomain Benchmark dataset and none of its (query, answer) pairs?
You're correct. Our approach evaluates model performance across multiple dimensions rather than against ground truth; this directly follows the evaluation protocol of GraphRAG.
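For anyone else reading this: since no ground truth is used, the comparison is pairwise. An LLM judge reads the two answers to each query, picks a winner along each dimension, and the win rate is the fraction of queries a system wins. Below is a minimal sketch of how such verdicts could be tallied into win rates; the dimension names and the `judgments` format are assumptions for illustration, not the actual output of example/batch_eval.py.

```python
# Hedged sketch: tally pairwise LLM-judge verdicts into per-dimension win rates.
# The dimension names follow the GraphRAG-style protocol; the verdict format
# ("Answer 1" / "Answer 2" per dimension) is an assumption for illustration.
from collections import Counter

DIMENSIONS = ["Comprehensiveness", "Diversity", "Empowerment", "Overall"]

def win_rates(judgments, system_a="Answer 1"):
    """Return the fraction of queries system_a wins on each dimension.

    judgments: one dict per query, mapping dimension -> winning answer label.
    Anything other than system_a (including ties) counts as a non-win.
    """
    wins = Counter()
    for verdicts in judgments:
        for dim in DIMENSIONS:
            if verdicts.get(dim) == system_a:
                wins[dim] += 1
    total = len(judgments)
    return {dim: wins[dim] / total for dim in DIMENSIONS} if total else {}

# Example with two queries: Answer 1 wins most dimensions on the first query.
example = [
    {"Comprehensiveness": "Answer 1", "Diversity": "Answer 1",
     "Empowerment": "Answer 2", "Overall": "Answer 1"},
    {"Comprehensiveness": "Answer 2", "Diversity": "Answer 1",
     "Empowerment": "Answer 2", "Overall": "Answer 2"},
]
print(win_rates(example))
# {'Comprehensiveness': 0.5, 'Diversity': 1.0, 'Empowerment': 0.0, 'Overall': 0.5}
```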
How can I get the scores reported in the Overall Performance table by running example/batch_eval.py?