Can you provide human assessment data mentioned in RAGAS paper?

Describe the Feature Could you provide the human assessment data collected for benchmarking RAGAS metrics against human evaluations in your paper?

Why is the feature important for you? The paper only benchmarks ChatGPT against human evaluation. This feature would establish a standard dataset for benchmarking any LLM-as-judge model against human evaluation.

Additional context It would be great if you could provide a standard dataset containing questions, ground truths, contexts, and human labels, so that all RAGAS metrics can be benchmarked across different judge models.
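
To make the request concrete, here is a rough sketch of how such a dataset might be consumed. Everything in it is an assumption for illustration: the `HumanLabeledExample` record, its field names, the integer label encoding, and the `agreement_rate` helper are hypothetical and are not part of the RAGAS API or the paper's actual data format.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Sequence


@dataclass(frozen=True)
class HumanLabeledExample:
    """Hypothetical record layout; field names are illustrative only."""
    question: str
    ground_truth: str
    context: str
    # Maps a metric name (e.g. "faithfulness") to the human-assigned label.
    human_labels: Mapping[str, int]


def agreement_rate(
    examples: Sequence[HumanLabeledExample],
    judge: Callable[[HumanLabeledExample], int],
    metric: str,
) -> float:
    """Return the fraction of examples where the judge's label for
    `metric` equals the human label."""
    if not examples:
        raise ValueError("examples must not be empty")
    matches = sum(1 for ex in examples if judge(ex) == ex.human_labels[metric])
    return matches / len(examples)
```

With records like these, any judge model could be wrapped in a function that returns a label per example and then scored against the same human labels, which is what would make results comparable across judges.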

explodinggradients / ragas

Can you provide human assessment data mentioned in RAGAS paper? #1063