explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0

Can you provide the human assessment data mentioned in the RAGAS paper? #1063

Open awsvmaringa opened 2 months ago

awsvmaringa commented 2 months ago

Describe the Feature Could you provide the human assessment data collected for benchmarking RAGAS metrics against human evaluations in your paper?

Why is the feature important for you? The paper only benchmarks ChatGPT against human evaluation. This feature would establish a standard dataset for benchmarking any LLM-as-judge model against human evaluation.

Additional context It would be great if you could provide a standard dataset containing the question, ground truth, context, and human labels, so that all RAGAS metrics can be benchmarked across different judge models.
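To make the request concrete, here is a minimal sketch of what one record in such a dataset could look like, and how a judge model's agreement with the human labels might be computed. Everything here is an assumption for illustration: the `HumanLabeledSample` class, its field names, the binary per-metric label format, and the `agreement_rate` helper are hypothetical and not part of ragas or the paper's released data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HumanLabeledSample:
    # Hypothetical schema; field names are illustrative, not from ragas.
    question: str
    ground_truth: str
    contexts: List[str]
    # Assumed format: metric name -> binary human label (1 = acceptable, 0 = not).
    human_labels: Dict[str, int]


def agreement_rate(
    samples: List[HumanLabeledSample],
    judge: Callable[[HumanLabeledSample], int],
    metric: str,
) -> float:
    """Return the fraction of samples where the judge matches the human label.

    Only samples that carry a human label for `metric` are counted.
    """
    labeled = [s for s in samples if metric in s.human_labels]
    if not labeled:
        raise ValueError(f"no human labels found for metric {metric!r}")
    matches = sum(1 for s in labeled if judge(s) == s.human_labels[metric])
    return matches / len(labeled)
```

With a shared dataset in this shape, any judge model could be wrapped as a `judge` callable and compared against the same human labels, metric by metric.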

jjmachan commented 1 month ago

@awsvmaringa sorry for the delay, but are you still looking for it?