Giskard-AI / giskard

🐢 Open-Source Evaluation & Testing for ML & LLM systems
https://docs.giskard.ai
Apache License 2.0

RAGET: Possible miscalculation of all Ragas metrics, in particular Precision and Recall #1924

Closed · Chabert-Liddell closed this issue 5 months ago

Chabert-Liddell commented 6 months ago

Issue Type

Bug

Source

source

Giskard Library Version

2.11

Giskard Hub Version

-

OS Platform and Distribution

No response

Python version

No response

Installed python packages

No response

Current Behaviour?

Giskard RAGET passes the reference context when calling Ragas.

https://github.com/Giskard-AI/giskard/blob/main/giskard/rag/metrics/ragas_metrics.py

        ragas_sample = {
            "question": question_sample["question"],
            "answer": answer,
            "contexts": question_sample["reference_context"].split("\n\n"),
            "ground_truth": question_sample["reference_answer"],
        }

According to the Ragas documentation, the retrieved context should be used, i.e. the context actually retrieved and used for answer generation.
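
As a minimal, self-contained sketch (the question, chunks, and answers below are made up for illustration), the sample handed to Ragas should therefore be built from what the RAG pipeline actually retrieved:

    # Illustrative sketch: the "contexts" handed to Ragas are the chunks the
    # RAG pipeline retrieved while generating this answer, not the
    # reference_context produced during test-set generation.
    question = "What does the refund policy say?"
    reference_answer = "Refunds are accepted within 30 days."

    # Chunks actually returned by the retriever for this question at answer time.
    retrieved_chunks = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Shipping fees are non-refundable.",
    ]
    generated_answer = "You can get a refund within 30 days of purchase."

    ragas_sample = {
        "question": question,
        "answer": generated_answer,
        "contexts": retrieved_chunks,  # retrieved context, not the reference context
        "ground_truth": reference_answer,
    }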

For example, context precision and context recall both use {"question", "contexts", "ground_truth"}. If you pass the reference context instead of the retrieved context, you end up evaluating your test-set generation pipeline rather than your RAG pipeline.
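
For instance, these two metrics could be computed on the retrieved contexts roughly like this (a sketch against the Ragas 0.1-style API; imports, metric names, and column names may differ between Ragas versions, and the data is made up):

    # Hedged sketch against the Ragas 0.1-style API; running it requires an LLM
    # backend configured for Ragas (e.g. an OpenAI API key).
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import context_precision, context_recall

    eval_dataset = Dataset.from_dict({
        "question": ["What does the refund policy say?"],
        "answer": ["You can get a refund within 30 days of purchase."],
        # Contexts retrieved by the RAG agent at answer time; passing the
        # reference_context here would score the test-set generation instead.
        "contexts": [[
            "Our refund policy allows returns within 30 days of purchase.",
            "Shipping fees are non-refundable.",
        ]],
        "ground_truth": ["Refunds are accepted within 30 days."],
    })

    scores = evaluate(eval_dataset, metrics=[context_precision, context_recall])
    print(scores)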

Standalone code OR list down the steps to reproduce the issue

.

Relevant log output

No response

alexcombessie commented 6 months ago

@pierlj what do you think?

pierlj commented 6 months ago

Hi @Chabert-Liddell, you are right, thanks for pointing this out. A fix will be released soon!