Giskard-AI / giskard

🐢 Open-Source Evaluation & Testing for ML & LLM systems
https://docs.giskard.ai
Apache License 2.0

RAGET: Possible miscalculation of all Ragas metrics, in particular Precision and Recall #1924

Closed · Chabert-Liddell closed this issue 5 months ago

Chabert-Liddell commented 6 months ago

Issue Type

Bug

Source

source

Giskard Library Version

2.11

Giskard Hub Version

-

OS Platform and Distribution

No response

Python version

No response

Installed python packages

No response

Current Behaviour?

Giskard RAGET passes the reference context when calling Ragas.

https://github.com/Giskard-AI/giskard/blob/main/giskard/rag/metrics/ragas_metrics.py

        ragas_sample = {
            "question": question_sample["question"],
            "answer": answer,
            "contexts": question_sample["reference_context"].split("\n\n"),
            "ground_truth": question_sample["reference_answer"],
        }

According to the Ragas documentation, the retrieved context should be used, i.e. the context actually retrieved and used for answer generation.
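
As a minimal, self-contained sketch (the question, chunks, and answers below are made up for illustration), the sample handed to Ragas should therefore be built from what the RAG pipeline actually retrieved:

    # Illustrative sketch: the "contexts" handed to Ragas are the chunks the
    # RAG pipeline retrieved while generating this answer, not the
    # reference_context produced during test-set generation.
    question = "What does the refund policy say?"
    reference_answer = "Refunds are accepted within 30 days."

    # Chunks actually returned by the retriever for this question at answer time.
    retrieved_chunks = [
        "Our refund policy allows returns within 30 days of purchase.",
        "Shipping fees are non-refundable.",
    ]
    generated_answer = "You can get a refund within 30 days of purchase."

    ragas_sample = {
        "question": question,
        "answer": generated_answer,
        "contexts": retrieved_chunks,  # retrieved context, not the reference context
        "ground_truth": reference_answer,
    }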

For example, context precision and context recall both use {"question", "contexts", "ground_truth"}. If you pass the reference context instead of the retrieved context, you end up evaluating your test-set generation pipeline rather than your RAG pipeline.
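
For instance, these two metrics could be computed on the retrieved contexts roughly like this (a sketch against the Ragas 0.1-style API; imports, metric names, and column names may differ between Ragas versions, and the data is made up):

    # Hedged sketch against the Ragas 0.1-style API; running it requires an LLM
    # backend configured for Ragas (e.g. an OpenAI API key).
    from datasets import Dataset
    from ragas import evaluate
    from ragas.metrics import context_precision, context_recall

    eval_dataset = Dataset.from_dict({
        "question": ["What does the refund policy say?"],
        "answer": ["You can get a refund within 30 days of purchase."],
        # Contexts retrieved by the RAG agent at answer time; passing the
        # reference_context here would score the test-set generation instead.
        "contexts": [[
            "Our refund policy allows returns within 30 days of purchase.",
            "Shipping fees are non-refundable.",
        ]],
        "ground_truth": ["Refunds are accepted within 30 days."],
    })

    scores = evaluate(eval_dataset, metrics=[context_precision, context_recall])
    print(scores)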

Standalone code OR list down the steps to reproduce the issue

.

Relevant log output

No response

alexcombessie commented 6 months ago

@pierlj what do you think?

pierlj commented 6 months ago

Hi @Chabert-Liddell, you are right, thanks for pointing this out. A fix will be released soon!