Closed: TuanaCelik closed this 2 months ago
bilgeyucel commented on 2024-04-30T09:14:45Z ----------------------------------------------------------------
...model-based evaluation frameworks integerated -> integrated
Also, in the Goal field, you can add more information about the evaluation methods we used in the tutorial. As I understand it, we use both model-based and statistical evaluators.
bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------
You will run your RAG pipeline and evaluated -> evaluate
We will use some of the available evalution -> evaluation
Why do the evaluation docs links have the v2.1-unstable/ path?
TuanaCelik commented on 2024-04-30T11:16:10Z ----------------------------------------------------------------
Because that's the only live one right now. Will change when released.
bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------
You can remove the /2.0/ paths from these links too.
bilgeyucel commented on 2024-04-30T09:14:47Z ----------------------------------------------------------------
Line #3. pip install git+https://github.com/deepset-ai/haystack.git@main
Do we need to install it from main?
Right now, yes; will change when released.
bilgeyucel commented on 2024-04-30T09:14:48Z ----------------------------------------------------------------
First, let's actually tun -> run
You will notice that this is why we provide a list od -> of
TuanaCelik commented on 2024-04-30T11:16:45Z ----------------------------------------------------------------
thanks 🙏
bilgeyucel commented on 2024-04-30T09:14:49Z ----------------------------------------------------------------
Line #7. eval_pipeline.add_component("groundness_evaluator", FaithfulnessEvaluator())
The name is "groundness_evaluator" but the component is "FaithfulnessEvaluator". Are they the same thing? Or maybe I'm missing something
We can rename the component to AnswerGroundednessEvaluator if that is more intuitive?
julian-risch commented on 2024-04-30T09:55:02Z ----------------------------------------------------------------
AnswerFaithfulnessEvaluator or AnswerHallucinationEvaluator are alternatives.
julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------
Let's rename the components, as their names will be used as column names:

```python
from haystack.components.evaluators.document_mrr import DocumentMRREvaluator
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator
from haystack.components.evaluators.sas_evaluator import SASEvaluator

eval_pipeline = Pipeline()
eval_pipeline.add_component("mean_reciprocal_rank", DocumentMRREvaluator())
eval_pipeline.add_component("faithfulness", FaithfulnessEvaluator())
eval_pipeline.add_component("semantic_answer_similarity", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))

results = eval_pipeline.run({
    "mean_reciprocal_rank": {"ground_truth_documents": list([d] for d in ground_truth_docs), "retrieved_documents": retrieved_docs},
    "faithfulness": {"questions": list(questions), "contexts": list([d.content] for d in ground_truth_docs), "responses": rag_answers},
    "semantic_answer_similarity": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)},
})
```
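For intuition on what the `mean_reciprocal_rank` column will contain, here is a minimal plain-Python sketch of the metric itself (an illustrative re-implementation, not the Haystack `DocumentMRREvaluator` code): for each query, find the rank of the first retrieved document that is also a ground-truth document, take its reciprocal, and average over queries.

```python
def mean_reciprocal_rank(ground_truth, retrieved):
    """Sketch of MRR. ground_truth / retrieved: one list of doc ids per query."""
    total = 0.0
    for truth, ranked in zip(ground_truth, retrieved):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in truth:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ground_truth)

# Query 1 has its relevant doc at rank 1 (score 1.0),
# query 2 at rank 2 (score 0.5), so MRR = (1.0 + 0.5) / 2.
print(mean_reciprocal_rank([["a"], ["b"]], [["a", "x"], ["x", "b"]]))  # 0.75
```

A document missing from the retrieved list simply contributes 0 for that query.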
julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------
This cell can be simplified to
```python
from haystack.evaluation.eval_run_result import EvaluationRunResult

inputs = {
    "question": list(questions),
    "contexts": list([d.content] for d in ground_truth_docs),
    "answer": list(ground_truth_answers),
    "predicted_answer": rag_answers,
}

evaluation_result = EvaluationRunResult(run_name="pubmed_rag_pipeline", inputs=inputs, results=results)
evaluation_result.score_report()
```
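As a rough mental model of what a score report aggregates, here is a hypothetical plain-Python sketch (illustrative only; the shape `{"individual_scores": [...]}` and the simple averaging are assumptions for the example, and Haystack's `EvaluationRunResult` does more than this): collapse each evaluator's per-example scores into one number per metric.

```python
def score_report(results):
    """Sketch: average per-example scores into one score per metric.

    results: {metric_name: {"individual_scores": [float, ...]}} (assumed shape).
    """
    return {
        metric: sum(r["individual_scores"]) / len(r["individual_scores"])
        for metric, r in results.items()
    }

report = score_report({
    "mean_reciprocal_rank": {"individual_scores": [1.0, 0.5]},
    "semantic_answer_similarity": {"individual_scores": [0.9, 0.7]},
})
print(report["mean_reciprocal_rank"])  # 0.75
```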
@TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter `responses` was renamed to `predicted_answers`. And if you think we should rename the component we can still do it, just let me know.
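If older notebook cells still pass `responses`, the rename can be absorbed in user code with a small compatibility shim. This is a hypothetical helper sketched in plain Python (`run_faithfulness` and the stub evaluator are inventions for illustration; only the parameter names `responses` and `predicted_answers` come from the comment above):

```python
def run_faithfulness(evaluator_run, *, questions, contexts,
                     predicted_answers=None, responses=None):
    """Forward the pre-rename `responses` keyword to `predicted_answers`.

    evaluator_run stands in for an evaluator's run method; this shim is a
    hypothetical user-side helper, not part of Haystack.
    """
    if predicted_answers is None:
        predicted_answers = responses  # accept the old keyword
    return evaluator_run(questions=questions, contexts=contexts,
                         predicted_answers=predicted_answers)

# Demo with a stub evaluator that just echoes its keyword arguments:
stub = lambda **kwargs: kwargs
out = run_faithfulness(stub, questions=["q"], contexts=[["c"]],
                       responses=["a"])  # old keyword still works
print(out["predicted_answers"])  # ['a']
```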
Hey @julian-risch - thanks for the info. I don't have strong opinions on the component naming. Imo, 'faithfulness' is widely used at this point. I'll defer to you guys to make the final call here. I'm OK with either.
Comments are resolved. For whoever is merging: ~~Update the installation to haystack-ai after release~~