Closed: TuanaCelik closed this 2 months ago
bilgeyucel commented on 2024-04-30T09:14:45Z ----------------------------------------------------------------
...model-based evaluation frameworks integerated -> integrated
Also, in the Goal field, you can add more information about the evaluation methods we used in the tutorial. As I understand it, we use both model-based and statistical evaluators.
bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------
You will run your RAG pipeline and evaluated -> evaluate
We will use some of the available evalution -> evaluation
Why do the evaluation docs links have the v2.1-unstable/ path?
TuanaCelik commented on 2024-04-30T11:16:10Z ----------------------------------------------------------------
Because that's the only live one right now. Will change when released.
bilgeyucel commented on 2024-04-30T09:14:46Z ----------------------------------------------------------------
You can remove the /2.0/ paths from these links too.
bilgeyucel commented on 2024-04-30T09:14:47Z ----------------------------------------------------------------
Line #3. pip install git+https://github.com/deepset-ai/haystack.git@main
Do we need to install it from main?
Right now, yes; will change when released.
bilgeyucel commented on 2024-04-30T09:14:48Z ----------------------------------------------------------------
First, let's actually tun -> run
You will notice that this is why we provide a list od -> of
TuanaCelik commented on 2024-04-30T11:16:45Z ----------------------------------------------------------------
thanks 🙏
bilgeyucel commented on 2024-04-30T09:14:49Z ----------------------------------------------------------------
Line #7. eval_pipeline.add_component("groundness_evaluator", FaithfulnessEvaluator())
The name is "groundness_evaluator" but the component is "FaithfulnessEvaluator". Are they the same thing? Or maybe I'm missing something
We can rename the component to AnswerGroundednessEvaluator if that is more intuitive?
julian-risch commented on 2024-04-30T09:55:02Z ----------------------------------------------------------------
AnswerFaithfulnessEvaluator or AnswerHallucinationEvaluator are alternatives.
julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------
Let's rename the components, as their names will be used as column names:

```python
from haystack.components.evaluators.document_mrr import DocumentMRREvaluator
from haystack.components.evaluators.faithfulness import FaithfulnessEvaluator
from haystack.components.evaluators.sas_evaluator import SASEvaluator

eval_pipeline = Pipeline()
eval_pipeline.add_component("mean_reciprocal_rank", DocumentMRREvaluator())
eval_pipeline.add_component("faithfulness", FaithfulnessEvaluator())
eval_pipeline.add_component("semantic_answer_similarity", SASEvaluator(model="sentence-transformers/all-MiniLM-L6-v2"))

results = eval_pipeline.run({
    "mean_reciprocal_rank": {"ground_truth_documents": list([d] for d in ground_truth_docs), "retrieved_documents": retrieved_docs},
    "faithfulness": {"questions": list(questions), "contexts": list([d.content] for d in ground_truth_docs), "responses": rag_answers},
    "semantic_answer_similarity": {"predicted_answers": rag_answers, "ground_truth_answers": list(ground_truth_answers)},
})
```
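For intuition on what the `mean_reciprocal_rank` column will contain, here is a minimal plain-Python sketch of the metric itself (an illustrative re-implementation, not the Haystack `DocumentMRREvaluator` code): for each query, find the rank of the first retrieved document that is also a ground-truth document, take its reciprocal, and average over queries.

```python
def mean_reciprocal_rank(ground_truth, retrieved):
    """Sketch of MRR. ground_truth / retrieved: one list of doc ids per query."""
    total = 0.0
    for truth, ranked in zip(ground_truth, retrieved):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in truth:
                total += 1.0 / rank
                break  # only the first relevant hit counts
    return total / len(ground_truth)

# Query 1 has its relevant doc at rank 1 (score 1.0),
# query 2 at rank 2 (score 0.5), so MRR = (1.0 + 0.5) / 2.
print(mean_reciprocal_rank([["a"], ["b"]], [["a", "x"], ["x", "b"]]))  # 0.75
```

A document missing from the retrieved list simply contributes 0 for that query.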
julian-risch commented on 2024-04-30T09:27:28Z ----------------------------------------------------------------
This cell can be simplified to
```python
from haystack.evaluation.eval_run_result import EvaluationRunResult

inputs = {
    "question": list(questions),
    "contexts": list([d.content] for d in ground_truth_docs),
    "answer": list(ground_truth_answers),
    "predicted_answer": rag_answers,
}

evaluation_result = EvaluationRunResult(run_name="pubmed_rag_pipeline", inputs=inputs, results=results)
evaluation_result.score_report()
```
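As a rough mental model of what a score report aggregates, here is a hypothetical plain-Python sketch (illustrative only; the shape `{"individual_scores": [...]}` and the simple averaging are assumptions for the example, and Haystack's `EvaluationRunResult` does more than this): collapse each evaluator's per-example scores into one number per metric.

```python
def score_report(results):
    """Sketch: average per-example scores into one score per metric.

    results: {metric_name: {"individual_scores": [float, ...]}} (assumed shape).
    """
    return {
        metric: sum(r["individual_scores"]) / len(r["individual_scores"])
        for metric, r in results.items()
    }

report = score_report({
    "mean_reciprocal_rank": {"individual_scores": [1.0, 0.5]},
    "semantic_answer_similarity": {"individual_scores": [0.9, 0.7]},
})
print(report["mean_reciprocal_rank"])  # 0.75
```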
@TuanaCelik The only change needed in this tutorial that is caused by the PR I just merged is that FaithfulnessEvaluator's input parameter `responses` was renamed to `predicted_answers`. And if you think we should rename the component we can still do it, just let me know.
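If older notebook cells still pass `responses`, the rename can be absorbed in user code with a small compatibility shim. This is a hypothetical helper sketched in plain Python (`run_faithfulness` and the stub evaluator are inventions for illustration; only the parameter names `responses` and `predicted_answers` come from the comment above):

```python
def run_faithfulness(evaluator_run, *, questions, contexts,
                     predicted_answers=None, responses=None):
    """Forward the pre-rename `responses` keyword to `predicted_answers`.

    evaluator_run stands in for an evaluator's run method; this shim is a
    hypothetical user-side helper, not part of Haystack.
    """
    if predicted_answers is None:
        predicted_answers = responses  # accept the old keyword
    return evaluator_run(questions=questions, contexts=contexts,
                         predicted_answers=predicted_answers)

# Demo with a stub evaluator that just echoes its keyword arguments:
stub = lambda **kwargs: kwargs
out = run_faithfulness(stub, questions=["q"], contexts=[["c"]],
                       responses=["a"])  # old keyword still works
print(out["predicted_answers"])  # ['a']
```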
Hey @julian-risch - thanks for the info. I don't have strong opinions on the component naming. Imo, 'faithfulness' is widely used at this point. I'll defer to you guys to make the final call here. I'm OK with either.
Comments are resolved. For whoever is merging: ~~Update the installation to haystack-ai after release~~