deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

`comparative_individual_scores_report` should keep the results of the two pipelines being compared #7969

Closed: mrm1001 closed this issue 3 months ago

mrm1001 commented 3 months ago

When running two pipelines and comparing their results, I would like to see the predicted answers of each pipeline run in the resulting pandas DataFrame.

Here is an example of how this is done: https://github.com/deepset-ai/haystack-evaluation/blob/1f2747ec59101231c9857fd2b07948c87ed9d181/evaluations/evaluation_sentence_window_retrieval.py#L123

When I run this script, the resulting CSV has these columns:

```
'Unnamed: 0', 'questions', 'contexts', 'true_answers',
'predicted_answers', 'base-rag_context_relevance',
'base-rag_faithfulness', 'base-rag_sas',
'window-retrieval_context_relevance', 'window-retrieval_faithfulness',
'window-retrieval_sas'
```

Instead of a single `predicted_answers` column, it should have `base-rag_predicted_answers` and `window-retrieval_predicted_answers`.
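
For reference, a minimal sketch of how the comparative report is typically produced with `EvaluationRunResult`; the inputs and metric scores below are made up purely for illustration:

```python
from haystack.evaluation import EvaluationRunResult

# Shared evaluation inputs (illustrative values only).
inputs = {
    "questions": ["What is Haystack?"],
    "contexts": [["Haystack is an LLM orchestration framework."]],
    "true_answers": ["An LLM orchestration framework."],
    "predicted_answers": ["Haystack is an AI framework."],
}

# One EvaluationRunResult per pipeline; metric values are made up.
base = EvaluationRunResult(
    run_name="base-rag",
    inputs=inputs,
    results={"sas": {"score": 0.7, "individual_scores": [0.7]}},
)
window = EvaluationRunResult(
    run_name="window-retrieval",
    inputs=inputs,  # in practice this run would have its own predicted_answers
    results={"sas": {"score": 0.8, "individual_scores": [0.8]}},
)

# The report prefixes metric columns with each run name, but the input
# columns (including predicted_answers) are emitted only once, which is
# the behaviour described above.
df = base.comparative_individual_scores_report(window)
df.to_csv("comparison.csv")
```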

davidsbatista commented 3 months ago

This issue was already fixed in this PR: https://github.com/deepset-ai/haystack/pull/7879 - I've added a `keep_columns` parameter to handle this exact case.
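
With that change, the per-pipeline answer columns should be retained by listing them explicitly; a minimal sketch, assuming the `keep_columns` parameter described in the PR and the `base`/`window` results from the example above:

```python
# keep_columns should retain the named input columns per run, yielding
# base-rag_predicted_answers and window-retrieval_predicted_answers.
df = base.comparative_individual_scores_report(
    window, keep_columns=["predicted_answers"]
)
```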