deepset-ai / haystack

:mag: AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

`comparative_individual_scores_report` should keep the results of the two pipelines being compared #7969

Closed: mrm1001 closed this issue 3 months ago

mrm1001 commented 3 months ago

When running two pipelines and comparing their results, I would like to see the predicted answers of each pipeline run in the resulting pandas DataFrame.

Here is an example of how this is done: https://github.com/deepset-ai/haystack-evaluation/blob/1f2747ec59101231c9857fd2b07948c87ed9d181/evaluations/evaluation_sentence_window_retrieval.py#L123

When I run this script, the resulting CSV has these columns:

```
'Unnamed: 0', 'questions', 'contexts', 'true_answers',
'predicted_answers', 'base-rag_context_relevance',
'base-rag_faithfulness', 'base-rag_sas',
'window-retrieval_context_relevance', 'window-retrieval_faithfulness',
'window-retrieval_sas'
```

Instead of a single `predicted_answers` column, it should have `base-rag_predicted_answers` and `window-retrieval_predicted_answers`.
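
For reference, a minimal sketch of how the comparative report is typically produced with `EvaluationRunResult`; the inputs and metric scores below are made up purely for illustration:

```python
from haystack.evaluation import EvaluationRunResult

# Shared evaluation inputs (illustrative values only).
inputs = {
    "questions": ["What is Haystack?"],
    "contexts": [["Haystack is an LLM orchestration framework."]],
    "true_answers": ["An LLM orchestration framework."],
    "predicted_answers": ["Haystack is an AI framework."],
}

# One EvaluationRunResult per pipeline; metric values are made up.
base = EvaluationRunResult(
    run_name="base-rag",
    inputs=inputs,
    results={"sas": {"score": 0.7, "individual_scores": [0.7]}},
)
window = EvaluationRunResult(
    run_name="window-retrieval",
    inputs=inputs,  # in practice this run would have its own predicted_answers
    results={"sas": {"score": 0.8, "individual_scores": [0.8]}},
)

# The report prefixes metric columns with each run name, but the input
# columns (including predicted_answers) are emitted only once, which is
# the behaviour described above.
df = base.comparative_individual_scores_report(window)
df.to_csv("comparison.csv")
```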

davidsbatista commented 3 months ago

This issue was already fixed in this PR: https://github.com/deepset-ai/haystack/pull/7879 - I've added a `keep_columns` parameter to handle this exact case.
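
With that change, the per-pipeline answer columns should be retained by listing them explicitly; a minimal sketch, assuming the `keep_columns` parameter described in the PR and the `base`/`window` results from the example above:

```python
# keep_columns should retain the named input columns per run, yielding
# base-rag_predicted_answers and window-retrieval_predicted_answers.
df = base.comparative_individual_scores_report(
    window, keep_columns=["predicted_answers"]
)
```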