deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Write proposal for presentation of evaluation results #7398

Closed mrm1001 closed 7 months ago

mrm1001 commented 7 months ago

User stories:

Example in other libraries:

More context here: https://www.notion.so/deepsetai/Evaluation-1521712b928d4142828232f2df136856?pvs=4

davidsbatista commented 7 months ago

The first draft of the proposal is here:

https://www.notion.so/deepsetai/Proposal-for-presentation-of-Evaluation-Results-4b1063a227ad4e08950f184c014063da

mrm1001 commented 7 months ago

Thanks so much @davidsbatista! Great appraisal!

The metrics we will be implementing in 2.1.0 in Haystack are here and they are basically:

Could you write one final recommendation on what the evaluation metrics we're implementing in Haystack should return? I'm making this up, but something like:

Aggregate: {SAS: {mean: 0.9}, context_relevance: {mean: 0.75}, recall_single: {mean: 0.5}, recall_multi: {mean: 0.6}, faithfulness: {mean: 0.9}}

Single: (query_1, answer_1): SAS: 0.8, recall_single: 1, recall_multi: 0.8, context_relevance: 0.9, faithfulness: 0.8

And then the user can aggregate across queries if they want something different from the "mean".
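To make the idea concrete, here is a minimal sketch in plain Python of per-query scores plus a pluggable aggregation. It does not use any Haystack API: the `per_query` rows, the `METRICS` list, and the `aggregate` helper are hypothetical names, and the scores are invented for illustration only.

```python
from statistics import mean, median

# Hypothetical per-query results: one row per (query, answer) pair.
# Metric names mirror the example above; the values are made up.
per_query = [
    {"query": "query_1", "answer": "answer_1", "sas": 0.75, "recall_single": 1.0},
    {"query": "query_2", "answer": "answer_2", "sas": 0.5, "recall_single": 0.0},
    {"query": "query_3", "answer": "answer_3", "sas": 1.0, "recall_single": 1.0},
]

METRICS = ["sas", "recall_single"]


def aggregate(rows, metrics, fn=mean):
    """Apply `fn` to each metric's column of per-query scores."""
    return {m: fn([row[m] for row in rows]) for m in metrics}


# Default aggregate report, shaped like the "Aggregate" line above.
mean_report = {m: {"mean": v} for m, v in aggregate(per_query, METRICS).items()}

# A user who wants something other than the mean passes another function.
median_scores = aggregate(per_query, METRICS, fn=median)
worst_scores = aggregate(per_query, METRICS, fn=min)

print(mean_report)
print(median_scores)
print(worst_scores)
```

Keeping the per-query rows around, rather than returning only the mean, is what lets the user swap in `median`, `min`, or any other function after the evaluation has run.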

davidsbatista commented 7 months ago

Thanks @mrm1001 - I've updated the page

mrm1001 commented 7 months ago

Thanks @davidsbatista, I consider this done, so feel free to close it.

julian-risch commented 7 months ago

@davidsbatista I suggest that you close the issue only once the proposal has been reviewed and added to the GitHub repo in this proposals folder: https://github.com/deepset-ai/haystack/tree/main/proposals

davidsbatista commented 7 months ago

thanks for the suggestion @julian-risch - this is indeed a more structured way to present the proposal.

I've added it here: https://github.com/deepset-ai/haystack/pull/7462/files

I cut back on many of the ideas and tried to keep it simple, with working PoC code. We can then iterate and add other ideas.