Test the evaluation harness on RAG parameter tuning

deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

https://haystack.deepset.ai

Apache License 2.0

17k stars, 1.86k forks

Test the evaluation harness on RAG parameter tuning #7825

Closed: mrm1001 closed this issue 3 months ago

mrm1001 commented 4 months ago

Context

The evaluation harness is going to be available for our users in the haystack-experimental package. In order for us to decide whether we want to keep this feature or not, and whether it needs any improvements, we would like to first test it internally before we collect feedback from our users.

Task description

Use the evaluation harness to optimise a RAG pipeline using parameter search.
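
For illustration, a parameter search of this kind could be sketched as an exhaustive grid search. This is a minimal sketch and does not use the harness API: `grid_search` and `toy_run_and_score` are hypothetical names, and the parameters `top_k` and `chunk_size` are only assumed examples of what one might tune. In a real run, the scoring function would build the RAG pipeline with the given parameters, evaluate it, and return a single metric.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Params = Dict[str, object]


def grid_search(
    param_grid: Dict[str, List[object]],
    run_and_score: Callable[[Params], float],
) -> Tuple[Params, float, List[Tuple[Params, float]]]:
    """Score every parameter combination and return the best one.

    Returns (best_params, best_score, all_results).
    """
    names = list(param_grid)
    results: List[Tuple[Params, float]] = []
    # Iterate over the Cartesian product of all candidate values.
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append((params, run_and_score(params)))
    # Higher score is assumed to be better.
    best_params, best_score = max(results, key=lambda item: item[1])
    return best_params, best_score, results


def toy_run_and_score(params: Params) -> float:
    """Hypothetical stand-in for 'build the pipeline, evaluate it, return a metric'.

    The formula is arbitrary and only makes the example runnable;
    it peaks at top_k=3 and chunk_size=256.
    """
    return -abs(params["top_k"] - 3) - abs(params["chunk_size"] - 256) / 256


# Assumed example grid; real parameter names depend on the pipeline.
grid = {"top_k": [1, 3, 5], "chunk_size": [128, 256, 512]}
best_params, best_score, results = grid_search(grid, toy_run_and_score)
print(best_params, best_score)
```

Exhaustive search is the simplest option and scales with the product of the grid sizes, so it only suits small grids; each evaluation of a real pipeline involves LLM calls and can be slow and costly.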

Task outcome

Find any bugs or UX issues.
Create shareable code that can later be used to get feedback from users. This code will live in the evaluation repository.

davidsbatista commented 3 months ago

The harness is not the best tool for parameter search; it is much more of a beginner-level tool. I used it to showcase a basic RAG approach versus HyDE, and in the process reported a few bugs and issues, which have been or are being addressed.

davidsbatista commented 3 months ago

A working example with the harness, comparing a baseline RAG pipeline with the HyDE technique, is referenced here:

https://github.com/deepset-ai/haystack-evaluation/tree/main/evaluations

The full code is here:

https://github.com/deepset-ai/haystack-evaluation/blob/main/evaluations/evaluation_aragog_harness.py