Common metrics for evaluating QA, such as Exact Match or F1 score, are very strict. They are better suited to settings where an exact entity (a name, date, or number) needs to be extracted.
For more complex answers, we want a looser evaluation.
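To make the strictness concrete, here is a minimal sketch of SQuAD-style Exact Match and token-level F1. The normalization rules (lowercasing, dropping punctuation and English articles) and the function names are assumptions for illustration, not necessarily what Haystack or FARM use.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list:
    """Lowercase, drop punctuation and English articles, split on whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, truth: str) -> int:
    """1 if the normalized token sequences are identical, else 0."""
    return int(normalize(prediction) == normalize(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Harmonic mean of token precision and recall (bag-of-tokens overlap)."""
    pred_tokens, truth_tokens = normalize(prediction), normalize(truth)
    if not pred_tokens or not truth_tokens:
        # Only a match if both sides are empty after normalization.
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# A paraphrase that a human would accept as correct.
truth = "It reduces the risk of heart disease"
prediction = "It lowers the chance of cardiac illness"

em = exact_match(prediction, truth)
f1 = f1_score(prediction, truth)
```

Here only the tokens "it" and "of" overlap, so Exact Match is 0 and F1 is about 0.33, even though the two answers mean the same thing.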
Inspired by other domains such as machine translation, we have started experimenting with a semantic textual similarity metric between the ground-truth answer and the predicted answer (see https://github.com/deepset-ai/FARM/pull/803).
We now want to bring this functionality to Haystack.
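One way such a metric could look is cosine similarity between embeddings of the two answers. The sketch below assumes that formulation; the issue text does not say how the FARM experiments compute the score. `semantic_answer_similarity` and `toy_embed` are hypothetical names, and `toy_embed` is a hand-written stand-in for a trained sentence encoder, used only so the example runs without external dependencies.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(u: Vector, v: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_answer_similarity(
    prediction: str, truth: str, embed: Callable[[str], Vector]
) -> float:
    """Score a predicted answer against the ground truth in embedding space."""
    return cosine_similarity(embed(prediction), embed(truth))


# ASSUMPTION: toy concept table standing in for a real sentence-embedding model.
# Words with the same meaning map to the same dimension.
_CONCEPTS = {
    "reduces": 0, "lowers": 0,
    "risk": 1, "chance": 1,
    "heart": 2, "cardiac": 2,
    "disease": 3, "illness": 3,
}


def toy_embed(text: str) -> Vector:
    """Count how often each toy concept occurs in the text."""
    vector = [0.0] * 4
    for token in text.lower().split():
        if token in _CONCEPTS:
            vector[_CONCEPTS[token]] += 1.0
    return vector


truth = "It reduces the risk of heart disease"
prediction = "It lowers the chance of cardiac illness"
score = semantic_answer_similarity(prediction, truth, toy_embed)
```

With this toy embedding the paraphrase scores 1.0 because both answers hit the same four concepts. In practice, `embed` would be replaced by a real sentence-embedding model, and the scores would be graded rather than all-or-nothing.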
Prioritize pipeline eval.