deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) into pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search, or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

feat: Add a calibration factor to `TransformersSimilarityRanker` to increase spread of scores #8221

Closed sjrl closed 2 months ago

sjrl commented 3 months ago

Is your feature request related to a problem? Please describe.
I would like to add a `calibration_factor`, similar to the one in the `ExtractiveReader` (https://github.com/deepset-ai/haystack/blob/5ac56ebdaf18c6f1fefb63098c69a96b579181eb/haystack/components/readers/extractive.py#L66), to the `TransformersSimilarityRanker` so it can likewise calibrate the final scores (i.e. probabilities) output by the ranker. This would be helpful because, in practice, we find that many Cross-Encoder models output scores in a very narrow range (e.g. between 0.3 and 0.5) instead of utilizing the full 0-to-1 range of the underlying sigmoid function.
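For illustration, a minimal sketch of the kind of calibration being requested, assuming it works like the `ExtractiveReader`'s `calibration_factor` (the raw logit is scaled before the sigmoid; the function name here is hypothetical, not Haystack API):

```python
import math

def calibrated_score(logit: float, calibration_factor: float = 1.0) -> float:
    # Hypothetical sketch: scale the raw cross-encoder logit before
    # applying the sigmoid, as ExtractiveReader's calibration_factor does.
    # A factor > 1 pushes scores toward 0 or 1, widening the spread.
    return 1.0 / (1.0 + math.exp(-calibration_factor * logit))
```

For example, logits of 0.2 and 0.4 yield the narrow scores ~0.55 and ~0.60 uncalibrated, but with `calibration_factor=5.0` they spread out to ~0.73 and ~0.88.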

This would also help in combination with other components, like the `TopPSampler`, that rely on there being a larger spread in scores.
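To see why a narrow score range hurts top-p filtering, here is a rough sketch (not the actual `TopPSampler` implementation) of top-p selection over softmaxed scores: near-identical scores produce a near-uniform distribution, so nearly every document survives the cutoff, while spread-out scores let the filter discriminate.

```python
import math

def top_p_keep_count(scores: list[float], top_p: float = 0.9) -> int:
    # Hypothetical sketch of top-p selection: softmax the scores, then
    # keep the highest-probability items until cumulative mass >= top_p.
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = sorted((e / total for e in exps), reverse=True)
    kept, cumulative = 0, 0.0
    for p in probs:
        cumulative += p
        kept += 1
        if cumulative >= top_p:
            break
    return kept
```

With the narrow scores `[0.3, 0.35, 0.4, 0.45, 0.5]` all five documents are kept at `top_p=0.9`, whereas the spread scores `[0.1, 0.2, 0.5, 2.0, 4.0]` keep only the top two.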

MetroCat69 commented 2 months ago

Can I take this issue?