deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.72k stars 1.92k forks

feat: Add `min_top_k` to TopPSampler #8228

Closed sjrl closed 2 months ago

sjrl commented 3 months ago

Proposed Changes:

This adds the `min_top_k` parameter to `TopPSampler`. It sets the minimum number of documents to return when the top-p sampling algorithm selects fewer documents than that. In that case, the documents with the next-highest scores are added to the selection.

This is useful when we want to guarantee that a minimum number of documents is always passed on, while still letting the top-p algorithm decide, based on document scores, whether more documents should be sent.
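
The behaviour described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from this PR: the function name `top_p_select`, the softmax normalisation, and the `1e-6` tolerance are choices made for the example, and it works on raw score lists rather than Haystack `Document` objects.

```python
import math
from typing import List, Optional


def top_p_select(scores: List[float], top_p: float, min_top_k: Optional[int] = None) -> List[int]:
    """Hypothetical helper: return the indices of the selected scores, best first."""
    if not scores:
        return []

    # Turn raw scores into probabilities with a softmax (shifted by the max for stability).
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank the indices from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    # Top-p step: keep items while the cumulative probability stays within top_p.
    selected: List[int] = []
    cumulative = 0.0
    for i in order:
        cumulative += probs[i]
        if cumulative > top_p + 1e-6:  # small tolerance for floating-point error
            break
        selected.append(i)

    # min_top_k step: if too few items were kept, take the next-highest scores as well.
    if min_top_k is not None and len(selected) < min_top_k:
        selected = order[:min_top_k]

    return selected
```

With scores `[0.0, 2.0, -1.0, 1.0]` and `top_p=0.7`, this sketch keeps only index 1. Adding `min_top_k=3` extends the result to `[1, 3, 0]`, while a larger `top_p` can still return more than `min_top_k` items.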

This PR also includes some minor refactoring. The biggest change replaces raised `ComponentError`s with `logger.warning` messages, so the component can still run when it hits small edge cases.
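
The warn-instead-of-raise pattern can be sketched as follows. The description above does not say which edge cases are affected, so the missing-score case and the helper name `collect_scores` are hypothetical and only illustrate the general idea using the standard `logging` module.

```python
import logging
from typing import List, Optional

logger = logging.getLogger(__name__)


def collect_scores(raw_scores: List[Optional[float]]) -> List[float]:
    """Hypothetical helper: drop missing scores with a warning instead of raising."""
    missing = [i for i, score in enumerate(raw_scores) if score is None]
    if missing:
        # Previously this kind of situation might have raised an error and stopped the run.
        logger.warning(
            "Skipping %d item(s) without a score at positions %s.", len(missing), missing
        )
    return [score for score in raw_scores if score is not None]
```

The trade-off is that a caller no longer gets a hard failure for these cases, so the warning message needs to carry enough detail to debug from logs.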

How did you test it?

Expanded tests

coveralls commented 3 months ago

Pull Request Test Coverage Report for Build 10485448854

Totals
Change from base Build 10419214121: 0.06%
Covered Lines: 6959
Relevant Lines: 7716

anakin87 commented 2 months ago

@sjrl I have pushed a small change to fix typing issues. Please take a look and confirm it is ok. Let's wait for the docstrings review...

sjrl commented 2 months ago

> @sjrl I have pushed a small change to fix typing issues. Please take a look and confirm it is ok. Let's wait for the docstrings review...

Thanks! Looks good. I only made one small change to the docstring.