deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

feat (v2): Update so `model_max_length` updates `max_seq_length` for Sentence Transformers #8334

Closed — sjrl closed this pull request 2 months ago

sjrl commented 2 months ago

Related Issues

Proposed Changes:

This ports a change we had in v1 that updates the Sentence Transformers-specific attribute `max_seq_length`, which controls the maximum sequence length. Previously we thought that setting `model_max_length` through `tokenizer_kwargs` would handle this, but a recent investigation by @bglearning showed it does not. We therefore make an update here so that `model_max_length` also updates the value of `max_seq_length`.

How did you test it?

Notes for the reviewer

Checklist

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 10723664490

Details


| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/embedders/sentence_transformers_document_embedder.py | 1 | 96.67% |
| components/embedders/sentence_transformers_text_embedder.py | 1 | 96.15% |
| Total: | 2 | |
Totals Coverage Status
Change from base Build 10705409434: 0.005%
Covered Lines: 7021
Relevant Lines: 7782

💛 - Coveralls