deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

feat (v2): Update so `model_max_length` updates `max_seq_length` for Sentence Transformers #8334

Closed — sjrl closed this pull request 2 months ago

sjrl commented 2 months ago

Related Issues

Proposed Changes:

This ports a change we had in v1 that updates the Sentence Transformers-specific attribute `max_seq_length`, which controls the maximum sequence length. Previously we thought that setting `model_max_length` through `tokenizer_kwargs` would handle this, but a recent investigation by @bglearning showed it does not. We therefore make an update here so that `model_max_length` also updates the value of `max_seq_length`.

How did you test it?

Notes for the reviewer

Checklist

coveralls commented 2 months ago

Pull Request Test Coverage Report for Build 10723664490

Details


| Files with Coverage Reduction | New Missed Lines | % |
| --- | --- | --- |
| components/embedders/sentence_transformers_document_embedder.py | 1 | 96.67% |
| components/embedders/sentence_transformers_text_embedder.py | 1 | 96.15% |
| Total: | 2 | |
Totals Coverage Status
Change from base Build 10705409434: 0.005%
Covered Lines: 7021
Relevant Lines: 7782

💛 - Coveralls