deepset-ai / haystack

:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Sentence Transformers embedders: support embedding quantization and truncation for Matryoshka embeddings #8040

Closed by anakin87 4 weeks ago

anakin87 commented 1 month ago

The Sentence Transformers library recently introduced new features that would be nice and low-effort to support.

  1. Embedding quantization - via the `precision` parameter during inference: https://www.sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#sentence_transformers.SentenceTransformer.encode
  2. Truncation of embeddings for models trained with Matryoshka Representation Learning - via the `truncate_dim` init parameter: https://sbert.net/docs/package_reference/sentence_transformer/SentenceTransformer.html#id1
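
Both features as they appear in the sentence-transformers API itself (a minimal sketch; the model name and the truncation size of 256 are only illustrative):

```python
from sentence_transformers import SentenceTransformer

# 2. truncate_dim (init parameter): truncate Matryoshka embeddings to 256 dims
model = SentenceTransformer("mixedbread-ai/mxbai-embed-large-v1", truncate_dim=256)

# 1. precision (encode parameter): quantize the embeddings, e.g. to int8
embeddings = model.encode(["Haystack is an LLM orchestration framework."], precision="int8")
print(embeddings.shape, embeddings.dtype)  # (1, 256) int8
```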
nickprock commented 1 month ago

Hi @anakin87 , can I work on this issue?

anakin87 commented 1 month ago

@nickprock That would be great.

I would split the work into 2 different PRs. Feel free to discuss it here if anything is unclear (e.g., about the embedding backend).

nickprock commented 1 month ago

Hi @anakin87, I'm starting with point 2. Would you rather have `truncate_dim` go into the Sentence Transformers backend and be part of the `backend_id`? Or leave it out of the backend and just pass it as a parameter to the two embedders (text and document)?

anakin87 commented 1 month ago

Since `truncate_dim` is part of the `__init__` of the `SentenceTransformer` class, I would put it in the `__init__` of the components and also in the backend.
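
For illustration, a minimal sketch of that placement, with simplified stand-ins for the real component and backend classes (names and signatures here are hypothetical, not the final implementation):

```python
from typing import List, Optional

from sentence_transformers import SentenceTransformer


class _EmbeddingBackend:
    """Simplified stand-in for the Sentence Transformers embedding backend."""

    def __init__(self, model: str, truncate_dim: Optional[int] = None):
        # truncate_dim is forwarded to SentenceTransformer.__init__, so
        # Matryoshka models return embeddings truncated to this dimension
        self.model = SentenceTransformer(model, truncate_dim=truncate_dim)

    def embed(self, texts: List[str]) -> List[List[float]]:
        return self.model.encode(texts).tolist()


class TextEmbedder:
    """Simplified stand-in for SentenceTransformersTextEmbedder."""

    def __init__(
        self,
        model: str = "sentence-transformers/all-MiniLM-L6-v2",
        truncate_dim: Optional[int] = None,
    ):
        # exposed on the component so it can round-trip through to_dict/from_dict
        self.truncate_dim = truncate_dim
        self._backend = _EmbeddingBackend(model, truncate_dim=truncate_dim)

    def run(self, text: str) -> dict:
        return {"embedding": self._backend.embed([text])[0]}
```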

(@sjrl you may be interested in this conversation)

sjrl commented 1 month ago

If you'd also be willing to just expose all possible init parameters of the `SentenceTransformer` class, that would also be great! I talk about this in https://github.com/deepset-ai/haystack/issues/8055, where other init params would be helpful.

anakin87 commented 1 month ago

@sjrl in general, I agree with adding as many parameters as we find useful. But TBH, I would prefer to solve #8055 in another PR, since it seems to involve particular serialization aspects (e.g., with respect to `torch_dtype` in `model_kwargs`).
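
(For context, the tricky part is that `torch_dtype` values are `torch.dtype` objects, which are not JSON-serializable, so `to_dict`/`from_dict` needs an explicit round-trip. A sketch of the general idea, not Haystack's actual serialization code:)

```python
import torch

model_kwargs = {"torch_dtype": torch.float16}

# serialize: torch.dtype -> string, e.g. "torch.float16"
serialized = {k: str(v) if isinstance(v, torch.dtype) else v for k, v in model_kwargs.items()}

# deserialize: string -> torch.dtype, e.g. getattr(torch, "float16")
deserialized = {
    k: getattr(torch, v.removeprefix("torch.")) if k == "torch_dtype" else v
    for k, v in serialized.items()
}
assert deserialized["torch_dtype"] is torch.float16
```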

sjrl commented 1 month ago

@anakin87 we already have serialization figured out for model kwargs in some components, like the Transformers Ranker, so we could reuse that.