deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0
17.72k stars, 1.92k forks

Incompatibility Between Haystack 2.4.0 and Sentence-Transformers 2.x Due to `model_kwargs` Argument #8248

Closed movchan74 closed 2 months ago

movchan74 commented 2 months ago

Describe the bug

Haystack 2.4.0 specifies a dependency on sentence-transformers >=2.3.0. However, the SentenceTransformersTextEmbedder component passes the model_kwargs argument to SentenceTransformer, which sentence-transformers versions below 3.0 do not accept. As a result, calling warm_up() on a SentenceTransformersTextEmbedder with sentence-transformers 2.x (e.g., 2.7.0) raises a TypeError.

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/haystack_bug/venv/lib/python3.10/site-packages/haystack/components/embedders/sentence_transformers_text_embedder.py", line 167, in warm_up
    self.embedding_backend = _SentenceTransformersEmbeddingBackendFactory.get_embedding_backend(
  File "/root/haystack_bug/venv/lib/python3.10/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 37, in get_embedding_backend
    embedding_backend = _SentenceTransformersEmbeddingBackend(
  File "/root/haystack_bug/venv/lib/python3.10/site-packages/haystack/components/embedders/backends/sentence_transformers_backend.py", line 66, in __init__
    self.model = SentenceTransformer(
TypeError: SentenceTransformer.__init__() got an unexpected keyword argument 'model_kwargs'

Expected behavior

The SentenceTransformersTextEmbedder should initialize and warm up without raising an exception. Either the dependency should be tightened to sentence-transformers>=3.0.0, which supports the model_kwargs argument, or the code should be adjusted to work with sentence-transformers 2.x.

Additional context

This issue occurs because the current dependency in the pyproject.toml file allows installing sentence-transformers versions that do not support the model_kwargs argument. It can be resolved either by raising the requirement to sentence-transformers>=3.0.0 or by removing the use of model_kwargs in Haystack 2.4.0.

To Reproduce

  1. Install sentence-transformers version 2.x (e.g., 2.7.0) and haystack-ai version 2.4.0:
    pip install sentence-transformers==2.7.0
    pip install haystack-ai==2.4.0
  2. Run the following code:
    from haystack.components.embedders import SentenceTransformersTextEmbedder
    text_embedder = SentenceTransformersTextEmbedder()
    text_embedder.warm_up()
  3. Observe the TypeError due to the unsupported model_kwargs argument.
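
One way to confirm the mismatch before running the embedder is to inspect the constructor signature. The sketch below is only illustrative: `accepts_keyword`, `OldStyleModel`, and `NewStyleModel` are hypothetical names, and the two classes are stand-ins that mimic a constructor without and with `model_kwargs`. They are not the real sentence-transformers classes.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if `func` can be called with `name` as a keyword argument."""
    params = inspect.signature(func).parameters
    # A **kwargs catch-all accepts any keyword.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return True
    param = params.get(name)
    return param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )


# Hypothetical stand-in for a constructor that predates model_kwargs.
class OldStyleModel:
    def __init__(self, model_name_or_path=None, device=None):
        self.model_name_or_path = model_name_or_path


# Hypothetical stand-in for a constructor that accepts model_kwargs.
class NewStyleModel:
    def __init__(self, model_name_or_path=None, device=None, model_kwargs=None):
        self.model_name_or_path = model_name_or_path
        self.model_kwargs = model_kwargs


print(accepts_keyword(OldStyleModel, "model_kwargs"))
print(accepts_keyword(NewStyleModel, "model_kwargs"))
```

With the library installed, the same helper could presumably be pointed at the real class via `from sentence_transformers import SentenceTransformer` and `accepts_keyword(SentenceTransformer, "model_kwargs")`. Given the TypeError above, that check would be expected to fail on 2.7.0.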

Suggested Fix

Update pyproject.toml to require sentence-transformers>=3.0.0 to ensure compatibility with the model_kwargs argument. Alternatively, modify the code in Haystack 2.4.0 to avoid passing model_kwargs when sentence-transformers is below 3.0.
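
A minimal sketch of the second option, assuming the installed version is available as a string (for example from the standard library's `importlib.metadata.version("sentence-transformers")`). The helper names `major_version` and `build_init_kwargs` are hypothetical and this is not necessarily how the maintainers addressed the issue. It only shows the idea of withholding `model_kwargs` on 2.x and failing with a clear message if the caller actually set it.

```python
def major_version(version):
    """Extract the leading integer from a version string such as '2.7.0'."""
    return int(version.split(".", 1)[0])


def build_init_kwargs(installed_version, model_name, model_kwargs=None):
    """Build constructor keyword arguments suited to the installed version.

    Hypothetical helper: per this issue, model_kwargs is only accepted
    by sentence-transformers 3.0 and later.
    """
    kwargs = {"model_name_or_path": model_name}
    if major_version(installed_version) >= 3:
        kwargs["model_kwargs"] = model_kwargs
    elif model_kwargs:
        # The caller asked for something the old constructor cannot honor.
        raise ValueError(
            "model_kwargs requires sentence-transformers>=3.0.0, "
            f"but version {installed_version} is installed"
        )
    return kwargs


# On 2.x the argument is simply left out when the caller did not set it.
print(build_init_kwargs("2.7.0", "some-model"))
# On 3.x it is passed through unchanged.
print(build_init_kwargs("3.0.1", "some-model", {"example_option": True}))
```

The resulting dictionary would then be unpacked into the constructor call. Raising on 2.x when `model_kwargs` is set, rather than silently dropping it, is a design choice: it avoids loading a model with different settings than the user requested.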

julian-risch commented 2 months ago

Thank you @movchan74 for bringing this to our attention! We will discuss it in our team in the next sprint planning.

anakin87 commented 2 months ago

I would say that sentence-transformers is not a core dependency of haystack-ai (we wrap imports in a lazy import block). Instead, it is a test dependency (part of the [tool.hatch.envs.test] block in pyproject.toml).

For this reason, we can't force users to install a specific version. However, we should

anakin87 commented 2 months ago

done in #8295, https://github.com/deepset-ai/haystack-tutorials/pull/345, and https://github.com/deepset-ai/haystack-cookbook/pull/113