deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

ChromaDB stuck #8284

Closed alecrimi closed 2 months ago

alecrimi commented 2 months ago

Hi, I am hitting conflicting dependencies when using chroma-haystack. Some libraries I use need at least Python 3.9, and I have avoided Python 3.10 because some of the things I use are not ready for it yet. So the issues are on Python 3.9. I think the main problem is the need to install both farm-haystack[inference] and chroma-haystack.

chroma-haystack also pulls in haystack-ai, which is incompatible with farm-haystack. In my Conda env I have chroma-haystack 0.21.1 and farm-haystack 1.26.2, and I am using ChromaDocumentStore with Haystack's EmbeddingRetriever.

However, it already gets stuck at the import of EmbeddingRetriever in the code:

`import os
import pdfplumber
import chainlit as cl
import requests
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

from haystack.nodes import EmbeddingRetriever
from haystack.schema import Document
from chromadb import Client as ChromaClient  # Import the ChromaDB client

HF_TOKEN = os.getenv("HF_TOKEN")

API_URL = "https://api-inference.huggingface.co/models/mistralai/Mistral-7B-Instruct-v0.1"

chroma_client = ChromaClient()  # Initialize ChromaDB client
document_store = ChromaDocumentStore(client=chroma_client, embedding_dim=384)
retriever = EmbeddingRetriever(document_store=document_store, embedding_model="sentence-transformers/all-MiniLM-L6-v2")
....`

as I get the following error:

2024-08-23 20:30:13 - Auto-enabled tracing for 'OpenTelemetryTracer'
Traceback (most recent call last):
  File "/home/bam/anaconda3/envs/haystack_chroma/bin/chainlit", line 8, in <module>
    sys.exit(cli())
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/chainlit/cli/__init__.py", line 201, in chainlit_run
    run_chainlit(target)
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/chainlit/cli/__init__.py", line 66, in run_chainlit
    load_module(config.run.module_name)
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/chainlit/config.py", line 419, in load_module
    spec.loader.exec_module(module)
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "chromatest.py", line 7, in <module>
    from haystack.nodes import EmbeddingRetriever
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/haystack/nodes/__init__.py", line 1, in <module>
    from haystack.nodes.base import BaseComponent
  File "/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/haystack/nodes/base.py", line 11, in <module>
    from haystack.errors import PipelineSchemaError
ImportError: cannot import name 'PipelineSchemaError' from 'haystack.errors' (/home/bam/anaconda3/envs/haystack_chroma/lib/python3.9/site-packages/haystack/errors.py)

Removing haystack-ai improves things a bit, but then I hit other bugs related to the telemetry functions and send_message().

lbux commented 2 months ago

Does your application require farm-haystack as opposed to 2.0 haystack (haystack-ai)? The chromaDB documentation makes it seem like it only supports 2.0+.

anakin87 commented 2 months ago

Hello!

farm-haystack (Haystack 1.x) is in maintenance mode and will be discontinued in the future. It does not support Chroma. haystack-ai (Haystack 2.x) is the library we are currently developing. It supports Chroma.

As you noticed, farm-haystack and haystack-ai are not compatible (see https://github.com/deepset-ai/haystack/discussions/6684#discussioncomment-8022284).
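
One way to confirm this kind of clash in an environment is to check whether both distributions are installed side by side, since both ship a top-level package named `haystack`. The following is a minimal sketch using only the standard library; `installed_versions` and `describe_conflict` are hypothetical helper names for illustration, not part of Haystack.

```python
from importlib import metadata

# Both distributions install a top-level package called `haystack`,
# so having them in the same environment mixes 1.x and 2.x modules.
CONFLICTING = ("farm-haystack", "haystack-ai")


def installed_versions(names=CONFLICTING):
    """Return {distribution name: version} for the names that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            pass
    return found


def describe_conflict(found):
    """Return a warning string if more than one distribution is present, else None."""
    if len(found) > 1:
        listing = ", ".join(f"{name}=={version}" for name, version in sorted(found.items()))
        return f"Conflicting Haystack distributions installed: {listing}"
    return None


if __name__ == "__main__":
    message = describe_conflict(installed_versions())
    print(message or "No farm-haystack / haystack-ai conflict detected.")
```

If the script reports a conflict, a common remedy is to start from a fresh environment and install only one of the two distributions.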

If you need guidance on migrating from 1.x to 2.x, please take a look at the Migration guide.

alecrimi commented 2 months ago

> Does your application require farm-haystack as opposed to 2.0 haystack (haystack-ai)? The chromaDB documentation makes it seem like it only supports 2.0+.

No, it doesn't. But I need something more than just the integration provided by the chroma-haystack package. If I understand you correctly, instead of installing farm-haystack, I should simply install chroma-haystack first and then Haystack 2.0? This has never been clarified in your Chroma-Haystack guide. Is Haystack 2.0 supported on Python 3.9? I have to double-check, but I remember pip installing Haystack 1.

anakin87 commented 2 months ago

chroma-haystack automatically installs haystack-ai (2.x). See https://github.com/deepset-ai/haystack-core-integrations/blob/6b07663962967a9308516753236a1642140a59c3/integrations/chroma/pyproject.toml#L25

alecrimi commented 2 months ago

I am seriously confused. In a brand new environment with just chainlit and chroma-haystack, I have problems with the EmbeddingRetriever; that's why I ended up installing farm-haystack and removing haystack-ai. I assume that with the Haystack 2 philosophy the code is different. Can you point out what I should use instead of the following code for the embedding model? Do you have an example of how to use BM25Retriever? I found only the description in the migration manual.

This is my current (wrong) code:

`import os
import pdfplumber
from haystack_integrations.document_stores.chroma import ChromaDocumentStore

from haystack.nodes import EmbeddingRetriever
from haystack.schema import Document
from sentence_transformers import SentenceTransformer
import chromadb

HF_TOKEN = os.getenv("HF_TOKEN")

chroma_client = chromadb.Client()  # Initialize ChromaDB client
document_store = ChromaDocumentStore(client=chroma_client, embedding_dim=384)

embedding_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
retriever = EmbeddingRetriever(document_store=document_store, embedding_model=embedding_model)
...`
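
On the BM25 question, a rough sketch of keyword retrieval in Haystack 2.x (haystack-ai) could look like the following. It assumes the in-memory document store and its `InMemoryBM25Retriever` rather than Chroma, and the function names `bm25_search` and `contents_of` are made up for this illustration.

```python
from typing import List


def contents_of(result: dict) -> List[str]:
    """Pull the text out of a retriever result of the form {"documents": [...]}."""
    return [doc.content for doc in result["documents"]]


def bm25_search(texts: List[str], query: str, top_k: int = 2) -> List[str]:
    """Index `texts` in memory and return the contents of the best BM25 matches."""
    # Imported lazily so this module still loads where haystack-ai is absent.
    from haystack import Document
    from haystack.components.retrievers.in_memory import InMemoryBM25Retriever
    from haystack.document_stores.in_memory import InMemoryDocumentStore

    store = InMemoryDocumentStore()
    store.write_documents([Document(content=text) for text in texts])

    retriever = InMemoryBM25Retriever(document_store=store)
    return contents_of(retriever.run(query=query, top_k=top_k))


if __name__ == "__main__":
    sample = [
        "Chroma is a vector database.",
        "BM25 ranks documents by keyword overlap.",
        "Chainlit builds chat interfaces.",
    ]
    print(bm25_search(sample, "keyword ranking with BM25"))
```

Note that there is no `haystack.nodes` or `haystack.schema` import here; those modules belong to farm-haystack (1.x).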

alecrimi commented 2 months ago

I found this, it should solve the confusion: https://docs.haystack.deepset.ai/v2.0/docs/chromaqueryretriever
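
For reference, a minimal sketch of what that page describes, assuming only chroma-haystack (and therefore haystack-ai) is installed. It constructs `ChromaDocumentStore` with no arguments, which I believe gives a non-persistent collection using Chroma's default embedding function (this may download a small model on first use). `chroma_search` is a hypothetical wrapper name.

```python
def chroma_search(texts, query, top_k=3):
    """Write `texts` to a Chroma-backed store and return the best-matching documents."""
    # Imported lazily so this module still loads where chroma-haystack is absent.
    from haystack import Document
    from haystack_integrations.components.retrievers.chroma import ChromaQueryTextRetriever
    from haystack_integrations.document_stores.chroma import ChromaDocumentStore

    # Assumption: default arguments are fine for a quick local experiment.
    store = ChromaDocumentStore()
    store.write_documents([Document(content=text) for text in texts])

    # The retriever sends the raw query text to Chroma, which embeds it itself,
    # so no separate embedder component is needed in this sketch.
    retriever = ChromaQueryTextRetriever(document_store=store)
    result = retriever.run(query=query, top_k=top_k)
    return result["documents"]


if __name__ == "__main__":
    hits = chroma_search(
        ["Haystack 2.x supports Chroma.", "farm-haystack is in maintenance mode."],
        "Which version works with Chroma?",
        top_k=1,
    )
    for doc in hits:
        print(doc.content)
```

To bring your own embedding model instead, the integration also provides `ChromaEmbeddingRetriever`, which would be paired with an embedder component; that variant is not shown here.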

julian-risch commented 2 months ago

@alecrimi in addition to the other links in this issue, I can recommend having a look at the following python notebook: https://colab.research.google.com/github/deepset-ai/haystack-cookbook/blob/main/notebooks/chroma-indexing-and-rag-examples.ipynb It explains how to install the required dependency, how to index data, and how to query the data. An overview of the integration can be found here: https://haystack.deepset.ai/integrations/chroma-documentstore Please don't hesitate to reach out again if you have more questions!