deepset-ai / haystack

AI orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
https://haystack.deepset.ai
Apache License 2.0

Mongo Dense Retriever - Unrecognized pipeline stage name: '$vectorSearch' #7031

Closed tillwf closed 5 months ago

tillwf commented 9 months ago

The Mongo engine does not recognize the vector index:

from haystack.document_stores import MongoDBAtlasDocumentStore
from haystack.pipelines import Pipeline
from haystack.nodes import EmbeddingRetriever
import os

document_store = MongoDBAtlasDocumentStore(
    mongo_connection_string=f"mongodb+srv://{os.getenv('MONGO_USER')}:{os.getenv('MONGO_PASS')}@{os.getenv('MONGO_URL')}",
    database_name=os.getenv("MONGO_DB"),
    collection_name="articles_embeddings",
    vector_search_index="embedding_index",
    embedding_dim=384
)

dense_retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="sentence-transformers/all-MiniLM-L6-v2",
    use_gpu=True,
    scale_score=False,
)
pipeline = Pipeline()
pipeline.add_node(component=dense_retriever, name="DenseRetriever", inputs=["Query"])

result = pipeline.run(
    query="test",
    params={
        "DenseRetriever": {
            "top_k": 10,
        }
    }
)

and I get this error:

Exception: Exception while running node 'DenseRetriever': Unrecognized pipeline stage name: '$vectorSearch', full error: {'ok': 0.0, 'errmsg': "Unrecognized pipeline stage name: '$vectorSearch'", 'code': 40324, 'codeName': 'Location40324', '$clusterTime': {'clusterTime': Timestamp(1707143848, 40), 'signature': {'hash': b'\xe1{hg\x0e\xc8\x91\xc6\xec\xf6\xbe\x91\xa5,\xda(@\x8eo\x1b', 'keyId': 7294832502911270929}}, 'operationTime': Timestamp(1707143848, 40)}
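Error code 40324 is raised by the MongoDB server itself, which suggests the cluster does not know the `$vectorSearch` aggregation stage at all, rather than failing to find the index. One likely cause is the server version: to my knowledge, Atlas documents `$vectorSearch` as requiring MongoDB 6.0.11+ or 7.0.2+ on an Atlas cluster. The following is a minimal sketch for checking this; the helper names (`parse_version`, `supports_vector_search`, `server_version`) are hypothetical, the version thresholds are an assumption taken from the Atlas documentation, and rapid releases 6.1 to 6.3 are treated as unsupported here, which is also an assumption.

```python
import re

# Assumed minimum versions for $vectorSearch (per the Atlas docs, not this thread).
MIN_ON_6_0_LINE = (6, 0, 11)
MIN_OTHERWISE = (7, 0, 2)


def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from a string such as '6.0.11' or '7.0.2-rc1'."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def supports_vector_search(version: str) -> bool:
    """Return True if the version meets the assumed $vectorSearch minimum."""
    parsed = parse_version(version)
    if parsed[:2] == (6, 0):
        return parsed >= MIN_ON_6_0_LINE
    # Assumption: anything else must be at least 7.0.2.
    return parsed >= MIN_OTHERWISE


def server_version(connection_string: str) -> str:
    """Ask the server for its version (requires pymongo and network access)."""
    from pymongo import MongoClient  # imported lazily so the helpers above work without it

    client = MongoClient(connection_string)
    try:
        return client.server_info()["version"]
    finally:
        client.close()
```

With the connection string from the snippet above, `supports_vector_search(server_version(uri))` returning `False` would point to the cluster version rather than to the index definition. The stage is also only available on Atlas deployments, so a self-hosted server can fail the same way regardless of version.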

Here is a screenshot of the index I created: [image]

and the code I used to create it:

{
  "fields":[
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 384,
      "similarity": "cosine"
    }
  ]
}
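The index definition has to agree with the document store settings: `path` with the field holding the embeddings and `numDimensions` with `embedding_dim` (384 for `all-MiniLM-L6-v2`). As a rough sketch, the same definition can be built and sanity-checked in Python; the function name `vector_index_definition` is hypothetical, and the set of allowed similarity values is an assumption based on the Atlas Vector Search documentation.

```python
# Similarity functions assumed to be accepted by Atlas Vector Search indexes.
VALID_SIMILARITIES = {"euclidean", "cosine", "dotProduct"}


def vector_index_definition(path: str, num_dimensions: int, similarity: str = "cosine") -> dict:
    """Build an Atlas Vector Search index definition for a single vector field."""
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be a positive integer")
    if similarity not in VALID_SIMILARITIES:
        raise ValueError(f"similarity must be one of {sorted(VALID_SIMILARITIES)}")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


# Same values as the JSON above; embedding_dim in the document store must match.
EMBEDDING_DIM = 384
definition = vector_index_definition("embedding", EMBEDDING_DIM)
```

Recent PyMongo releases can, as far as I know, submit such a definition through `SearchIndexModel(definition=..., name="embedding_index", type="vectorSearch")` and `collection.create_search_index(...)`, but that call needs a live Atlas cluster and is not shown here.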

Did I do something wrong?

Originally posted by @tillwf in https://github.com/deepset-ai/haystack/issues/6643#issuecomment-1927172094

masci commented 5 months ago

Closing as won't fix; we are focusing support on 2.x through the MongoDB integration.