deepset-ai / haystack-core-integrations

Additional packages (components, document stores and the likes) to extend the capabilities of Haystack version 2.0 and onwards
https://haystack.deepset.ai
Apache License 2.0

MongoDBAtlasDocumentStore doesn't recognize/use my connection string when creating MongoClient #953

Open scooter4j opened 1 month ago

scooter4j commented 1 month ago

Describe the bug
When I try to create a MongoDBAtlasDocumentStore with my own connection string, set through the environment variable (os.environ["MONGO_CONNECTION_STRING"] = "mongodb+srv://scooter4j:HIDDEN@cluster0.3oecvqa.mongodb.net/?retryWrites=true&w=majority&appName=cluster0"), I'm unable to establish a connection to my MongoDB Atlas cluster. Instead, I get the following error:

pymongo.errors.ServerSelectionTimeoutError: ac-qoeetyn-shard-00-01.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms),ac-qoeetyn-shard-00-02.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms),ac-qoeetyn-shard-00-00.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms), Timeout: 30s, Topology Description: <TopologyDescription id: 66a6d4661a19d4beecec5a30, topology_type: ReplicaSetNoPrimary, servers: [<ServerDescription ('ac-qoeetyn-shard-00-00.3oecvqa.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-qoeetyn-shard-00-00.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>, <ServerDescription ('ac-qoeetyn-shard-00-01.3oecvqa.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-qoeetyn-shard-00-01.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, connectTimeoutMS: 20000.0ms)')>, <ServerDescription ('ac-qoeetyn-shard-00-02.3oecvqa.mongodb.net', 27017) server_type: Unknown, rtt: None, error=AutoReconnect('ac-qoeetyn-shard-00-02.3oecvqa.mongodb.net:27017: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1006) (configured timeouts: socketTimeoutMS: 20000.0ms, 
connectTimeoutMS: 20000.0ms)')>]> python-BaseException

The code sees my personal connection string and sets resolved_connection_string to it, but this string isn't used when creating the MongoClient connection. See image below:

Screen Shot 2024-08-01 at 12 41 18 PM

To Reproduce
My code:

import os
from InstructorEmbedding import INSTRUCTOR
from haystack_integrations.document_stores.mongodb_atlas import MongoDBAtlasDocumentStore
from haystack_integrations.components.retrievers.mongodb_atlas import MongoDBAtlasEmbeddingRetriever

os.environ["MONGO_CONNECTION_STRING"] = "mongodb+srv://scooter4j:HIDDEN@cluster0.3oecvqa.mongodb.net/?retryWrites=true&w=majority&appName=cluster0"

model = INSTRUCTOR('hkunlp/instructor-base')

instruction = "Represent the physical fitness paragraph for retrieval:"
query = "What days did my right leg have odd sensations?"

query_embedding = model.encode([[instruction, query]])

# Initialize the document store
document_store = MongoDBAtlasDocumentStore(
    database_name="sq_rag_sandbox",
    collection_name="training_notes",
    vector_search_index="vector_index",
)

print(f"Document store contains {document_store.count_documents()} documents")

retriever = MongoDBAtlasEmbeddingRetriever(document_store=document_store)

# example run query
blah = retriever.run(query_embedding=query_embedding[0].tolist())

print("placeholder....")

Note that I use Instructor for my embeddings, but that's immaterial, really, as the problem comes when trying to connect to the MongoDB Atlas cluster independently of getting the embeddings.

Describe your environment (please complete the following information):

Amnah199 commented 4 weeks ago

I've investigated the issue on my end and was able to set up the connection without any errors. The problem you're encountering might be due to a missing SSL certificate required for the connection. Installing the certifi package could resolve this. You can refer to this post for more details: PyMongo SSL Certificate Verify Failed.

Additionally, the official documentation offers resources for troubleshooting TLS errors: PyMongo TLS Troubleshooting.

I recommend trying these solutions and letting us know if the issue persists.
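To check whether a missing CA bundle is what's behind the CERTIFICATE_VERIFY_FAILED error, it can help to look at where the Python interpreter expects to find its default certificates. The snippet below is only a rough diagnostic sketch using the standard library's ssl module; the helper name default_ca_locations is made up for this example and is not part of PyMongo or Haystack.

```python
import os
import ssl


def default_ca_locations() -> dict:
    """Report where this interpreter's OpenSSL build looks for CA certificates.

    Hypothetical helper for illustration; it only reads local configuration
    and does not open any network connection.
    """
    paths = ssl.get_default_verify_paths()
    return {
        # Resolved CA file; None when the file does not exist on disk
        "cafile": paths.cafile,
        # Resolved CA directory; None when the directory does not exist
        "capath": paths.capath,
        # Locations compiled into OpenSSL, reported even if nothing is there
        "openssl_cafile": paths.openssl_cafile,
        "openssl_capath": paths.openssl_capath,
        "openssl_cafile_exists": bool(paths.openssl_cafile)
        and os.path.isfile(paths.openssl_cafile),
    }


for name, value in default_ca_locations().items():
    print(f"{name}: {value}")
```

If both cafile and capath come back as None, the interpreter likely has no default CA bundle to verify against, which would be consistent with the "unable to get local issuer certificate" message.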

scooter4j commented 4 weeks ago

I've been down that path. In my own code I use certifi when setting up the MongoClient, as shown, and I'm able to connect to MongoDB Atlas without any problem that way. However, there is no way to specify the tlsCAFile parameter that gets passed to MongoClient when creating a MongoDBAtlasDocumentStore object.

import certifi
from pymongo import MongoClient


def connect_to_db(mongodb_uri, database, use_tls=True):
    if use_tls:
        # Verify the server certificate against certifi's CA bundle
        mongodb_client = MongoClient(mongodb_uri, tlsCAFile=certifi.where())
    else:
        mongodb_client = MongoClient(mongodb_uri)

    # Look up the database handle by name
    database = mongodb_client[database]
    print("Connected to the MongoDB database!")
    return mongodb_client, database
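
A possible workaround, since tlsCAFile can't be passed as a keyword argument here: tlsCAFile is also a connection string option, so it may be enough to append it to the URI stored in MONGO_CONNECTION_STRING. The sketch below uses only the standard library; with_tls_ca_file is a hypothetical helper name, and it assumes that the document store hands the URI to MongoClient unchanged and that PyMongo percent-decodes option values, both of which are worth verifying.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_tls_ca_file(uri: str, ca_path: str) -> str:
    """Return `uri` with a tlsCAFile option appended to its query string.

    Hypothetical helper for illustration only; not part of PyMongo or Haystack.
    """
    parts = urlsplit(uri)
    # Percent-encode the path so slashes and spaces survive inside the query
    option = "tlsCAFile=" + quote(ca_path, safe="")
    query = f"{parts.query}&{option}" if parts.query else option
    # Make sure there is a "/" between the host and the "?" that starts the options
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


# Placeholder URI and CA path, not real credentials
print(with_tls_ca_file(
    "mongodb+srv://user:secret@cluster0.example.mongodb.net/?retryWrites=true",
    "/etc/ssl/cert.pem",
))
```

With certifi installed, setting os.environ["MONGO_CONNECTION_STRING"] = with_tls_ca_file(uri, certifi.where()) before constructing the MongoDBAtlasDocumentStore might then have the same effect as the tlsCAFile keyword argument in the function above.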