danny-avila / rag_api

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector
https://librechat.ai/

Mongo RAG Store not retrieving #49

Closed: indradeep closed this issue 2 weeks ago

indradeep commented 3 weeks ago

When I use the Mongo store as the RAG backend, I can see that the document is stored in the collection correctly.

Example document in RAG store:

{"_id":{"$oid":"666f3325651a90de34653e48"},"text":"Receipt#45611SOMEONEID: 1000104639Date: 8/2/23Method: MastercardTotal amount$50.00\nThank you for the payment.DescriptionAmount2243$50.00Subtotal$50.00Total$50.00Payments receivedAmount(•••• 1189)Authorization #11512Z$50.00Total$50.00","embedding":[{"$numberDouble":"0.013858422650525407"},{"$numberDouble":"-0.01573014341429008"},...{"$numberDouble":"0.04833140649198582"},{"$numberDouble":"-0.01803774267188253"}],
"file_id":"638a5a7f-a13b-4011-8a6f-ec98eb0a90d6","user_id":"666d7eb7bf556a155f0204fb","digest":"b588924f333b13be0d6fc2b49f0d3d89","source":"./uploads/666d7eb7bf556a155f0204fb/Application Payment Recepit.pdf","page":{"$numberInt":"0"}}

Example document in the files collection of LibreChat:

{"_id":{"$oid":"666f3326443c463e3e89abef"},"file_id":"638a5a7f-a13b-4011-8a6f-ec98eb0a90d6","__v":{"$numberInt":"0"},"bytes":{"$numberInt":"49215"},"context":"message_attachment","createdAt":{"$date":{"$numberLong":"1718563621959"}},"embedded":true,"filename":"Application Payment Recepit.pdf","filepath":"vectordb","object":"file","source":"vectordb","type":"application/pdf","updatedAt":{"$date":{"$numberLong":"1718563749265"}},"usage":{"$numberInt":"3"},"user":{"$oid":"666d7eb7bf556a155f0204fb"}}

Retrieval is failing. The vector index is created and up to date:

[image: screenshot of the vector index status]

Retrieval of the stored, embedded file is failing somewhere, and there are no logs that would help trace the failure.
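Since there are no logs, one way to narrow this down is to run the vector query by hand against the collection. Below is a minimal sketch that builds an Atlas `$vectorSearch` aggregation pipeline restricted to one `file_id`. The function name, the index name `default`, and the `limit`/`num_candidates` values are assumptions for illustration. This is not necessarily the query rag_api itself issues, and the `filter` clause only works if `file_id` is indexed as a filter field in the vector index.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    file_id: str,
    index_name: str = "default",   # assumption: name of the Atlas vector index
    path: str = "embedding",       # field holding the vectors, as in the stored document
    limit: int = 4,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline that searches one file's chunks by vector similarity."""
    if num_candidates < limit:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
                # Requires file_id to be declared as a "filter" field in the index.
                "filter": {"file_id": {"$eq": file_id}},
            }
        },
        # Return only the text, the file_id, and the similarity score.
        {
            "$project": {
                "_id": 0,
                "text": 1,
                "file_id": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With a PyMongo collection handle, the result could be inspected with `list(collection.aggregate(pipeline))`. An empty list for a `file_id` that is known to be stored would point at the index (name, path, or filter fields) rather than at the upload step.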

danny-avila commented 3 weeks ago

Yes, this is a known issue, which is why there’s a warning when configuring Mongo.

@jinzishuai I’m not sure what’s wrong with the implementation, but it fails every time for me.

When I get the chance, I will look into it myself, as I would personally like to use Atlas for this.

indradeep commented 3 weeks ago

The temporary workaround is to define another vector index named default.
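For anyone who prefers to create that index from code rather than the Atlas UI, here is a rough sketch using PyMongo. It assumes an Atlas `vectorSearch`-type index on the `embedding` field with `file_id` as a filter field. The helper names, the cosine similarity, and the dimension count are assumptions: `num_dimensions` must match whatever embedding model is in use, and the `type` argument of `SearchIndexModel` needs a recent PyMongo (4.7 or later, as far as I know).

```python
from typing import Any, Dict, Sequence

VALID_SIMILARITIES = ("cosine", "euclidean", "dotProduct")


def build_index_definition(
    num_dimensions: int,
    path: str = "embedding",
    similarity: str = "cosine",                   # assumption: pick what suits the embeddings
    filter_paths: Sequence[str] = ("file_id",),  # fields usable in $vectorSearch filters
) -> Dict[str, Any]:
    """Build the definition document for an Atlas vector search index."""
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be positive")
    if similarity not in VALID_SIMILARITIES:
        raise ValueError(f"similarity must be one of {VALID_SIMILARITIES}")
    fields = [
        {
            "type": "vector",
            "path": path,
            "numDimensions": num_dimensions,
            "similarity": similarity,
        }
    ]
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"fields": fields}


def create_default_index(collection, num_dimensions: int, name: str = "default") -> str:
    """Create the index on a PyMongo collection and return its name."""
    # Imported here so the definition builder above works without pymongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=build_index_definition(num_dimensions),
        name=name,
        type="vectorSearch",
    )
    return collection.create_search_index(model)
```

Calling `create_default_index(collection, num_dimensions=...)` once against the RAG store collection should be enough. Atlas builds the index asynchronously, so retrieval may still return nothing until the index is reported as ready.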

Fix is here: https://github.com/danny-avila/rag_api/pull/50