danny-avila / rag_api

ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector
https://librechat.ai/

🚀 feat: Add Atlas MongoDB as an option for Vector Store #21

Closed jinzishuai closed 1 month ago

jinzishuai commented 3 months ago

related to https://github.com/danny-avila/LibreChat/discussions/2304

[Screenshot]

So far, only two APIs work:

They are sufficient to make the most important use case work: asking a question based on a file (see screenshot above).

jinzishuai commented 3 months ago

Note on Atlas Vector Search Index

I created the "default" index manually with the following JSON:

{
  "fields": [
    {
      "numDimensions": 1536,
      "path": "embedding",
      "similarity": "cosine",
      "type": "vector"
    },
    {
      "path": "file_id",
      "type": "filter"
    }
  ]
}
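
For context on why `file_id` is indexed as a filter field, here is a minimal sketch of a query pipeline that restricts a vector search to a single file. It only builds the aggregation stages; the function name, the `k` default, and the `text` field in the projection are assumptions for illustration and are not taken from this PR.

```python
from typing import Any, Dict, List, Sequence


def build_file_query(
    query_vector: Sequence[float],
    file_id: str,
    k: int = 4,
    index_name: str = "default",
) -> List[Dict[str, Any]]:
    """Build a $vectorSearch pipeline limited to one file_id (hypothetical helper)."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,           # the index created above
                "path": "embedding",           # matches "path" of the vector field
                "queryVector": list(query_vector),
                "numCandidates": max(k * 10, 100),
                "limit": k,
                # Only allowed because file_id is declared as a "filter" field.
                "filter": {"file_id": {"$eq": file_id}},
            }
        },
        {
            "$project": {
                "_id": 0,
                "text": 1,                     # assumed name of the text field
                "file_id": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

The resulting list could be passed to `collection.aggregate(...)` on the Atlas collection, with a query vector of the same dimension (1536) as the index.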
danny-avila commented 3 months ago

Does this share a collection with the File collection? Maybe we can do that; this way the files and vectors can be deleted at once.

jinzishuai commented 3 months ago

@danny-avila thanks for the comments.

jinzishuai commented 2 months ago

@danny-avila I think this PR is ready for a review. I've got the most basic APIs you listed working. I haven't implemented the async methods as you have done for pgvector but I think that can wait for a separate PR once this is merged.

I do have one question that I think is best left to your decision: how do you want a user to switch between pgvector and atlas-mongo?

Right now, my .env looks like this:

# # pgvector
# DB_HOST=xxx
# DB_PORT=5432
# POSTGRES_DB=librechat-pgvector
# POSTGRES_USER=postgres
# POSTGRES_PASSWORD=

# atlas-mongo vector
ATLAS_MONGO_DB_URI=mongodb+srv:/
COLLECTION_NAME=testcollection

Do you want me to introduce a new variable like VECTOR_DB_TYPE=pgvector/atlas-mongo, defaulting to pgvector? In that case we'd simply ignore the variables that belong to the other type.
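
A minimal sketch of what that switch could look like, assuming the proposed `VECTOR_DB_TYPE` variable and the variable names from the .env above. The helper names are hypothetical and this is not code from the PR.

```python
import os
from typing import List, Mapping, Optional

SUPPORTED_TYPES = ("pgvector", "atlas-mongo")

# Variables each backend reads; the other backend's variables are ignored.
REQUIRED_VARS = {
    "pgvector": ["DB_HOST", "DB_PORT", "POSTGRES_DB", "POSTGRES_USER", "POSTGRES_PASSWORD"],
    "atlas-mongo": ["ATLAS_MONGO_DB_URI", "COLLECTION_NAME"],
}


def resolve_vector_db_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the selected backend, defaulting to pgvector when unset or empty."""
    env = os.environ if env is None else env
    value = (env.get("VECTOR_DB_TYPE") or "pgvector").strip().lower()
    if value not in SUPPORTED_TYPES:
        raise ValueError(
            f"Unsupported VECTOR_DB_TYPE={value!r}; expected one of {SUPPORTED_TYPES}"
        )
    return value


def missing_settings(env: Mapping[str, str], db_type: str) -> List[str]:
    """List the variables the chosen backend needs that are not set in env."""
    return [name for name in REQUIRED_VARS[db_type] if name not in env]
```

With the .env shown above and no `VECTOR_DB_TYPE` set, this sketch would select pgvector and report the commented-out Postgres variables as missing, so switching to Atlas would require setting `VECTOR_DB_TYPE=atlas-mongo` explicitly.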

Thanks a lot.

danny-avila commented 2 months ago

Thanks, it's on my list to review. Does the Atlas Vector Search index (the "default" index from your note above) have to be created manually for Atlas to work here?

jinzishuai commented 2 months ago

Yes. There is likely an API or CLI tool to set it up as well, but the vector index does need to be created.
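
As one possible programmatic route, here is a hedged sketch that creates the same "default" index from Python. It assumes a recent PyMongo (4.7 or later, where `SearchIndexModel` accepts `type="vectorSearch"`) and an Atlas cluster that supports vector search; the helper names are hypothetical and this has not been tested against this PR.

```python
from typing import Any, Dict


def vector_index_definition(num_dimensions: int = 1536) -> Dict[str, Any]:
    """Same definition as the JSON used for the manually created index."""
    return {
        "fields": [
            {
                "numDimensions": num_dimensions,
                "path": "embedding",
                "similarity": "cosine",
                "type": "vector",
            },
            {"path": "file_id", "type": "filter"},
        ]
    }


def create_default_index(collection: Any, name: str = "default") -> str:
    """Create the vector search index on a PyMongo collection.

    Assumption: PyMongo >= 4.7 is installed; imported lazily so the
    definition helper above works without it.
    """
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=vector_index_definition(),
        name=name,
        type="vectorSearch",
    )
    # Returns the name of the index; Atlas builds it asynchronously.
    return collection.create_search_index(model)
```

The collection passed in would be the one named by COLLECTION_NAME in the database reached through ATLAS_MONGO_DB_URI. The index may take a short while to become queryable after the call returns.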

danny-avila commented 2 months ago

Can you do a quick write-up in the README on how to do that exactly? I'm sure I could research this, but it would be nice to have.