Closed: david101-hunter closed this issue 4 days ago
@david101-hunter, I think the slowness comes not from Chroma itself but from the embedding model. bge-m3 is a relatively large, heavy model, and unless you run it on a GPU it can be slow on modest hardware. I suggest you either pre-compute the embeddings for the batch, add them, and time just that step, or switch to Chroma's default embedding model (all-MiniLM-L6-v2) and measure again.
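To make that measurement concrete, here is a minimal sketch that times the embedding step and the store insert separately. The names `time_stages`, `fake_embed`, and `fake_add` are hypothetical and not part of Chroma or LangChain. The stand-ins only simulate a slow model and a fast store so the snippet runs without any dependencies.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_stages(
    texts: Sequence[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    add_fn: Callable[[List[str], List[List[float]]], None],
) -> Tuple[float, float]:
    """Return (embed_seconds, add_seconds) for one batch of texts."""
    batch = list(texts)

    # Stage 1: compute embeddings only.
    start = time.perf_counter()
    vectors = embed_fn(batch)
    embed_seconds = time.perf_counter() - start

    # Stage 2: insert the precomputed vectors into the store.
    start = time.perf_counter()
    add_fn(batch, vectors)
    add_seconds = time.perf_counter() - start

    return embed_seconds, add_seconds


# Hypothetical stand-ins: replace these with your real model and store.
def fake_embed(batch: List[str]) -> List[List[float]]:
    time.sleep(0.05)  # pretend the model is the slow part
    return [[float(len(text))] for text in batch]


stored: List[Tuple[str, List[float]]] = []


def fake_add(batch: List[str], vectors: List[List[float]]) -> None:
    stored.extend(zip(batch, vectors))
```

With real objects, `embed_fn` could be the `embed_documents` method of your LangChain embeddings object, and `add_fn` a small wrapper around a chromadb collection's `add(ids=..., embeddings=..., documents=...)`. If most of the time lands in the first number, the model is the bottleneck rather than the vector store.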
Thanks for your insight!
Description
The add_documents method of the Chroma vector store runs significantly slower than expected when processing large batches of documents. This is causing bottlenecks in our document ingestion pipeline.
I have about 200 docs. Using bge-m3 as the embedding model, I call add_documents to add all of them to the vector store like this:
Steps to Reproduce
Expected Behavior
Based on our performance requirements, the add_documents method should process 200 documents in under 60 seconds.
Actual Behavior
The add_documents method is taking approximately 10 minutes to process 200 documents.
Versions
Environment
OS: Ubuntu 20.04 LTS
Python version: 3.9.11
Library version: langchain_chroma==0.1.2
Relevant log output
No response