Open AlokRanjanSwain opened 5 months ago
To clarify, you're running 10 instances of the Docker image? Could you please share the code you're using to test it? How is your load balancing set up to route queries to workers?
Write operations will be sequential, but I believe concurrent read operations should work.
No, I am running a single Docker instance with multiple uvicorn workers, configured in the command in the docker-compose file: https://github.com/chroma-core/chroma/blob/main/docker-compose.yml
I increased the workers here to 10.
@AlokRanjanSwain, uvicorn treats workers as separate processes. Chroma's persistent directory is not meant to be accessed from multiple processes, as the underlying sqlite3 and HNSW indices do not support it. This can lead to the DB file being corrupted or locked.
Can you elaborate on your workload? Is it only queries where you see this, or do you also have inserts? As @codetheweb mentioned, inserts are blocking operations. They will cause all other queries (including other inserts) to be serialized, thus queued and waiting for a lock to be released before proceeding. If, on the other hand, you only have queries for the latencies you mention, the issue is probably with metadata filters, as those tend to get slower as your DB grows.
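To illustrate the queueing effect described above, here is a minimal, self-contained sketch. It does not use Chroma at all: `run_queries` is a hypothetical helper, and `time.sleep` stands in for the work done by one query. It only shows how latencies stack when every request has to wait for a single lock, compared with requests that can proceed in parallel.

```python
import threading
import time


def run_queries(n_queries, service_time, serialized):
    """Start n_queries threads at once and return their sorted completion
    latencies in seconds, measured from a common start time.

    When serialized is True, every simulated query holds one shared lock
    while it "works", so the queries finish one after another.
    """
    lock = threading.Lock()
    latencies = [0.0] * n_queries
    start = time.perf_counter()

    def query(i):
        if serialized:
            with lock:  # only one simulated query runs at a time
                time.sleep(service_time)
        else:
            time.sleep(service_time)  # all simulated queries overlap
        latencies[i] = time.perf_counter() - start

    threads = [threading.Thread(target=query, args=(i,)) for i in range(n_queries)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(latencies)


if __name__ == "__main__":
    print("serialized:", [round(x, 2) for x in run_queries(4, 0.05, True)])
    print("concurrent:", [round(x, 2) for x in run_queries(4, 0.05, False)])
```

In the serialized case the last request finishes after roughly `n_queries * service_time`, which matches the "first query 10 sec, second query 20 sec" pattern reported in this issue. Whether a lock of this kind is what actually causes it in Chroma is an assumption of the sketch, not something it demonstrates.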
There are both inserts and queries, but inserts are far fewer than queries. At the time of testing, no inserts were running. We don't use any metadata filters, only embedding similarity search. The process is as follows: we have 9-10 collections that are queried simultaneously by around 100 users. On the client side, I have a multi-threaded application that queries all of these collections for a single user. With a single user, querying all the collections takes around 10 seconds, but as the number of users grows, the waiting time increases to 2-3 minutes.
I have some more questions:
@AlokRanjanSwain to clarify, like @tazarov mentioned, having --workers 10 with mixed usage (inserts and queries) can lead to data corruption. You could try having separate chromadb processes for reads (queries) and writes (inserts) but this is not supported and might result in strange consistency bugs.
We're actively working on profiling Chroma against different machine types, and hope to publish our results (along with any recommendations for scaling) soon.
Just a question: if I spawn multiple Docker containers using replicas (i.e., they all share the same volume where the data is stored), can we then have read concurrency?
@AlokRanjanSwain, Technically, it is possible. However, you'll need a workaround to ensure you are not corrupting the DB:
You must append ?mode=ro to the chroma.sqlite3 file, ensuring this is a read-only replica.
I suggest you apply a patch to the container every time it starts. This approach may have caveats, such as data not immediately visible across all replicas (isolation mode might need to be set). Also, this is untested, but other users have reported success with similar setups.
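For reference, this is roughly what the ?mode=ro flag does at the SQLite level, shown with Python's standard sqlite3 module. It is a sketch of the URI flag only, not of how Chroma opens its database, and it is not the patch itself. The `open_read_only` and `demo` helpers and the `example.sqlite3` file are made up for illustration.

```python
import os
import sqlite3
import tempfile
from pathlib import Path


def open_read_only(db_path):
    """Open an existing SQLite file read-only via a URI ending in ?mode=ro."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def demo():
    """Create a throwaway database, then check that reads work and writes fail."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "example.sqlite3")  # hypothetical file name

        # Normal read-write connection to set up some data.
        writer = sqlite3.connect(db_path)
        writer.execute("CREATE TABLE items (name TEXT)")
        writer.execute("INSERT INTO items VALUES ('a')")
        writer.commit()
        writer.close()

        # Read-only connection: SELECT succeeds, INSERT is rejected.
        reader = open_read_only(db_path)
        rows = reader.execute("SELECT name FROM items").fetchall()
        try:
            reader.execute("INSERT INTO items VALUES ('b')")
            write_blocked = False
        except sqlite3.OperationalError:
            write_blocked = True
        finally:
            reader.close()
        return rows, write_blocked


if __name__ == "__main__":
    print(demo())
```

A connection opened this way cannot modify the file, which is the property the read-only replicas rely on. It does not by itself address the visibility caveat mentioned above, or the HNSW index files.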
Hi, I'm working on a use case where I'd need to scale reads/distance searches. Could there be some way to allow concurrent reads by default? I think there is not much point to the async API if there is only one worker in the background.
What happened?
I am using a single Chroma Docker image on a machine, with 10 workers. The machine has 16 cores and 32 GB of RAM. My client queries the same collection concurrently, around 100 query requests at a time. I noticed that even with 10 workers, queries on the collection are stacked: the 1st query takes 10 sec and the 2nd query takes 20 sec, even though both queries started at the same time.
Can anyone explain this? If you could also share the internal architecture of how Chroma handles read and write operations, that would be helpful.
Versions
Chroma 0.4.22, Python 3.10
Relevant log output
No response