chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Bug]: Single worker is querying the DB #2325

Open AlokRanjanSwain opened 5 months ago

AlokRanjanSwain commented 5 months ago

What happened?

I am running a single Chroma Docker image on one machine with 10 workers. The machine has 16 cores and 32 GB of RAM. My client queries the same collection concurrently, around 100 query requests at once. I noticed that even with 10 workers, the queries on the collection are stacked: the 1st query takes 10 seconds, the 2nd query takes 20 seconds, and so on, even though both queries started at the same time.

Can anyone explain this? If you can also share the internal architecture of how Chroma handles read and write operations, that would be helpful as well.

Versions

Chroma 0.4.22, Python 3.10

Relevant log output

No response

codetheweb commented 5 months ago

To clarify, you're running 10 instances of the Docker image? Could you please share the code you're using to test it? How is your load balancing set up to route queries to workers?

Write operations will be sequential, but I believe concurrent read operations should work.

AlokRanjanSwain commented 5 months ago

No, I am running a single Docker instance with multiple uvicorn workers, set via the command in docker-compose: https://github.com/chroma-core/chroma/blob/main/docker-compose.yml

[screenshot of the uvicorn command in docker-compose.yml]

I increased the workers here to 10.

tazarov commented 5 months ago

@AlokRanjanSwain, uvicorn treats workers as separate processes. Chroma's persistent directory is not meant to be accessed from multiple processes, as the underlying sqlite3 and HNSW indices do not support it. This can lead to the DB file being corrupted or locked.
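For illustration, this is roughly what `--workers 10` amounts to on the server side: ten separate processes each opening the same persist directory. A minimal sketch of that pattern (the persist path is an assumption, and this is exactly the access pattern to avoid):

```python
# Conceptual sketch only -- do not run this against data you care about.
# Each uvicorn worker is a separate OS process, so `--workers 10` is roughly
# equivalent to ten processes doing the following against one persist dir.
import multiprocessing

import chromadb


def open_and_ping(persist_dir: str) -> None:
    # Every process gets its own handle to the same sqlite3 file and HNSW
    # indices -- the multi-process access pattern the persistent backend
    # does not support.
    client = chromadb.PersistentClient(path=persist_dir)
    client.heartbeat()


if __name__ == "__main__":
    processes = [
        multiprocessing.Process(target=open_and_ping, args=("/chroma/chroma",))
        for _ in range(10)
    ]
    for p in processes:
        p.start()
    for p in processes:
        p.join()
```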

Can you elaborate on your workload? Is it only queries where you see this, or do you also have inserts? As @codetheweb mentioned, inserts are blocking operations. They cause all other operations (including other inserts) to be serialized, i.e. queued and waiting for a lock to be released before proceeding. If, on the other hand, you are seeing these latencies with queries only, the issue is probably with metadata filters, as those tend to get slower as your DB grows.

AlokRanjanSwain commented 5 months ago

The workload has both inserts and queries, but inserts are very infrequent compared to queries. At the time of testing, no inserts were going on. We don't use any metadata filters, only embedding similarity search. The process is roughly this: we have 9-10 collections that are queried simultaneously by around 100 users. On the client side I have a multi-threaded application that queries all of these collections for a single user. With a single user, the total time is around 10 seconds across all collections, but as the number of users scales up, the waiting time increases to 2-3 minutes.
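For reference, the client-side load pattern described above looks roughly like the sketch below. Collection names, embedding size, host, and port are placeholders, not values from the original report:

```python
# Hedged sketch of the described workload: one "user" fans out queries across
# ~10 collections in parallel; around 100 users do this at the same time.
from concurrent.futures import ThreadPoolExecutor

import chromadb

client = chromadb.HttpClient(host="localhost", port=8000)  # assumed address
collection_names = [f"collection_{i}" for i in range(10)]  # placeholder names
query_embedding = [0.0] * 384  # placeholder embedding


def query_collection(name: str):
    collection = client.get_collection(name)
    return collection.query(query_embeddings=[query_embedding], n_results=5)


# One user's fan-out; the reported load is ~100 of these running concurrently.
with ThreadPoolExecutor(max_workers=len(collection_names)) as pool:
    results = list(pool.map(query_collection, collection_names))
```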

I have some more questions:

  1. Will it work well if I spawn multiple instances/replicas of chromadb over the same Docker persistent volume data?
  2. Or should I spawn multiple instances, each serving a specific set of collections to query? [This will require some query routing, I guess.]
codetheweb commented 5 months ago

@AlokRanjanSwain to clarify, as @tazarov mentioned, running with --workers 10 under mixed usage (inserts and queries) can lead to data corruption. You could try having separate chromadb processes for reads (queries) and writes (inserts), but this is not supported and might result in strange consistency bugs.
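As a rough illustration of that (unsupported) split, the client could route all inserts to one Chroma instance and all queries to another, with both containers sharing the same volume. The ports and helper names below are assumptions:

```python
# Hypothetical read/write routing sketch. It assumes two Chroma containers over
# the same volume: a writer on port 8001 and a query-only instance on port 8000.
# This setup is not officially supported and may show consistency anomalies.
import chromadb

read_client = chromadb.HttpClient(host="localhost", port=8000)
write_client = chromadb.HttpClient(host="localhost", port=8001)


def add_documents(name: str, ids: list[str], embeddings: list[list[float]]) -> None:
    # All inserts go through the single writer instance.
    write_client.get_or_create_collection(name).add(ids=ids, embeddings=embeddings)


def search(name: str, query_embedding: list[float], n_results: int = 5):
    # All similarity searches go through the query-only instance.
    return read_client.get_collection(name).query(
        query_embeddings=[query_embedding], n_results=n_results
    )
```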

We're actively working on profiling Chroma against different machine types, and hope to publish our results (along with any recommendations for scaling) soon.

AlokRanjanSwain commented 5 months ago

Just a question: if I spawn multiple Docker containers using replicas (i.e., they all share the same volume where the data is stored), can we then have read concurrency?

tazarov commented 5 months ago

@AlokRanjanSwain, technically it is possible. However, you'll need a workaround to ensure you are not corrupting the DB:

https://github.com/chroma-core/chroma/blob/ec2e717a2d20f29db73eaae7d85e5f99ef7a810e/chromadb/db/impl/sqlite.py#L86

You must append ?mode=ro to the SQLite URI for the chroma.sqlite3 file, ensuring the replica opens the database read-only.

I suggest you apply a patch to the container every time it starts. This approach may have caveats, such as data not being immediately visible across all replicas (isolation mode might need to be set). Also, this is untested, but other users have reported success with similar setups.
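For illustration, this is what `?mode=ro` does at the SQLite level, using only the Python standard library. The file path is an assumption, and the actual change would be a patch applied to `chromadb/db/impl/sqlite.py` inside the read replicas:

```python
# Minimal sketch: open chroma.sqlite3 through a SQLite URI in read-only mode.
import sqlite3

conn = sqlite3.connect("file:/chroma/chroma/chroma.sqlite3?mode=ro", uri=True)
try:
    # Reads work; any write now fails with "attempt to write a readonly database".
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    print(tables)
finally:
    conn.close()
```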

dabu1111 commented 3 months ago

Hi, I'm working on a use case where I need to scale reads/distance searches. Could there be some way to allow concurrent reads by default? I think there isn't much point to the async API if there is only one worker in the background.