chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0
15.27k stars 1.28k forks

Size of database cannot grow larger than available RAM? #1323

Open Tylersuard opened 1 year ago

Tylersuard commented 1 year ago

What happened?

I just want to be clear: if I have only 16 GB of RAM on my system, does that mean my ChromaDB cannot be larger than 16 GB? Are there any ways around that? I am working with a dataset that will likely be 1 TB when embedded.

Versions

Chroma 0.4

Relevant log output

No response

tazarov commented 1 year ago

@Tylersuard, there are a couple of pieces of data that Chroma stores:

As of today, Chroma loads all accessed collections into memory and never unloads them. We are considering ways to allow users to unload collections to save memory and thus decouple DB size from memory size.
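To get a feel for what "DB size tied to memory size" means in practice, here is a rough back-of-the-envelope sketch. It assumes vectors are held as 32-bit floats and counts only the raw vector data, so it ignores index graph links, metadata, and documents; the 1536-dimension figure and the function name `max_vectors_in_ram` are purely illustrative and do not come from this thread or from Chroma.

```python
def max_vectors_in_ram(ram_bytes: int, dim: int, bytes_per_value: int = 4) -> int:
    """Upper bound on how many raw vectors fit in ram_bytes.

    Assumes each vector is dim values of bytes_per_value bytes each
    (4 bytes for float32). Real usage will be higher because of index
    overhead, so treat the result as an optimistic ceiling.
    """
    return ram_bytes // (dim * bytes_per_value)


# Hypothetical example: 16 GiB of RAM, 1536-dimensional embeddings.
ram = 16 * 1024**3
print(max_vectors_in_ram(ram, dim=1536))
```

Under those assumptions the ceiling is in the low millions of vectors, which is why a dataset expected to reach 1 TB of embeddings would not fit on a 16 GB machine while everything is kept loaded.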

puyuanOT commented 10 months ago

@tazarov Thank you for the answer! I am wondering whether the situation has changed and whether there are any updates.

tazarov commented 10 months ago

@puyuanOT, I created a small PR that implemented manual unloading, but allowing collections to be unloaded manually through the API was going to cause more problems for devs than it solved. We're considering the best approach for this, one that will not invalidate some of Chroma's memory assumptions for both single-node and distributed deployments.

Your input is valuable, so please also consider your requirements for this feature and share them here.

ML-Abdula commented 7 months ago

Any updates on unloading unnecessary collections from memory? I have a DB collection of 4 GB while the SQLite file is 14 GB. @tazarov

baseplate77 commented 5 months ago

Any update, @tazarov?

yuanpeizhou commented 4 months ago

I am also troubled by this issue. Do you have any updates on unloading unnecessary collections from memory?

tazarov commented 3 months ago

@baseplate77, @yuanpeizhou, we implemented an LRU strategy that can unload collections that are not frequently used (assuming you have more than one collection). The functionality is documented here: https://cookbook.chromadb.dev/strategies/memory-management/#lru-cache-strategy
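For readers unfamiliar with the idea, the following is a minimal sketch of a byte-budgeted LRU cache. It is a toy model, not Chroma's actual implementation: the class name `LRUSegmentCache`, the `touch` method, and the sizes are all hypothetical. It also illustrates the "more than one collection" caveat, since a single collection that exceeds the budget leaves nothing else to evict.

```python
from collections import OrderedDict
from typing import List


class LRUSegmentCache:
    """Toy byte-budgeted LRU cache (hypothetical, for illustration only)."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used_bytes = 0
        # Maps collection name -> size in bytes, oldest access first.
        self._items: "OrderedDict[str, int]" = OrderedDict()

    def touch(self, name: str, size_bytes: int) -> List[str]:
        """Load or re-access a collection; return the names evicted to make room."""
        if name in self._items:
            # Remove the old entry so it can be re-added as most recent.
            self.used_bytes -= self._items.pop(name)
        self._items[name] = size_bytes
        self.used_bytes += size_bytes

        evicted: List[str] = []
        # Evict least recently used entries, but never the one just touched.
        while self.used_bytes > self.limit_bytes and len(self._items) > 1:
            old_name, old_size = self._items.popitem(last=False)
            self.used_bytes -= old_size
            evicted.append(old_name)
        return evicted


# Three small collections under a 10-byte budget: the oldest gets evicted.
cache = LRUSegmentCache(limit_bytes=10)
cache.touch("a", 4)
cache.touch("b", 4)
evicted_multi = cache.touch("c", 4)

# One oversized collection: nothing else to evict, so usage stays over budget.
solo = LRUSegmentCache(limit_bytes=10)
evicted_solo = solo.touch("big", 25)
```

In this sketch, touching "c" evicts "a", while the single "big" collection stays resident even though it exceeds the limit.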

hpihkala commented 1 month ago

@tazarov I could not get the LRU strategy to work despite following the above instructions. I'm running the latest Chroma 0.5.7 inside the official Docker image chromadb/chroma. I'm setting the following env variables on the Docker container:

CHROMA_SEGMENT_CACHE_POLICY="LRU"
CHROMA_MEMORY_LIMIT_BYTES="2500000000"  # ~2.5GB

When I keep inserting vectors into a Chroma database (empty in the beginning), the memory usage happily flies past the set limit until the whole thing crashes due to running out of memory:

[Screenshot 2024-09-20 at 18:35:34]