Open Tylersuard opened 1 year ago

What happened?

I just wanted to be clear: if I have only 16 GB of RAM on my system, does that mean my ChromaDB cannot be larger than 16 GB? Are there any ways around that? I am working with a dataset that will likely be 1 TB when embedded.

Versions

Chroma 0.4

Relevant log output

No response
@Tylersuard, there are a couple of pieces of data that Chroma stores: metadata and documents, which live in SQLite, and the HNSW vector indexes that back each collection, which are loaded into memory.
As of today, Chroma loads all accessed collections into memory and never unloads them. We are considering ways to allow users to unload collections to save memory and thus decouple DB size from memory size.
@tazarov Thank you for the answer! I am wondering if the situation has been changed and whether there are any updates.
@puyuanOT, I've created a small PR that implements manual unloading, but allowing manual unloading of collections from the API would actually cause more problems for devs than it solves. We're considering the best approach for this that will not invalidate some of the memory assumptions Chroma makes for both single-node and distributed deployments.
Your input is valuable, so please also consider your requirements for this feature and share them here.
Any updates on unloading unnecessary collections from memory? I have a DB collection of 4 GB while the SQLite file is 14 GB. @tazarov
Any update, @tazarov?
I am also troubled by this issue. Do you have any updates on unloading unnecessary collections from memory?
@baseplate77, @yuanpeizhou, we implemented an LRU strategy that can unload collections that are not frequently used (assuming you have more than one collection). The functionality is documented here: https://cookbook.chromadb.dev/strategies/memory-management/#lru-cache-strategy
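For reference, the linked cookbook page configures this on a persistent client roughly as follows. A minimal sketch, assuming the Settings field names mirror the CHROMA_SEGMENT_CACHE_POLICY and CHROMA_MEMORY_LIMIT_BYTES env vars used further down in this thread; the path and limit are illustrative:

```python
import chromadb
from chromadb.config import Settings

# Evict least-recently-used collections from memory once the
# total size of loaded segments exceeds the configured limit.
client = chromadb.PersistentClient(
    path="./chroma-data",  # hypothetical data directory
    settings=Settings(
        chroma_segment_cache_policy="LRU",
        chroma_memory_limit_bytes=10_000_000_000,  # ~10 GB
    ),
)
```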
@tazarov I could not get the LRU strategy to work despite following the above instructions. I'm running the latest Chroma 0.5.7 inside the official Docker image chromadb/chroma, and I'm setting the following env variables on the Docker container:
CHROMA_SEGMENT_CACHE_POLICY="LRU"
CHROMA_MEMORY_LIMIT_BYTES="2500000000" # ~2.5GB
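(For reference, passing these to the official image would look something like the following docker run invocation; the port mapping is illustrative:)

```bash
docker run -p 8000:8000 \
  -e CHROMA_SEGMENT_CACHE_POLICY="LRU" \
  -e CHROMA_MEMORY_LIMIT_BYTES="2500000000" \
  chromadb/chroma
```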
When I keep inserting vectors into a Chroma database (empty in the beginning), the memory usage happily flies past the set limit until the whole thing crashes due to running out of memory.
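A minimal sketch of the insertion pattern described above (the collection name, batch size, and dimensionality are made up). Note that since everything goes into a single collection, the LRU policy, which per the earlier comment unloads whole infrequently used collections, would have nothing it can evict:

```python
import numpy as np
import chromadb

# Connect to the Chroma server running in the Docker container.
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("memtest")  # hypothetical name

# Keep inserting random embeddings; server memory grows with the single
# collection's HNSW index, which the LRU policy cannot unload while it
# is the only (and actively used) collection.
for batch in range(1_000):
    n, dim = 1_000, 384
    embeddings = np.random.rand(n, dim).astype(np.float32)
    collection.add(
        ids=[f"{batch}-{i}" for i in range(n)],
        embeddings=embeddings.tolist(),
    )
```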