chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Feature Request]: Collection Eviction Strategy or TTL Support #2042

Open MarkintoshZ opened 4 months ago

MarkintoshZ commented 4 months ago

Describe the problem

Developers who use ChromaDB as a cache often don't know in advance how long an embedding collection needs to persist. Manually deleting collections when memory fills up is cumbersome and error-prone.

Describe the proposed solution

I would like to see the implementation of a time-to-live feature or a data eviction strategy within ChromaDB. This would automatically remove collections based on specified criteria, such as their age or a predefined expiration time.

Alternatives considered

Other solutions could involve periodically removing old data or implementing manual data pruning logic. However, these alternatives are less efficient and incur more overhead.
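For reference, the manual pruning alternative could look roughly like the sketch below. It assumes each collection was created with a timestamp stored in its metadata under a `created_at` key; that key and the `expired_collections` helper are hypothetical names for illustration, not part of ChromaDB.

```python
import time
from typing import List, Mapping, Optional

# Hypothetical metadata key chosen for this sketch; ChromaDB does not set it for you.
CREATED_AT_KEY = "created_at"


def expired_collections(
    metadata_by_name: Mapping[str, Optional[Mapping[str, object]]],
    ttl_seconds: float,
    now: Optional[float] = None,
) -> List[str]:
    """Return the names of collections whose creation timestamp is older than the TTL."""
    now = time.time() if now is None else now
    expired = []
    for name, metadata in metadata_by_name.items():
        created_at = (metadata or {}).get(CREATED_AT_KEY)
        # Collections without a numeric timestamp are skipped rather than deleted.
        if isinstance(created_at, (int, float)) and now - created_at > ttl_seconds:
            expired.append(name)
    return expired
```

With a Chroma client, the mapping could be built from each collection's name and metadata, and every returned name passed to `client.delete_collection(name)` from a scheduled job. What `list_collections()` returns differs between ChromaDB versions, so that wiring is left out here. This is exactly the kind of extra bookkeeping the feature request is trying to avoid.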

Importance

would make my life easier

Additional Information

No response

tazarov commented 4 months ago

@MarkintoshZ, we already have this kind of feature. Check these locations in the codebase:

https://github.com/chroma-core/chroma/blob/e5ec1b39171f62db4efe549207e488bbbdb9a12c/chromadb/config.py#L143-L144 (configuration)

https://github.com/chroma-core/chroma/blob/e5ec1b39171f62db4efe549207e488bbbdb9a12c/chromadb/segment/impl/manager/cache/cache.py#L47 (LRU cache implementation).

Let me know if you need any more help in running this.
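For readers unfamiliar with the policy named above, here is a minimal sketch of LRU eviction under a memory budget. It is a generic illustration, not the code in the linked `cache.py`; the class name `SizeBoundedLRU`, its methods, and the `on_evict` callback are assumptions made for the example.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class SizeBoundedLRU:
    """Keep entries until their total size exceeds a byte budget, then drop the least recently used."""

    def __init__(
        self,
        capacity_bytes: int,
        on_evict: Optional[Callable[[Hashable, Any], None]] = None,
    ) -> None:
        self.capacity_bytes = capacity_bytes
        self.on_evict = on_evict
        self.used_bytes = 0
        # Maps key -> (value, size_bytes); insertion order doubles as recency order.
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        if key not in self._items:
            return None
        # A read marks the entry as most recently used.
        self._items.move_to_end(key)
        return self._items[key][0]

    def set(self, key: Hashable, value: Any, size_bytes: int) -> None:
        if key in self._items:
            # Replacing an entry: release its old size first.
            self.used_bytes -= self._items.pop(key)[1]
        # Evict from the least recently used end until the new entry fits.
        # An entry larger than the whole budget is still stored once the cache is empty.
        while self._items and self.used_bytes + size_bytes > self.capacity_bytes:
            old_key, (old_value, old_size) = self._items.popitem(last=False)
            self.used_bytes -= old_size
            if self.on_evict is not None:
                self.on_evict(old_key, old_value)
        self._items[key] = (value, size_bytes)
        self.used_bytes += size_bytes
```

Note that a policy like this frees memory by dropping cached entries based on how recently they were used. It does not by itself expire data after a fixed age, which is the TTL behavior described in the original request.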