chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

[Feature Request]: Metadata Filters: having common tags or not #2487

Open anyangml opened 3 months ago

anyangml commented 3 months ago

Describe the problem

I am trying to build a database for similarity search on inorganic materials. It would be very useful if there were a filter that lets users filter on elemental composition. For example, suppose vector A represents a material containing ["Si", "O", "Al"] and vector B represents a material containing ["Mg", "Ni", "O"]. The filter should support contains/excludes logic, so that a query for materials that contain "O" and exclude "Mg" returns A but not B.

This could also be useful more generally, e.g. for filtering text by topic.

Describe the proposed solution

Add contains/excludes logic to metadata filters.
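
The requested semantics could be sketched in plain Python as follows. This is only an illustration of the intended behavior, not Chroma code; the function name `matches_tags` and its parameters are hypothetical.

```python
from typing import Iterable


def matches_tags(
    tags: Iterable[str],
    contains: Iterable[str] = (),
    excludes: Iterable[str] = (),
) -> bool:
    """Return True if `tags` holds every tag in `contains` and none in `excludes`."""
    tag_set = set(tags)
    # "contains": every required tag must be present.
    # "excludes": no forbidden tag may be present.
    return set(contains) <= tag_set and tag_set.isdisjoint(excludes)


# The two materials from the problem description.
material_a = ["Si", "O", "Al"]
material_b = ["Mg", "Ni", "O"]

print(matches_tags(material_a, contains=["O"], excludes=["Mg"]))  # True
print(matches_tags(material_b, contains=["O"], excludes=["Mg"]))  # False
```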

Alternatives considered

No response

Importance

nice to have

Additional Information

No response

HammadB commented 3 months ago

Can you clarify the shape of your metadata here? We do support contains via the where_document clause - https://docs.trychroma.com/guides#filtering-by-document-contents - does that work for your needs?

anyangml commented 3 months ago

> Can you clarify the shape of your metadata here? We do support contains via the where_document clause - https://docs.trychroma.com/guides#filtering-by-document-contents - does that work for your needs?

I might be wrong, but my understanding is that the document filter where_document only works on text. In my case, the vector comes from an encoder that converts a 3D structure into a 1D vector, so there is no actual document to search within, and I don't think I can treat the 3D structure as "the document" the way one would in a RAG task. The metadata might look like this:


metadata = {
    "elements" : ["Mg", "Al", "O"], # List[str]
    "natoms": 16, # int
}
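
Until something like this exists, one possible workaround is to flatten the list into one boolean flag per element. This is a rough sketch, assuming metadata values are restricted to scalars (str, int, float, bool) and that the `where` filter accepts the `$eq` and `$and` operators. The `has_` key prefix and both helper names are hypothetical. Only the "contains" side is covered, because "excludes" depends on how records that lack a key are treated.

```python
from typing import Any, Dict, List


def flatten_elements(metadata: Dict[str, Any]) -> Dict[str, Any]:
    """Replace the "elements" list with one boolean flag per element."""
    flat = {key: value for key, value in metadata.items() if key != "elements"}
    for symbol in metadata.get("elements", []):
        flat[f"has_{symbol}"] = True  # hypothetical key naming scheme
    return flat


def contains_filter(required: List[str]) -> Dict[str, Any]:
    """Build a where-style filter that requires every element in `required`."""
    if not required:
        raise ValueError("at least one element is required")
    clauses = [{f"has_{symbol}": {"$eq": True}} for symbol in required]
    # A single condition is passed as-is; several are combined with $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


metadata = {
    "elements": ["Mg", "Al", "O"],  # List[str]
    "natoms": 16,  # int
}

flat_metadata = flatten_elements(metadata)
where = contains_filter(["Mg", "O"])
print(flat_metadata)
print(where)
```

The flattened dictionary would be stored as the record's metadata, and the filter dictionary would be passed as the `where` argument of a query.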