Closed blacksmithop closed 2 weeks ago
@blacksmithop, Chroma does not (yet) have any aggregation functions like those in regular relational or NoSQL DBs. You will therefore have to iterate over the collection yourself to collect the unique values.
Here's some sample code to do that:
import chromadb

client = chromadb.PersistentClient(path="uniq_metadata")
col = client.get_or_create_collection("metadata")
col.upsert(
    ids=["0", "1", "2", "4"],
    documents=["doc 1", "doc 2", "doc 3", "doc without metadata"],
    metadatas=[{"name": "name1", "value": "value1"}, {"name": "name2", "value": "value2"}, {"name": "name3", "value": "value3"}, None],
)
# Collect the distinct values of "name"; the `m and` guard skips entries that have no metadata
results = col.get(where={"name": {"$ne": ""}}, include=["metadatas"])
unique_values = {m["name"] for m in results["metadatas"] if m and "name" in m}
print(unique_values)
You may have to adjust the '{"name":{"$ne":""}}' expression to meet your needs (the one above assumes that you are only interested in entries whose name metadata field is not an empty string).
Additionally, if your collection is large, you may want to paginate the results with limit and offset in get().
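The pagination mentioned above could look roughly like this. It is only a sketch: the helper name unique_metadata_values and the page_size default are made up for illustration, and it assumes get() returns an empty "ids" list once offset runs past the end of the collection.

```python
def unique_metadata_values(col, field, page_size=100):
    """Collect the distinct values of one metadata field, one page at a time.

    `col` is expected to be a Chroma collection (anything with a compatible
    get(limit=..., offset=..., include=...) method works).
    The helper name and page_size default are illustrative assumptions.
    """
    values = set()
    offset = 0
    while True:
        page = col.get(limit=page_size, offset=offset, include=["metadatas"])
        if not page["ids"]:
            break  # no more records
        for meta in page["metadatas"]:
            # Entries stored without metadata come back as None
            if meta and field in meta:
                values.add(meta[field])
        offset += len(page["ids"])
    return values
```

With the collection from the sample above, unique_metadata_values(col, "name", page_size=2) would walk the four records two at a time instead of loading them all at once.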
I see, thank you!
Let's say I have a metadata field named "Field". I wish to fetch one document for each value of Field. If the values of Field were "A", "B", and "C", I would get 3 documents.
Effectively I wish to get a list of unique values for a metadata field without manually iterating over them. Related SO Thread
I tried the following query, but it seems I cannot pass a distinct to where_document
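Since there is no distinct-style option, picking one document per value has to happen on the client side. A minimal sketch, assuming the collection is small enough to fetch in a single get() call; the helper name one_document_per_value is hypothetical, and "first document seen" is an arbitrary choice of which document to keep.

```python
def one_document_per_value(col, field):
    """Map each distinct value of `field` to the first (id, document) seen for it.

    `col` is expected to be a Chroma collection; the helper name is an
    illustrative assumption, not part of the Chroma API.
    """
    res = col.get(include=["metadatas", "documents"])
    picked = {}
    for doc_id, meta, doc in zip(res["ids"], res["metadatas"], res["documents"]):
        # Skip entries without metadata or without the field
        if meta and field in meta:
            # setdefault keeps the first document and ignores later duplicates
            picked.setdefault(meta[field], (doc_id, doc))
    return picked
```

For values "A", "B", and "C" of Field, one_document_per_value(col, "Field") would return a dict with three entries, one per value.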