Describe the problem
Here is the use case where I need to query for unique metadata values.
I am ingesting large document texts as embeddings into chromadb. Because of the embedding model's token limit, I split each text into chunks of 512 tokens.
I will generate an embedding for each chunk, but all chunks of one document belong together and share the same doc_id.
When I query and a chunk of a document matches, I do not want any other chunk from that same document in the results. One matching chunk per document is enough, since the other chunks would only point back to the same document.
I am planning to store the doc_id as metadata on every chunk.
So I need a distinct query on the doc_id metadata field. Currently I filter manually by keeping the doc_ids I have already seen in a set and checking each result against it, which is inefficient.
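To make the setup concrete, here is a minimal sketch of how the chunks and their doc_id metadata could be prepared. The helper name `build_chunk_records`, the `chunk_index` field, and the ID scheme are hypothetical, and whitespace-separated words stand in for real model tokens, which is only an approximation.

```python
def build_chunk_records(doc_id, text, chunk_size=512):
    """Split text into chunks and attach the same doc_id to every chunk."""
    # Assumption: whitespace-separated words approximate the model's tokens.
    tokens = text.split()
    ids, documents, metadatas = [], [], []
    for start in range(0, len(tokens), chunk_size):
        chunk_index = start // chunk_size
        ids.append(f"{doc_id}-{chunk_index}")  # hypothetical ID scheme
        documents.append(" ".join(tokens[start:start + chunk_size]))
        metadatas.append({"doc_id": doc_id, "chunk_index": chunk_index})
    return ids, documents, metadatas


ids, documents, metadatas = build_chunk_records("doc-1", "word " * 1100)
# With a real collection these lists would go to:
# collection.add(ids=ids, documents=documents, metadatas=metadatas)
```

Every chunk carries the same doc_id, so any deduplication has to happen on that field at query time.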
Describe the proposed solution
I tried something like:
collection.get(where={"$distinct": "doc_id"})
but this does not work. I also have not found any reference to a distinct operator in the Chroma documentation.
Alternatives considered
Manually filtering the results after checking whether each chunk's doc_id has already been seen.
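For reference, the manual workaround looks roughly like the sketch below. It assumes the result layout returned by `collection.query` for a single query (lists of lists under `ids`, `metadatas`, and `distances`) and that results arrive ordered best match first. The function name `first_chunk_per_doc` and the over-fetch factor in the comment are hypothetical.

```python
def first_chunk_per_doc(results, k):
    """Keep only the best-ranked chunk per doc_id, up to k documents."""
    # Index 0 selects the results of the first (only) query.
    ids = results["ids"][0]
    metadatas = results["metadatas"][0]
    distances = results["distances"][0]

    seen = set()
    kept = []
    for chunk_id, meta, dist in zip(ids, metadatas, distances):
        doc_id = meta["doc_id"]
        if doc_id in seen:
            continue  # a better chunk of this document was already kept
        seen.add(doc_id)
        kept.append((chunk_id, doc_id, dist))
        if len(kept) == k:
            break
    return kept


# In practice the input would come from an over-fetching query, e.g.:
# results = collection.query(query_texts=["..."], n_results=k * 5,
#                            include=["metadatas", "distances"])
sample = {
    "ids": [["a-0", "a-1", "b-0", "c-0"]],
    "metadatas": [[{"doc_id": "a"}, {"doc_id": "a"}, {"doc_id": "b"}, {"doc_id": "c"}]],
    "distances": [[0.1, 0.2, 0.3, 0.4]],
}
top = first_chunk_per_doc(sample, k=2)
```

The drawback is the over-fetch: if one document has many matching chunks, `n_results` must be large enough to still reach k distinct documents, which is why a built-in distinct option would help.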
Importance
would make my life easier
Additional Information
No response