chroma-core / chroma

the AI-native open-source embedding database
https://www.trychroma.com/
Apache License 2.0

Support: long collection query runtime when filtering on metadata #777

Closed roy-armis closed 1 month ago

roy-armis commented 1 year ago

Hey all,

First time here, so I apologise if this isn't the place for support.

My data contains ~270K documents embedded with OpenAI's embeddings, plus three metadata fields: is_valid, is_purchased and num_words. The first two are ints that act as booleans (1 is True, 0 is False); the last is a plain int.
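For readers reproducing this, here is a minimal sketch of how a filter over these three fields might be expressed in Chroma's `where` syntax (`$eq`, `$gte`, `$and`). The helper name `build_where` and its parameters (including the `min_words` threshold) are hypothetical, chosen only for illustration. Because the flags are stored as ints, the sketch converts Python booleans to 1/0 before comparing. It only builds the filter dict, so it runs without Chroma installed.

```python
from typing import Any, Dict, List, Optional


def build_where(
    is_valid: Optional[bool] = None,
    is_purchased: Optional[bool] = None,
    min_words: Optional[int] = None,
) -> Optional[Dict[str, Any]]:
    """Build a Chroma-style `where` filter (hypothetical helper).

    is_valid / is_purchased are stored as ints (1 = True, 0 = False),
    so the booleans are converted with int() before comparison.
    """
    clauses: List[Dict[str, Any]] = []
    if is_valid is not None:
        clauses.append({"is_valid": {"$eq": int(is_valid)}})
    if is_purchased is not None:
        clauses.append({"is_purchased": {"$eq": int(is_purchased)}})
    if min_words is not None:
        clauses.append({"num_words": {"$gte": min_words}})
    if not clauses:
        return None  # no filter: plain similarity search
    if len(clauses) == 1:
        return clauses[0]  # Chroma expects $and to hold at least two conditions
    return {"$and": clauses}
```

The result would then be passed as `where=` to `collection.query(...)`, e.g. together with `query_embeddings=[query_vector]`, where `collection` and `query_vector` stand in for your own collection and OpenAI query embedding.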

When querying the collection with only the query document, the query takes ~2 sec, but when I add metadata filtering it takes much longer.
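To make a comparison like this repeatable, one option is to time the same query with and without the filter and report the median over several runs. This is a minimal sketch under that assumption; `median_runtime` and `compare_filtered` are hypothetical helpers, and `query` is any callable that accepts an optional `where` keyword (for example a `functools.partial` around `collection.query`).

```python
import statistics
import time
from typing import Any, Callable, Dict


def median_runtime(fn: Callable[[], Any], repeats: int = 5, warmup: int = 1) -> float:
    """Return the median wall-clock time of fn() in seconds."""
    for _ in range(warmup):
        fn()  # discard warm-up runs (caches, lazy loading)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare_filtered(
    query: Callable[..., Any], where: Dict[str, Any], repeats: int = 5
) -> Dict[str, float]:
    """Time the same query without and with a metadata filter."""
    return {
        "unfiltered_s": median_runtime(lambda: query(), repeats),
        "filtered_s": median_runtime(lambda: query(where=where), repeats),
    }
```

Usage might look like `compare_filtered(functools.partial(collection.query, query_embeddings=[query_vector], n_results=10), {"is_valid": 1})`, with `collection` and `query_vector` again standing in for your own objects. Using the median rather than a single run keeps one slow outlier from dominating the numbers.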

I would really appreciate any advice on how to improve performance here, since my app ideally requires this process to take no longer than 3-4 seconds.

Thank you all for building a great product!

HammadB commented 1 year ago

The best solution here is #1125, which is still a work in progress.

itaismith commented 1 month ago

Hi @roy-armis! This issue was opened some time ago, and Chroma has made major improvements since then, including a metadata index that significantly speeds up queries. I just tested a collection similar to the one you described locally, and the query took only about a second. Feel free to reach out or reopen the issue if you're still having performance issues!
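As a rough mental model of why a metadata index helps, here is a toy inverted index: it maps each field value to the set of document ids holding it, so an equality filter becomes a dictionary lookup and a range filter visits only the matching distinct values, instead of scanning all ~270K documents. This is an illustrative sketch only, not Chroma's actual implementation; `ToyMetadataIndex` and its methods are hypothetical.

```python
import bisect
from collections import defaultdict
from typing import Dict, Set


class ToyMetadataIndex:
    """Toy inverted index over int metadata (illustrative only)."""

    def __init__(self) -> None:
        # field -> value -> set of document ids with that value
        self._postings: Dict[str, Dict[int, Set[str]]] = defaultdict(
            lambda: defaultdict(set)
        )

    def add(self, doc_id: str, metadata: Dict[str, int]) -> None:
        for field, value in metadata.items():
            self._postings[field][value].add(doc_id)

    def eq(self, field: str, value: int) -> Set[str]:
        # Direct lookup instead of scanning every document.
        return set(self._postings[field].get(value, set()))

    def gte(self, field: str, low: int) -> Set[str]:
        # Sort the distinct values (a real index would keep them sorted)
        # and union the id sets for every value >= low.
        values = sorted(self._postings[field])
        start = bisect.bisect_left(values, low)
        result: Set[str] = set()
        for v in values[start:]:
            result |= self._postings[field][v]
        return result
```

Intersecting the per-field results, e.g. `idx.eq("is_valid", 1) & idx.eq("is_purchased", 1) & idx.gte("num_words", 50)`, yields the candidate ids that a vector search would then need to rank.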