gtsoukas opened this issue 2 years ago
I think that would be interesting! I think the downsides are:
- Would make it more complex
Fully agree; that's probably the key reason not to do it.
- Not sure if there are any obvious public datasets for this?
Datasets from the existing benchmark could be reused if an additional, artificial categorical random variable is introduced, which allows filtering to any fraction of the original dataset between 0 and 100%. The approach is described here: https://towardsdatascience.com/effects-of-filtered-hnsw-searches-on-recall-and-latency-434becf8041c
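A minimal sketch of that idea in Python, assuming each point gets one of 100 equally likely artificial categories and a filter admits a set of categories. The function names `assign_random_labels` and `filter_mask` are hypothetical, and the linked article's exact procedure may differ. Admitting the first `round(fraction * 100)` categories keeps roughly that fraction of the dataset:

```python
import random


def assign_random_labels(n, num_buckets=100, seed=0):
    """Give each of n points an artificial categorical label in [0, num_buckets)."""
    rng = random.Random(seed)
    return [rng.randrange(num_buckets) for _ in range(n)]


def filter_mask(labels, fraction, num_buckets=100):
    """Return a boolean mask for a categorical "label IN allowed" filter.

    Labels are uniform, so admitting round(fraction * num_buckets) of the
    categories keeps approximately `fraction` of the points.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    allowed = set(range(round(fraction * num_buckets)))
    return [label in allowed for label in labels]


# Usage: check how close the kept fraction is to the target.
labels = assign_random_labels(10_000)
for fraction in (0.01, 0.1, 0.5, 1.0):
    mask = filter_mask(labels, fraction)
    print(f"target {fraction:.2f} -> kept {sum(mask) / len(mask):.3f}")
```

With 100 buckets the achievable fractions are multiples of 1%; finer steps would need more buckets.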
if an additional, artificial categorical random variable is introduced, which allows filtering to any fraction of the original dataset between 0 and 100%
I think that makes sense, but it would be nice if there were a more natural way to do it. E.g., for the MNIST dataset, filtering by digit (0-9) could be nice.
Would it be in the spirit of this benchmark to add a second benchmark category for ANN search combined with categorical filters?
Most real-world applications of ANN will require category filtering. E.g., when searching for clothes via ANN in an e-commerce scenario, one might filter by gender (categorical) or availability (categorical).
There are several software products that allow combining ANN with category filters, e.g. Apache Solr, Elasticsearch, Vertex AI Matching Engine, Weaviate and Qdrant. However, they mainly differ from the libraries in this benchmark in that they are managed services, or at least standalone services, rather than embeddable libraries.
In addition to recall vs. queries per second, there should be a view of the filtered fraction of the data vs. recall vs. queries per second. For the proprietary managed services, a cost dimension might also be useful.
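A minimal sketch of how such a fraction vs. recall vs. QPS view could be computed, assuming Euclidean distance, uniformly random artificial labels, and a hypothetical `search(query, k, mask)` callable standing in for a library's filtered search (that interface is an assumption, not part of ann-benchmarks). The key point is that exact ground truth has to be recomputed for every filter, because the true nearest neighbours change with the filter:

```python
import math
import random
import time


def filtered_ground_truth(data, query, k, mask):
    """Exact k nearest neighbours (Euclidean) among points where mask is True."""
    candidates = [i for i, keep in enumerate(mask) if keep]
    candidates.sort(key=lambda i: math.dist(data[i], query))
    return candidates[:k]


def recall_at_k(found, truth):
    """Fraction of the true neighbours that the search returned."""
    if not truth:
        return 1.0  # nothing passes the filter, so nothing can be missed
    return len(set(found) & set(truth)) / len(truth)


def evaluate(search, data, queries, labels, fractions, k=10, num_buckets=100):
    """Return (fraction, mean recall, queries per second) for each filter fraction."""
    rows = []
    for fraction in fractions:
        # Categorical "label IN allowed" filter keeping roughly `fraction` of the data.
        allowed = set(range(round(fraction * num_buckets)))
        mask = [label in allowed for label in labels]
        truths = [filtered_ground_truth(data, q, k, mask) for q in queries]
        start = time.perf_counter()
        results = [search(q, k, mask) for q in queries]  # only this part is timed
        elapsed = time.perf_counter() - start
        recall = sum(map(recall_at_k, results, truths)) / len(queries)
        qps = len(queries) / elapsed if elapsed > 0 else float("inf")
        rows.append((fraction, recall, qps))
    return rows


# Usage with random data and a brute-force stand-in for a real filtered index
# (so recall is 1.0 by construction; a real ANN library would plug in here).
rng = random.Random(0)
data = [[rng.random() for _ in range(8)] for _ in range(2000)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]
labels = [rng.randrange(100) for _ in data]


def brute_force_search(query, k, mask):
    return filtered_ground_truth(data, query, k, mask)


for fraction, recall, qps in evaluate(brute_force_search, data, queries, labels,
                                      [0.01, 0.1, 0.5, 1.0]):
    print(f"{fraction:5.2f}  recall={recall:.3f}  qps={qps:.0f}")
```

Plotting these rows for each library (one curve per index parameter setting) would give the proposed view; a cost column could be added the same way for managed services.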
I have found the following blog articles covering the topic:
Given that this would be very useful for practical applications, but also that it significantly complicates the benchmarks, I would be interested in your opinion and/or how I could help with it. It would also be great to know if anyone has already done such benchmarks.