erikbern opened this issue 1 year ago
I honestly feel that disabling these implementations is a bad idea. I would rather disable the 5+ implementations of the same idea (HNSW) instead of disabling unique approaches that might be performing a bit worse.
Yeah, I'm open to keeping some of those – are there any in particular that you think we should not disable?
Mostly just trying to reduce the clutter in the graphs and the total runtime of the benchmarks. But like you said, maybe a better approach would be to reduce some of the duplication. Is there a particular set of hnsw algos you think we could disable?
I think BallTree(nmslib) (a standard tree data structure) and puffinn (LSH) are worth keeping. As you already mentioned, elastiknn is going to be a default choice for many users, so it would be great if it were to show up in the benchmark.
With respect to decluttering, I would say the following (though I'm happy for the authors of the implementations to disagree):
However, many users will use FAISS, so it is probably worth having their hnsw implementation in the benchmark as well.
I plan to run the benchmark locally on a server and record the total time spent for each implementation to get a bit more insight into where we actually spend most of the time.
Excellent points! I think my only counterpoint would be to maybe keep the hnsw(vespa) since I think that's the only vespa one?
Also agree about making sure no algo blows up the total benchmark time. I could potentially add some output for that. Open to subsampling any that take too much total time.
Hi @erikbern, it's good news to hear you are planning to re-run all benchmarks. Just yesterday we updated our algorithm qsgngt to a new version, which should fix the problem “qsngt: something with the wheel, i'll take a look”. You may need to rebuild the Docker image for qsgngt. Please let me know if there are any questions about rerunning qsgngt, thanks a lot.
sounds great. i'll check in the next few days!
Hi @erikbern, I wonder if there were any problems rerunning the qsgngt algorithm. We have run ann-benchmarks ourselves and found that it performs very well. We hope this algorithm can be included in your re-run plan. Thank you!
Excellent points! I think my only counterpoint would be to maybe keep the hnsw(vespa) since I think that's the only vespa one?
Thanks, I would be sad if Vespa was deleted from the runs.
hey y'all,
I have been thinking about this recently myself. I am concerned about ANY algorithm that uses network serialization in its run. This includes Elasticsearch, OpenSearch, Weaviate, Qdrant. These aren't really "algorithms" in the same sense as scann or faiss.
Vespa addressed this issue by adding a special "benchmark" Python client that directly wraps the native HNSW implementation. But this just isn't feasible for many typically HTTP-driven solutions.
What do we think of just removing all "remote network call" algorithms? Or labeling them as such and comparing them that way? Once you start getting into very efficient algorithms, the network and serialization overhead becomes the dominating factor.
@erikbern ^ I will happily create a different issue for this discussion. Just seemed related to an overall update to the algos :)
I think it would make sense to break out the client-server based ones into a separate chart – makes sense to keep them separate. That being said I don't think the network/serialization overhead should be substantial, at least not for high-recall regimes
That being said I don't think the network/serialization overhead should be substantial, at least not for high-recall regimes
For very high recalls, the constant overhead impact is lessened. However, serializing over the network (whether REST, protobufs, etc.) will always have significant overhead compared to native memory access. The benchmark then becomes "who can reduce the network serialization overhead the best" vs. "who has the better vector search algorithm".
Benchmarking things fairly, correctly, and realistically is a difficult thing. Thank you for all your work on this project @erikbern! It really does help the vector search community.
@benwtrent Good point. We implemented the "prepared query" mode to make it possible to hide this cost, but looking at the current implementations (e.g., https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/luceneknn/module.py#L101-L123), it seems it's not exactly used in this way. The idea was that at query time you only send a small signal to carry out the actual query, but all serialization/transfer is done in the prepare_query and getreprepared....
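For illustration, a rough sketch of what that split could look like on the wrapper side – the method names follow the prepare_query idea described above, while the ship_query/execute calls are hypothetical placeholders for whatever RPC the service actually exposes (this is not the actual ann-benchmarks API):

```python
import numpy as np


class RemoteANNClient:
    """Hypothetical client-server wrapper: serialization happens in prepare_query,
    outside the timed section; only a tiny 'execute' signal is sent while timing."""

    def __init__(self, client):
        self._client = client          # assumed HTTP/gRPC client object
        self._prepared_id = None
        self._results = None

    def prepare_query(self, q: np.ndarray, n: int) -> None:
        # All serialization and network transfer of the query vector happens here.
        self._prepared_id = self._client.ship_query(q.tolist(), top_k=n)

    def run_prepared_query(self) -> None:
        # The timed part: send only a small "run prepared query <id>" signal.
        self._results = self._client.execute(self._prepared_id)

    def get_prepared_query_results(self):
        return self._results
```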
I think it would be great to tag each implementation with information if this is algorithm or library and some info on the basic algorithm that is implemented (like our huge collection of HNSW implementations...). Can we just decorate the constructor(s) in each module?
@maumueller
We implemented the "prepared query" mode ... The idea was that at query time you only send a small signal to carry out the actual query, but all serialization/transfer is done in the prepare_query and getreprepared....
This assumes that the service implements a "store query & execute later" API. I wouldn't expect many vector search systems to implement this, as it's not a common case (users of these systems just query with a vector and expect a result). It would be strange to me to make a special API in a production system just for benchmarking.
I think these things could be decorations on the constructor. Or I was thinking of adding a new field to the algo definition file tagging the algorithm kind or something. This way it's easily filterable in runs, and result files can be annotated with those tags.
Right, the config is probably a better place at this point in time.
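As a sketch of what that could look like – the existing keys roughly mirror the current per-algorithm config.yml files, while the tags block is a made-up field, not something the config schema supports today:

```yaml
float:
  any:
  - name: hnswlib
    module: ann_benchmarks.algorithms.hnswlib
    constructor: HnswLib
    docker_tag: ann-benchmarks-hnswlib
    disabled: false
    # Hypothetical new field for filtering runs and annotating result files:
    tags:
      algorithm: hnsw       # underlying algorithm family
      kind: library         # vs. e.g. "client-server" for Elasticsearch, Weaviate, ...
    # run_groups etc. unchanged
```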
Thanks for your brilliant work!
KGraph has received (minor) updates in the past year, and I’ve been using it in my toy project. I’m considering switching to other systems for better performance.
Although KGraph is mentioned in the README file, it appears to be missing from the benchmark results, making it difficult to directly compare its performance with other alternatives. Could it have been removed or perhaps listed under a different name?
I think it would be valuable to consider including KGraph in the ANN benchmarks to evaluate its performance alongside other algorithms.
Hi @yuang-chen! KGraph is currently disabled (see https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/kgraph/config.yml). I believe this is because it was always outperformed by other graph-based approaches, such as the hnsw variants, diskann, and ngt.
You can run it yourself to see how it compares to the state-of-the-art graph-based approaches.
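Something along these lines should work, assuming the usual ann-benchmarks workflow from the README (flip disabled: true to false in the linked config.yml first; exact flags may differ between versions):

```sh
# after enabling kgraph in ann_benchmarks/algorithms/kgraph/config.yml
python install.py --algorithm kgraph                          # build the kgraph Docker image
python run.py --dataset glove-100-angular --algorithm kgraph  # run the benchmark
python plot.py --dataset glove-100-angular                    # plot it against the other results
```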
I'm planning to re-run all benchmarks shortly. Going through the algos with some todos (checking boxes as I fix them)
Currently failing in CI:
Some algos we should probably disable because their performance is bad anyway: