erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Algo updates #335

Open erikbern opened 1 year ago

erikbern commented 1 year ago

I'm planning to re-run all benchmarks shortly. Going through the algos with some todos (checking boxes as I fix them)

Currently failing in CI:

Some algos we should probably disable because their performance is bad anyway:

maumueller commented 1 year ago

I honestly feel that disabling these implementations is a bad idea. I would rather disable the 5+ implementations of the same idea (HNSW) instead of disabling unique approaches that might be performing a bit worse.

erikbern commented 1 year ago

Yeah I'm open to keeping some of those – are there any in particular that you think we should not disable?

Mostly just trying to reduce the clutter in the graphs and the total runtime of the benchmarks. But like you said, maybe a better approach would be to reduce some of the duplication. Is there a particular set of hnsw algos you think we could disable?

maumueller commented 1 year ago

I think BallTree(nmslib) (a standard tree data structure) and puffinn (LSH) are worth keeping. As you already mentioned, elastiknn is going to be a default choice for many users, so it would be great if it were to show up in the benchmark.

With respect to decluttering, I would say (but I'm happy if the authors of the implementation were to disagree)

However, many users will use FAISS, so it is probably worth having their hnsw implementation in the benchmark as well.

I plan to run the benchmark locally on a server and record the total time spent for each implementation to get a bit more insight into where we actually spend most of the time.
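
As a starting point, something along these lines could tally the total time per implementation from the result files. The attribute and dataset names ("algo", "build_time", "times") are my guess at what the result writer stores, so treat this as a sketch:

```python
# Sketch: sum build time plus per-query times for each implementation by
# walking the result files. Attribute/dataset names are assumptions about
# the result-file layout and may need adjusting.
import glob
from collections import defaultdict

import h5py

totals = defaultdict(float)
for path in glob.glob("results/**/*.hdf5", recursive=True):
    with h5py.File(path, "r") as f:
        algo = f.attrs.get("algo", "unknown")
        if isinstance(algo, bytes):
            algo = algo.decode()
        totals[algo] += float(f.attrs.get("build_time", 0.0))
        if "times" in f:
            totals[algo] += float(sum(f["times"]))

for algo, seconds in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{algo:20s} {seconds / 3600:6.2f} h")
```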

erikbern commented 1 year ago

Excellent points! I think my only counterpoint would be to maybe keep the hnsw(vespa) since I think that's the only vespa one?

Also agree about making sure no algo blows up the total benchmark time. I could potentially add some output for that. I'm open to subsampling any algo that takes too much total time.

WPJiang commented 1 year ago

Hi @erikbern, it's good news that you are planning to re-run all benchmarks. Just yesterday we updated our algorithm qsgngt to a new version, which fixes the problem “qsngt: something with the wheel, i'll take a look”. You may need to rebuild the docker image of qsgngt. Please let me know if there are any issues with rerunning qsgngt. Thanks a lot!
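
For the rebuild, something like this should be enough (assuming install.py still accepts a per-algorithm argument):

```bash
# Rebuild only the qsgngt Docker image (the --algorithm flag is assumed here).
python install.py --algorithm qsgngt
```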

erikbern commented 1 year ago

sounds great. i'll check in the next few days!

WPJiang commented 1 year ago

> sounds great. i'll check in the next few days!

Hi @erikbern, I wonder if there has been any problem with rerunning the qsgngt algorithm. We have run ann-benchmarks ourselves and found that it has very good performance. We hope this algorithm can keep up with your re-run plan. Thank you!

jobergum commented 1 year ago

> Excellent points! I think my only counterpoint would be to maybe keep the hnsw(vespa) since I think that's the only vespa one?

Thanks, I would be sad if Vespa was deleted from the runs.

benwtrent commented 1 year ago

hey y'all,

I have been thinking about this recently myself. I am concerned about ANY algorithm that uses network serialization in its run. This includes Elasticsearch, OpenSearch, Weaviate, and Qdrant. These aren't really "algorithms" in the same sense as scann or faiss.

Vespa addressed this issue by adding a special "benchmark" python client that directly wraps the native HNSW implementation. But this just isn't feasible for many typically HTTP-driven solutions.

What do we think of just removing all "remote network call" algorithms? Or labeling them as such and comparing them that way? Once you start getting into very efficient algorithms, the network and serialization overhead becomes the dominating factor.

benwtrent commented 1 year ago

@erikbern ^ I will happily create a different issue for this discussion. Just seemed related to an overall update to the algos :)

erikbern commented 1 year ago

I think it would make sense to break out the client-server based ones into a separate chart. That being said I don't think the network/serialization overhead should be substantial, at least not for high-recall regimes

benwtrent commented 1 year ago

> That being said I don't think the network/serialization overhead should be substantial, at least not for high-recall regimes

For very high recalls, the constant overhead impact is lessened. However, serializing over the network (regardless of REST, protobufs, etc.) will always have a significant overhead compared to native memory access. The benchmark then becomes "who can reduce the network serialization overhead the best" vs. "who has the better vector search algorithm".
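
To make that concrete with entirely made-up numbers, a fixed per-query network/serialization cost compresses the differences between engines that look very different in-process:

```python
# Toy numbers (purely hypothetical) showing how a constant per-query
# network/serialization cost flattens in-process performance differences.
algo_query_ms = {"fast_engine": 0.05, "slow_engine": 0.20}  # in-process latency
network_overhead_ms = 0.30                                  # assumed constant cost

for name, q in algo_query_ms.items():
    in_process_qps = 1000.0 / q
    remote_qps = 1000.0 / (q + network_overhead_ms)
    print(f"{name}: {in_process_qps:.0f} QPS in-process, {remote_qps:.0f} QPS over the wire")

# fast_engine: 20000 QPS in-process, ~2857 QPS over the wire
# slow_engine:  5000 QPS in-process, ~2000 QPS over the wire
# A 4x in-process gap shrinks to roughly 1.4x once the constant cost dominates.
```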

Benchmarking things fairly, correctly, and realistically is a difficult thing. Thank you for all your work on this project @erikbern! It really does help the vector search community.

maumueller commented 1 year ago

@benwtrent Good point. We implemented the "prepared query" mode to make it possible to hide this cost, but looking at the current implementations (e.g., https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/luceneknn/module.py#L101-L123), it seems it's not exactly used in this way. The idea was that at query time you only send a small signal to carry out the actual query, but all serialization/transfer is done in the prepare_query and getreprepared....
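
For concreteness, here is a minimal sketch of how a client-server wrapper could use that mode so that only a small signal sits inside the timed section. The endpoints, payload format, and import path are assumptions on my part, not code from any existing module; the hook names are the prepared-query methods as I understand them:

```python
import json

import requests

from ..base.module import BaseANN  # import path assumes the current module layout


class HypotheticalRemoteEngine(BaseANN):
    """Illustrative only: wraps a made-up HTTP service with a prepare/execute API."""

    def __init__(self, metric, url="http://localhost:8080/knn"):
        self._metric = metric
        self._url = url
        self._response = None

    def fit(self, X):
        ...  # index the dataset on the remote service (omitted)

    def prepare_query(self, q, n):
        # Untimed: ship the query vector and parameters ahead of time.
        requests.post(f"{self._url}/prepare",
                      data=json.dumps({"vector": q.tolist(), "k": n}))

    def run_prepared_query(self):
        # Timed: only a small "execute" signal crosses the wire.
        self._response = requests.post(f"{self._url}/execute")

    def get_prepared_query_results(self):
        # Untimed: deserialize the response into neighbor ids.
        return self._response.json()["ids"]
```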

I think it would be great to tag each implementation with information on whether it is an algorithm or a library, plus some info on the basic algorithm that is implemented (like our huge collection of HNSW implementations...). Can we just decorate the constructor(s) in each module?
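
Something like this decorator is what I have in mind (purely a sketch, nothing of the sort exists in the repo today; the name and fields are made up):

```python
def annotate(algorithm, execution="in-process"):
    """Attach metadata to an implementation class, e.g. the underlying
    algorithm ('hnsw', 'lsh', ...) and whether it runs in-process or as a
    client-server system, so plots and result files can group or filter by it."""
    def wrap(cls):
        cls.ann_metadata = {"algorithm": algorithm, "execution": execution}
        return cls
    return wrap


# Hypothetical usage in an algorithm module:
# @annotate(algorithm="hnsw", execution="client-server")
# class SomeRemoteEngine(BaseANN):
#     ...
```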

benwtrent commented 1 year ago

@maumueller

> We implemented the "prepared query" mode ... The idea was that at query time you only send a small signal to carry out the actual query, but all serialization/transfer is done in the prepare_query and getreprepared....

This assumes that the service implements a "store query & execute later" API. I wouldn't expect many vector search systems to implement this, as it's not a common case (users of these systems just query with a vector and expect a result). It would be strange to me to add a special API to a production system just for benchmarking.

I think these things could be decorations on the constructor. Or, I was thinking of adding a new field to the algo definition file tagging the algorithm kind or something. That way it's easily filterable in runs, and result files can be annotated with those tags.

maumueller commented 1 year ago

Right, the config is probably a better place at this point in time.
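
For illustration, the config variant could look like this; the `tags` field and the helper below are made up and not part of the current config.yml schema:

```python
# Suppose each definition in an algorithm's config.yml gained a hypothetical
# field such as:
#     tags: {algorithm: hnsw, execution: client-server}
# Then the tooling could collect and filter on those tags, e.g.:
import glob

import yaml


def iter_tagged(node, fallback_name):
    """Yield (name, tags) for any definition dict carrying a `tags` entry,
    without assuming the exact nesting of the config files."""
    if isinstance(node, dict):
        if "tags" in node:
            yield node.get("name", fallback_name), node["tags"]
        for value in node.values():
            yield from iter_tagged(value, fallback_name)
    elif isinstance(node, list):
        for item in node:
            yield from iter_tagged(item, fallback_name)


def collect_tags(pattern="ann_benchmarks/algorithms/*/config.yml"):
    tags = {}
    for path in glob.glob(pattern):
        with open(path) as f:
            for name, t in iter_tagged(yaml.safe_load(f), path):
                tags[name] = t
    return tags


# e.g. keep only in-process implementations for a plot:
# in_process = [n for n, t in collect_tags().items()
#               if t.get("execution") != "client-server"]
```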

yuang-chen commented 1 month ago

Thanks for your brilliant work!

KGraph has received (minor) updates in the past year, and I’ve been using it in my toy project. I’m considering switching to other systems for better performance.

Although KGraph is mentioned in the README file, it appears to be missing from the benchmark results, making it difficult to directly compare its performance with other alternatives. Could it have been removed or perhaps listed under a different name?

I think it would be valuable to consider including KGraph in the ANN benchmarks to evaluate its performance alongside other algorithms.

maumueller commented 1 month ago

Hi @yuang-chen! KGraph is currently disabled (see https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/kgraph/config.yml). I believe this is because it was always outperformed by other graph-based approaches, such as the hnsw variants, diskann, and ngt.

You can run it yourself to see how it compares to the state-of-the-art graph-based approaches.
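
If you want to try it, a rough sequence would be the following, assuming the config format still uses a `disabled` flag and that install.py/run.py/plot.py still accept these arguments:

```bash
# 1. Re-enable it by flipping `disabled: true` to `disabled: false` in
#    ann_benchmarks/algorithms/kgraph/config.yml
# 2. Build its Docker image and run it on a dataset (flags assumed from the README):
python install.py --algorithm kgraph
python run.py --dataset glove-100-angular --algorithm kgraph
python plot.py --dataset glove-100-angular
```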