Jeadie opened this issue 1 year ago
Most/all of these algorithms require custom binary stuff though, so that's why we use Docker. Using pip for particular algorithms would just get us 30% of the way there!
That being said, it probably makes sense to turn ann-benchmarks into a bit more of a library. There's a lot of refactoring worth doing in general. I'm hoping to find a bit more time, but random stuff keeps coming in between (starting a startup, etc.). But I definitely want to clean up a bunch of stuff, and I'll see if we can try to turn it into a library slowly.
Good point, I can't think of a use case for `pip install ann-benchmarks[algorithms]` off the top of my head. Maybe even `pip install ann-benchmarks[annoy]` is excessive. I think the library user could be responsible for both 1) starting the local binary/Docker container, then 2) starting their ann-benchmarks run pointed (either via Docker ID or URI) at their algorithm, as in the sketch below.
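To make that split concrete, here is a rough sketch of what steps 1 and 2 could look like from the library user's side. Everything here is hypothetical: `run_benchmark`, its parameters, and the image name are illustrative, not existing ann-benchmarks APIs (only the dataset name is real).

```python
# Hypothetical usage sketch -- run_benchmark and the image name do not exist
# today; this only illustrates the proposed division of responsibility.
import subprocess

# 1) The user starts their own algorithm service (here: a made-up image).
container_id = subprocess.check_output(
    ["docker", "run", "-d", "-p", "5432:5432", "my-pgvector-ann:latest"],
    text=True,
).strip()

# 2) The user points a library-ified ann-benchmarks at it, by Docker ID or URI.
from ann_benchmarks import run_benchmark  # hypothetical API, does not exist yet

results = run_benchmark(
    dataset="glove-100-angular",                  # existing benchmark dataset
    algorithm_uri="postgresql://localhost:5432",  # or container_id
    count=10,                                     # neighbors per query
)
print(results)
```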
I'm happy to help out with some of the refactoring. If I understand correctly, #383 is the main issue discussing future refactors?
Overview
erikbern/ann-benchmarks is not only an invaluable open-source comparison of popular ANN methods (and, I guess, now ANN databases as well), but it could also provide a solid framework for performance testing and reporting.
Motivation
I am looking to build performance testing/benchmarks into pgvector (see Issue #16), which currently reports results to ann-benchmarks. It seems superfluous to reinvent the wheel (e.g. dataset processing, test execution, metric computation), when pgvector will always want to report to ann-benchmarks anyway.
Scope
The most important functionality for ann-benchmarks to be useful as an import is in `ann_benchmarks/*.py` and the base algorithm implementation class. This may work better alongside the code restructuring mentioned in #383.
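To illustrate what "useful as an import" means for an external project, here is a minimal sketch of an algorithm wrapper. The import path and method names are assumptions based on the current repo layout (`ann_benchmarks/algorithms/base.py`) and may change with #383; the brute-force logic is just a stand-in for a real backend like pgvector.

```python
# Minimal sketch of subclassing the base algorithm class from outside the
# repo. Import path and method names are assumptions and may change (#383).
import numpy as np
from ann_benchmarks.algorithms.base import BaseANN

class BruteForceExample(BaseANN):
    """Toy exact-search backend showing the expected surface area."""

    def __init__(self, metric="angular"):
        self.metric = metric
        self._data = None

    def fit(self, X):
        # Index-build step; brute force just stores (and normalizes) vectors.
        self._data = np.asarray(X, dtype=np.float32)
        if self.metric == "angular":
            self._data /= np.linalg.norm(self._data, axis=1, keepdims=True)

    def query(self, q, n):
        # Return the indices of the n nearest neighbors of q.
        q = np.asarray(q, dtype=np.float32)
        if self.metric == "angular":
            dists = -self._data.dot(q / np.linalg.norm(q))  # smaller = closer
        else:
            dists = np.linalg.norm(self._data - q, axis=1)
        return np.argsort(dists)[:n]

    def __str__(self):
        return f"BruteForceExample(metric={self.metric})"
```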