alexklibisz closed this 3 months ago.
Interesting, this seems to have been broken for a long time (and it meant that we excluded many of the implementations in the nmslib library). Thanks for the fix, @alexklibisz!
I sampled a few implementations, and only pynndescent has a somewhat strange structure for euclidean/angular/any. @lmcinnes, could you check whether the `any` entry of https://github.com/erikbern/ann-benchmarks/blob/c4155055ee45a0dc46ee5bf1a90f6fbde927c50d/ann_benchmarks/algorithms/pynndescent/config.yml is useful?
I think the `any` case is a "fallback" option in case the other matches don't work out, i.e. an "I don't know what else to do; try this" approach. If that is not how `any` is being used, then perhaps we should just remove the `any` option for pynndescent? How is `any` intended to work?
Thanks for the quick reply! It seems that `any` took precedence over all other configurations, so all your pynndescent runs may have been using these parameter settings.
With the fix from @alexklibisz, the `any` configuration will now be merged with the `euclidean` or `angular` configuration, depending on the dataset.
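The thread mentions two possible meanings of `any`. One is a fallback that is used only when no metric-specific entry matches. The other, which applies after the fix, is a set of definitions merged with the metric-specific ones. The sketch below contrasts the two readings under one assumption: each config is simplified to a dict that maps metric keys to lists of definition labels. `CONFIG`, `select_with_fallback`, `select_merged` and the labels are hypothetical. They are not the real pynndescent `config.yml` contents or the ann-benchmarks API.

```python
# Hypothetical, simplified config: each metric key maps to a list of
# definition labels. The labels are illustrative placeholders only.
CONFIG = {
    "any": ["any-defaults"],
    "euclidean": ["euclidean-tuned"],
    "angular": ["angular-tuned"],
}


def select_with_fallback(config, metric):
    """Fallback reading: use the metric-specific entry if present, else "any"."""
    if metric in config:
        return list(config[metric])
    return list(config.get("any", []))


def select_merged(config, metric):
    """Merged reading: combine "any" with the metric-specific entry, if both exist."""
    selected = []
    if "any" in config:
        selected.extend(config["any"])
    if metric in config:
        selected.extend(config[metric])
    return selected


if __name__ == "__main__":
    # "hamming" has no dedicated entry here, so both readings return only "any".
    for metric in ("euclidean", "angular", "hamming"):
        print(metric, select_with_fallback(CONFIG, metric), select_merged(CONFIG, metric))
```

Under the merged reading, a euclidean dataset receives both the `any` and the `euclidean` definitions. If `any` is removed from a config, only the metric-specific definitions remain.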
I think removing the `any` option for pynndescent is probably the best approach, then. Those aren't really optimal parameters for anything, just a reasonable in-between choice to cover the possibilities. Best to rely on the specific values for the individual metric types.
Maybe I'm missing something, but it seems like the logic in `_get_algorithm_definitions` incorrectly skips some algorithm definitions, which I've attempted to fix here.
For example, elastiknn has definitions for the "point types" `any` and `euclidean`: https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/algorithms/elastiknn/config.yml
But if I run `python run.py --algorithm elastiknn-l2lsh --dataset random-xs-20-euclidean --run-disabled --timeout 30 --local --force --runs 1`, I get the "Nothing to run" exception. That doesn't make sense IMO: elastiknn has definitions for the `euclidean` point type, so there is something to run.
It seems that the non-`any` point type is skipped because of the logic in `_get_algorithm_definitions`. If an algorithm has definitions for `any`, they take precedence over the definitions for a specific point type (`euclidean`), and the latter are never loaded. We can fix this by changing the logic so that it accumulates all matching point types, rather than taking the `any` type and skipping the rest. In other words, we change the `elif` to a second `if`.
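Here is a minimal sketch of the `elif` to `if` change and how it produces the "Nothing to run" error. It assumes a simplified model in which each config maps keys to lists of definitions that carry a `name` field, and the run is then filtered by the `--algorithm` name. Only `elastiknn-l2lsh` comes from the thread. The `placeholder-any-def` entry, the function names, and the use of `RuntimeError` are hypothetical stand-ins, not the actual code of `_get_algorithm_definitions`.

```python
# Hypothetical, simplified elastiknn-like config. "placeholder-any-def" is an
# illustrative stand-in for whatever the real "any" entry contains.
ELASTIKNN_CONFIG = {
    "any": [{"name": "placeholder-any-def"}],
    "euclidean": [{"name": "elastiknn-l2lsh"}],
}


def select_with_elif(config, metric):
    """Buggy variant: an "any" entry short-circuits the metric-specific one."""
    selected = []
    if "any" in config:
        selected.extend(config["any"])
    elif metric in config:  # never reached when "any" exists
        selected.extend(config[metric])
    return selected


def select_with_if(config, metric):
    """Fixed variant: accumulate every matching key."""
    selected = []
    if "any" in config:
        selected.extend(config["any"])
    if metric in config:
        selected.extend(config[metric])
    return selected


def pick(definitions, algorithm):
    """Keep definitions matching --algorithm; raise if none are left."""
    chosen = [d for d in definitions if d["name"] == algorithm]
    if not chosen:
        # Stand-in for the "Nothing to run" exception reported in the thread.
        raise RuntimeError("Nothing to run")
    return chosen


if __name__ == "__main__":
    try:
        pick(select_with_elif(ELASTIKNN_CONFIG, "euclidean"), "elastiknn-l2lsh")
    except RuntimeError as err:
        print("elif:", err)
    print("if:", pick(select_with_if(ELASTIKNN_CONFIG, "euclidean"), "elastiknn-l2lsh"))
```

With `elif`, the `euclidean` definitions are never collected, so filtering for `elastiknn-l2lsh` leaves nothing. With a second `if`, both lists are accumulated and the requested definition is found.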