Cecca / role-of-dimensionality

MIT License

Benchmarking nearest neighbors

This is the version of http://github.com/erikbern/ann-benchmarks/ accompanying our paper The Role of Local Dimensionality Measures in Benchmarking Nearest Neighbor Search. See the main repository for the benchmarking tool intended for a general audience.

Install

The only prerequisites are Python (tested with 3.6) and Docker.

  1. Clone the repo.
  2. Run pip install -r requirements.txt.
  3. Run python install.py to build all the libraries inside Docker containers (this can take a while, roughly 10-30 minutes).
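
As a rough sketch of the steps above, the following script checks the two prerequisites and then lists the install commands in order. The names missing_prerequisites, run_install and INSTALL_COMMANDS are hypothetical helpers made up for this illustration; they are not part of the repository. The script assumes it is started from the root of the cloned repo, and it uses python -m pip in place of a bare pip call, which should be equivalent.

```python
import shutil
import subprocess
import sys

# Commands taken from the install steps above (assumed to be run from the
# repository root). These names are illustrative, not part of the repo.
INSTALL_COMMANDS = [
    [sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
    [sys.executable, "install.py"],
]


def missing_prerequisites(min_python=(3, 6)):
    """Return a list of human-readable problems; empty if nothing is missing."""
    problems = []
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    if shutil.which("docker") is None:
        problems.append("the docker executable was not found on PATH")
    return problems


def run_install(dry_run=True):
    """Print the install commands in order and, unless dry_run, execute them."""
    for command in INSTALL_COMMANDS:
        print(" ".join(command))
        if not dry_run:
            # check=True raises on a non-zero exit code, so a failed
            # pip install prevents the Docker build from starting.
            subprocess.run(command, check=True)


if __name__ == "__main__":
    problems = missing_prerequisites()
    for problem in problems:
        print("missing prerequisite:", problem)
    if not problems:
        run_install(dry_run=True)
```

By default the script only prints the commands. Passing dry_run=False to run_install executes them, which triggers the Docker builds described in step 3.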

Running

  1. Run python run.py (this can take an extremely long time, potentially days).
  2. Run python plot.py or python create_website.py to plot results.

You can customize the algorithms and datasets if you want to:

Result processing

First, you have to export the results:

Then you have to set up your R installation. Open an R shell and type packrat::restore(). At this point you can run the analysis and plotting pipeline by typing make.

Running All Experiments

To re-run the complete set of experiments, use make install && make run.

Changes

See https://cecca.github.io/role-of-dimensionality/ for the evaluation including plots, preprocessed datasets, and raw results.

Generating the datasets described in the paper works as follows. (We use glove-100-angular as an example.)

Related Publication

The following publication details design principles behind the benchmarking framework: