erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Running benchmark with custom dataset #484

Open mukunth1987 opened 5 months ago

mukunth1987 commented 5 months ago

Hello. The repo currently ships with predefined datasets (GloVe, SIFT, NYTimes). Can I use my own custom dataset (*.csv) instead of those and run the benchmark on it?

Thanks in advance.

maumueller commented 5 months ago

You won't be able to run directly on your CSV file. You will first have to add some wrapper code to https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/datasets.py (see the DATASETS dictionary at the very bottom). Usually, this involves parsing the CSV into a NumPy array, splitting the dataset into train/test, and calling write_output(...). Hopefully you will find the many examples already present there useful.
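
As a rough illustration of the three steps mentioned above (parse, split, write), here is a minimal sketch. It assumes the CSV is purely numeric with one vector per row and no header. The names load_csv_vectors, split_train_test and my_csv_dataset are hypothetical, and the argument order passed to write_output (train, test, output filename, distance) is an assumption that should be checked against the current datasets.py. The writer is passed in as a parameter so the snippet runs on its own.

```python
import numpy as np


def load_csv_vectors(csv_path, delimiter=",", skiprows=0):
    """Parse a purely numeric CSV (one vector per row) into a float32 array."""
    return np.loadtxt(csv_path, delimiter=delimiter, skiprows=skiprows,
                      dtype=np.float32, ndmin=2)


def split_train_test(X, test_size, seed=1):
    """Shuffle the rows and hold out `test_size` of them as query vectors."""
    if not 0 < test_size < X.shape[0]:
        raise ValueError("test_size must be at least 1 and less than the row count")
    perm = np.random.default_rng(seed).permutation(X.shape[0])
    # First `test_size` shuffled rows become queries, the rest the index set.
    return X[perm[test_size:]], X[perm[:test_size]]


def my_csv_dataset(out_fn, csv_path, write_output,
                   test_size=1000, distance="euclidean"):
    """Hypothetical wrapper: CSV -> NumPy array -> train/test -> writer.

    `write_output` is injected here; inside datasets.py you would call the
    module's own write_output instead (assumed call shape, please verify).
    """
    X = load_csv_vectors(csv_path)
    train, test = split_train_test(X, test_size)
    write_output(train, test, out_fn, distance)
```

Inside datasets.py, the wrapper would then presumably be registered in the DATASETS dictionary under a name of your choosing, following the pattern of the existing entries, so that the benchmark runner can find it by that name.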