erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Time taken for benchmarking to finish execution #499

Closed dipshirajput closed 1 month ago

dipshirajput commented 3 months ago

Hi all,

I wanted to test the benchmark performance of Qdrant and Milvus, so I ran the run.py script, but it has been running for approximately 5 days and the benchmark still hasn't finished. Could you please let me know roughly how much more time the benchmarking might take?

Also, please let me know if there is any way to reduce the runtime by restricting the dataset or by benchmarking only these two databases, Qdrant and Milvus.

Thanks in advance.

maumueller commented 3 months ago

A small example to run only these two algorithms on the smallest real-world dataset available is:

for algo in milvus qdrant; do python3 run.py --algorithm $algo --dataset fashion-mnist-784-euclidean --runs 1 --run-disabled; done

This assumes that you have already built the containers for these two using the install.py script.
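If you haven't, a minimal sketch for building them first could look like this (assuming install.py accepts the same --algorithm flag described in the repository README):

for algo in milvus qdrant; do python3 install.py --algorithm $algo; done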

dipshirajput commented 3 months ago

@maumueller
python3 run.py --algorithm $algo --dataset fashion-mnist-784-euclidean --runs 1 --run-disabled; done

Is this the command? Also, which part of this command specifies milvus and qdrant?

maumueller commented 3 months ago

This uses bash to run both implementations. If the above doesn't make sense to you, you can execute:

python3 run.py --algorithm milvus --dataset fashion-mnist-784-euclidean --runs 1 --run-disabled

wait until it has finished, and then run

python3 run.py --algorithm qdrant --dataset fashion-mnist-784-euclidean --runs 1 --run-disabled

dipshirajput commented 3 months ago

Okay. Roughly how much time might each of these take individually?

maumueller commented 3 months ago

It seems to me that qdrant searches through a lot of hyperparameters, so I'm not sure. If it takes more than a day on this tiny dataset, their parameter search is clearly too exhaustive. You can further restrict the number of hyperparameters by setting --max-n-algorithms to a small number.
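For instance, a sketch of the single-algorithm command from above with this cap added (the value 5 here is only an illustrative choice, not a recommendation):

python3 run.py --algorithm qdrant --dataset fashion-mnist-784-euclidean --runs 1 --run-disabled --max-n-algorithms 5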

dipshirajput commented 3 months ago

Okay, I will try this. Also, do we need to run the databases separately, or does run.py already bring the databases into the picture for performance benchmarking?