Closed andy-kimball closed 6 hours ago
Add a CLI that makes it easy to benchmark the quality and performance of vector indexing. The CLI downloads test datasets from a GCP bucket and then builds and searches the index. It outputs results in a spreadsheet-friendly format like this:
unsplash-512-euclidean
1000000 train vectors, 1000 test vectors, 512 dimensions, 16/128 min/max partitions, base beam size 8

beam  recall  leaf   all    full   partns  qps
1     22.10%  91     247    23.61  4.00    1357.12
2     31.35%  182    339    27.65  5.00    1867.50
4     47.86%  362    610    31.96  8.00    1783.30
8     67.96%  727    1220   35.70  15.00   1729.00
16    82.00%  1450   2302   40.41  27.00   1629.65
32    90.70%  2894   4462   44.17  51.00   1301.63
64    95.61%  5783   8772   47.30  99.00   791.74
128   98.32%  11559  17374  49.60  195.00  535.10
256   99.47%  23099  34391  50.83  387.00  298.24
512   99.83%  46150  57517  51.28  644.00  189.69
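Because the output is whitespace-delimited, it is also easy to post-process outside a spreadsheet. The following is a minimal Python sketch, not part of this PR, that parses the result rows and finds the smallest beam size reaching a target recall. The names `Row`, `parse_rows`, and `smallest_beam_for` are hypothetical, and the sketch assumes every data row has exactly the seven columns shown above (beam, recall, leaf, all, full, partns, qps).

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    """One benchmark result row (hypothetical helper type, not from the PR)."""
    beam: int
    recall: float  # fraction in [0, 1], parsed from a value like "90.70%"
    qps: float


def parse_rows(text: str) -> List[Row]:
    """Parse data rows from the CLI output, skipping title and header lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumption: data rows have 7 fields and begin with an integer beam size.
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        recall = float(fields[1].rstrip("%")) / 100.0
        rows.append(Row(beam=int(fields[0]), recall=recall, qps=float(fields[6])))
    return rows


def smallest_beam_for(rows: List[Row], target_recall: float) -> Optional[Row]:
    """Return the row with the smallest beam size whose recall meets the target."""
    for row in sorted(rows, key=lambda r: r.beam):
        if row.recall >= target_recall:
            return row
    return None


# A few rows copied from the sample output above.
SAMPLE = """\
beam recall leaf all full partns qps
8 67.96% 727 1220 35.70 15.00 1729.00
16 82.00% 1450 2302 40.41 27.00 1629.65
32 90.70% 2894 4462 44.17 51.00 1301.63
64 95.61% 5783 8772 47.30 99.00 791.74
"""

if __name__ == "__main__":
    best = smallest_beam_for(parse_rows(SAMPLE), 0.90)
    print(best)
```

With the sample rows, a 90% recall target selects the beam size 32 row, which is the kind of recall versus QPS trade-off the table is meant to expose.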
Epic: CRDB-42943
Release note: None
bors r=drewkimball
Build succeeded: