erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python
http://ann-benchmarks.com
MIT License

Fix recall for Hamming distance #508

Closed: ankane closed this 5 months ago

ankane commented 5 months ago

Both sift-256-hamming and word2bits-800-hamming always report 0 recall because the distances stored in the HDF5 files are floats between 0 and 1 (the fraction of differing bits) rather than the Hamming distance (the count of differing bits). Multiplying the stored distances by the dataset's dimensionality fixes it.

This issue is likely the cause of #420.
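
A minimal sketch of the failure mode, on synthetic data. It assumes recall is scored by counting returned neighbors whose distance is within the k-th ground-truth distance plus a small tolerance; `threshold_recall` is a hypothetical helper written for this illustration, not the project's own function. It also assumes the stored ground truth is the raw Hamming count divided by the dimensionality, as described above. Run distances are raw counts, so against the 0-to-1 ground truth nothing passes the threshold; rescaling by the dimensionality restores full recall for an exact search.

```python
import numpy as np

def hamming(a, b):
    # Raw Hamming distance: the number of positions where the bit vectors differ.
    return int(np.sum(a.astype(np.bool_) ^ b.astype(np.bool_)))

def threshold_recall(true_dists, run_dists, k, epsilon=1e-3):
    # Hypothetical helper: fraction of the k returned neighbors whose distance
    # is no larger than the k-th ground-truth distance plus a small tolerance.
    threshold = true_dists[k - 1] + epsilon
    return sum(d <= threshold for d in run_dists[:k]) / k

rng = np.random.default_rng(0)
dim, k = 256, 10
data = rng.integers(0, 2, size=(1000, dim), dtype=np.uint8)  # synthetic bit vectors
query = rng.integers(0, 2, size=dim, dtype=np.uint8)

# A perfect "algorithm": exact brute-force search reporting raw Hamming counts.
raw = sorted(hamming(query, x) for x in data)
run_dists = raw[:k]

# Ground truth stored as a fraction of differing bits (assumed to mirror the
# HDF5 files), and the same values rescaled by the dimensionality.
stored = [d / dim for d in raw]
rescaled = [d * dim for d in stored]

recall_before = threshold_recall(stored, run_dists, k)
recall_after = threshold_recall(rescaled, run_dists, k)
print(recall_before, recall_after)
```

With these inputs the script should print `0.0 1.0`. The tolerance also matters when the dimensionality is not a power of two (e.g. 800), where dividing and re-multiplying may not round-trip exactly.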

ankane commented 5 months ago

Another approach would be to change the Hamming distance to use the mean instead of the sum:

 metrics = {
     "hamming": Metric(
-        distance=lambda a, b: np.sum(a.astype(np.bool_) ^ b.astype(np.bool_)),
+        distance=lambda a, b: np.mean(a.astype(np.bool_) ^ b.astype(np.bool_)),
         distance_valid=lambda a: True
     ),
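
A small sketch comparing the two definitions in the diff. The function names `hamming_sum` and `hamming_mean` are hypothetical stand-ins for the old and proposed lambdas. The mean is the sum divided by the vector length, so it yields the same 0-to-1 scale as the stored ground truth, assuming the distances of returned neighbors are computed with this `distance` function.

```python
import numpy as np

def hamming_sum(a, b):
    # Original definition: count of differing bits.
    return np.sum(a.astype(np.bool_) ^ b.astype(np.bool_))

def hamming_mean(a, b):
    # Proposed alternative: fraction of differing bits, in [0, 1].
    return np.mean(a.astype(np.bool_) ^ b.astype(np.bool_))

a = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=np.uint8)
b = np.array([1, 1, 0, 1, 0, 1, 1, 0], dtype=np.uint8)

print(hamming_sum(a, b))   # 3
print(hamming_mean(a, b))  # 0.375
```

Because the two differ only by the constant factor of the dimensionality, neighbor rankings are identical under either definition; only the scale of the reported distances changes. That is why either fix (rescaling the stored distances or switching the metric to the mean) puts the run distances and the ground truth on the same scale.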

maumueller commented 5 months ago

Thanks!

ankane commented 5 months ago

Thanks @maumueller!