Question about recall calculation

erikbern / ann-benchmarks

Benchmarks of approximate nearest neighbor libraries in Python

http://ann-benchmarks.com

MIT License

4.88k stars 735 forks source link

Question about recall calculation #519

Closed oleg0x closed 4 months ago

oleg0x commented 4 months ago

Hello, Could you clarify please how recall is calculated in the case when an algorithm search returns 10 equal vectors each of which is the closest to the request? Is it possible that in such case the recall (from get_recall_values function?) will be 100%?

maumueller commented 4 months ago

We look at the distances of the reported points. Every point that is within the distance (+ small epsilon) of the k-th nearest neighbor is considered a correct result.

oleg0x commented 4 months ago

Suppose for k=10 an algorithm returns 10 copies of the same point which is the closest to a query. Will the recall be 100% or 10% in this case?

maumueller commented 4 months ago

Now I get what you mean. As far as I can see, we do not check whether an algorithm returns duplicated points. It would get 100% recall by just returning the nearest neighbor 10 times.

oleg0x commented 4 months ago

OK, thanks. But now it seems this is a loophole for ANNS algorithms developers :). If someone fix it, the results may change a little.