thomasahle opened this issue 1 year ago
Yeah, this is very confusing; I think it's a mistake. https://github.com/erikbern/ann-benchmarks/blob/main/ann_benchmarks/datasets.py#L427 indicates it's angular (cosine) distance too.
Maybe let's remove this dataset from the benchmarks for now.
@benfred should be able to shed some light on this.
The original intent was to test inner-product (dot) distance, not angular distance (see https://github.com/erikbern/ann-benchmarks/pull/91).
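To illustrate why the two metrics aren't interchangeable, here's a minimal sketch on made-up 2-D vectors (not data from lastfm-64-dot): when item norms differ, the nearest neighbor by inner product can differ from the nearest neighbor by cosine similarity. The `dot`/`cosine` helpers and the toy vectors are hypothetical, just for illustration.

```python
import math

def dot(a, b):
    # Inner product of two equal-length vectors.
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    # Cosine similarity: the inner product of the unit-normalized vectors.
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

# Hypothetical toy items, not taken from the benchmark dataset.
items = {
    "aligned": [1.0, 0.0],  # same direction as the query, small norm
    "long": [3.0, 3.0],     # 45 degrees off the query, but a much larger norm
}
query = [1.0, 0.0]

best_by_dot = max(items, key=lambda k: dot(query, items[k]))
best_by_cosine = max(items, key=lambda k: cosine(query, items[k]))
print(best_by_dot, best_by_cosine)  # long aligned
```

Inner product rewards the large-norm item (3 vs 1), while cosine only looks at direction (about 0.71 vs 1), so ground truth computed with one metric won't match the other unless the vectors are normalized.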
IIRC, the rationale was that certain algorithms either didn't support IP distance or performed poorly when a transform like the one in https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/XboxInnerProduct.pdf was used to convert IP distance to a cosine space.
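For reference, here's a rough sketch of the kind of transform meant, as I understand the linked paper: append one extra coordinate `sqrt(M^2 - ||x||^2)` to each item (with `M` the largest item norm) and a `0` to each query. All augmented items then have norm `M`, so the maximum-inner-product item becomes both the Euclidean nearest neighbor and the best cosine match. The function names are hypothetical, and this uses plain Python on toy data rather than any library's API.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def augment_items(items):
    # Give every item the same norm M (the largest item norm) via one extra coordinate.
    m = max(norm(x) for x in items)
    # max(..., 0.0) guards against tiny negative values from float rounding.
    return [x + [math.sqrt(max(m * m - dot(x, x), 0.0))] for x in items]

def augment_query(q):
    # A zero in the extra coordinate keeps q'.x' equal to q.x.
    return q + [0.0]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical toy data.
items = [[1.0, 0.0], [3.0, 3.0], [0.5, 2.0]]
query = [1.0, 0.2]

# Exact maximum inner product search on the original vectors.
mips = max(range(len(items)), key=lambda i: dot(query, items[i]))

aug = augment_items(items)
q_aug = augment_query(query)
# Euclidean nearest neighbor in the augmented space.
nn_l2 = min(range(len(aug)), key=lambda i: euclidean(q_aug, aug[i]))
# Best cosine match in the augmented space.
nn_cos = max(range(len(aug)), key=lambda i: dot(q_aug, aug[i]) / (norm(q_aug) * norm(aug[i])))
print(mips, nn_l2, nn_cos)  # 1 1 1
```

The catch is that the extra coordinate can dominate when item norms vary a lot, which may be one reason some algorithms do poorly on transformed data compared with native IP support.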
I think it's nice to have a dataset for dot products. But I'll fix that after I'm done with this run.
From the path http://ann-benchmarks.com/lastfm-64-dot_10_angular.html, it seems that this dataset is actually angular. But the name suggests dot product, which many of the algorithms don't natively support.