DmitryUlyanov / Multicore-TSNE

Parallel t-SNE implementation with Python and Torch wrappers.
Other
1.89k stars 228 forks source link

Squared euclidean distance breaks VPTree #17

Closed make closed 6 years ago

make commented 7 years ago

Squared euclidean distance cannot be used in VPTree search and thus sqrt() should be calculated for result: sqrt() should be used in https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/multicore_tsne/vptree.h#L71 or https://github.com/DmitryUlyanov/Multicore-TSNE/blob/master/multicore_tsne/vptree.h#L206

Using squared euclidean distance in VPTree causes search to not find all k nearest points. See https://github.com/lvdmaaten/bhtsne/pull/41#issuecomment-248636146 and http://stevehanov.ca/blog/index.php?id=130

"It is worth repeating that you must use a distance metric that satisfies the triangle inequality. I spent a lot of time wondering why my VP tree was not working. It turns out that I had not bothered to find the square root in the distance calculation. This step is important to satisfy the requirements of a metric space, because if the straight line distance to a <= b+c, it does not necessarily follow that a2 <= b2 + c2."

Omitting sqrt in VPTree search seems to bring increased performance because it doesn't search all necessary branches in tree. You can ensure that by calculating t-SNE with both metrics and using same initial coordinates. You will see that output differs a bit. I have done this in comment https://github.com/lvdmaaten/bhtsne/pull/41#issuecomment-248827988

DmitryUlyanov commented 6 years ago

Fixed, thanks!