CannyLab / tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings
BSD 3-Clause "New" or "Revised" License

Feature Request: Custom distance matrix input #109

Open RichieHakim opened 2 years ago

RichieHakim commented 2 years ago

FEATURE REQUEST:

In https://github.com/CannyLab/tsne-cuda/issues/8, the possibility of using a custom NN matrix is discussed and noted to be 'easy' to implement. DavidMChan: "It would be easy to add the ability to pass in a sparse nearest neighbors matrix, however it becomes more complicated if you want to extract the nearest neighbors from a pre-computed distance matrix."

Implementing this would be a significant improvement and would open up a lot of use cases. Specifically, allowing a user to pass in a custom distance matrix (i.e. a sparse knn_graph) would be amazing. That alone would let users who already rely on this feature in sklearn's TSNE port their workflow directly to tsne-cuda.

https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html
metric: str or callable, default='euclidean' ... If metric is "precomputed", X is assumed to be a distance matrix. ...
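For reference, the sklearn workflow in question looks roughly like the sketch below. The data, the choice of Manhattan distance, and the perplexity value are made up for illustration; only `TSNE(metric="precomputed")` and `pairwise_distances` come from sklearn itself.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

# Toy data, purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))

# Any custom metric could be used here; TSNE only sees the (N, N) matrix.
D = pairwise_distances(X, metric="manhattan")

# With a precomputed matrix there are no raw features, so PCA init is not
# available and a random init is used instead.
tsne = TSNE(
    n_components=2,
    metric="precomputed",
    init="random",
    perplexity=10,
    random_state=0,
)
embedding = tsne.fit_transform(D)  # array of shape (40, 2)
```

The request is for tsne-cuda to accept the equivalent of `D` (or a sparse kNN version of it) in place of the raw feature matrix.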

Thanks!

DavidMChan commented 2 years ago

I'll look into adding this (though, TBH, I can't promise anything), but I'm also happy to accept a PR to address this.

For future reference (and for anyone who wants to give it a shot), the idea would be to shortcut the logic for nearest neighbors here: https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/fit_tsne.cu#L118

It's not that hard to do, since the rest of the TSNE algorithm only requires a float distance array of size (N x # neighbors) and a similarly shaped array of the nearest neighbor indices.
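As a rough illustration of those two arrays, here is a minimal NumPy sketch that turns a dense precomputed distance matrix into an (N x k) distance array and a matching index array. The function name `knn_from_distance_matrix` is hypothetical and not part of tsne-cuda. Two assumptions are baked in: each point is excluded from its own neighbor list, and float32/int64 are used as the dtypes. The actual code path may expect something different on both counts.

```python
import numpy as np

def knn_from_distance_matrix(dist, k):
    """Hypothetical helper: return (distances, indices), each of shape (N, k),
    from a dense (N, N) distance matrix."""
    dist = np.asarray(dist, dtype=np.float32)
    if dist.ndim != 2 or dist.shape[0] != dist.shape[1]:
        raise ValueError("dist must be a square (N, N) matrix")
    n = dist.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < N")

    # Assumption: a point is not its own neighbor, so mask the diagonal.
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)

    # argpartition selects the k smallest entries per row, in no particular order.
    idx = np.argpartition(masked, k - 1, axis=1)[:, :k]

    # Sort those k candidates so that column 0 holds the nearest neighbor.
    rows = np.arange(n)[:, None]
    order = np.argsort(masked[rows, idx], axis=1)
    idx = idx[rows, order]

    return masked[rows, idx], idx.astype(np.int64)


# Usage: four points on a line, two neighbors each.
x = np.array([0.0, 1.0, 3.0, 7.0])
D = np.abs(x[:, None] - x[None, :])
knn_dist, knn_idx = knn_from_distance_matrix(D, k=2)
```

A dense (N, N) matrix is only practical for modest N, which is part of why a sparse nearest-neighbors input is the simpler case to support.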

The logic for passing arrays is already in place, since we handle pre-initialized T-SNE: see how preinit_data is handled in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/python/tsnecuda/TSNE.py, and how it's parsed into the actual function call in https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/ext/pymodule_ext.cu

All that would have to be done is to create a new option in the options file, https://github.com/CannyLab/tsne-cuda/blob/b740a7d46a07ca9415f072001839fb66a582a3fa/src/include/options.h (just like the pre-init data), and reference it during the main tsne call.

RichieHakim commented 2 years ago

This is still dearly hoped for.

loganylchen commented 11 months ago

I have the same request.