CannyLab / tsne-cuda

GPU Accelerated t-SNE for CUDA with Python bindings
BSD 3-Clause "New" or "Revised" License

[Feature Request] support n_components=3, happy to contribute :) #71

Open shun-lin opened 5 years ago

shun-lin commented 5 years ago

Hi!

I recently needed to use t-SNE / UMAP to visualize the embeddings generated by some TF models I am testing. I found this repository and it's super useful and super fast, thanks so much! I wonder if I can help contribute support for n_components=3, as I would also like to visualize the embeddings in 3D, if that's feasible. If so, could you give me a few pointers on where to start? Thanks!

DavidMChan commented 5 years ago

Thanks for your interest in developing for TSNE-CUDA! I don't think there's a huge amount of complexity, at least mathematically, in extending it to 3D visualization. There is, however, a huge amount of generally annoying development work.

There are a few things that need to be handled. The main function is in src/fit_tsne.cu. First, all of the vectors that are designed to hold X-Y points need to be expanded so they can handle an arbitrary number of dimensions (or at least 2D and 3D); for example, the vector on line 164. Next, the CUDA kernels in src/kernels need to be rewritten so that they correctly index the arrays for 3D points. This is mostly systematic, but can be tricky to get right. Finally, the repulsive force calculation needs to be adapted to handle 3 dimensions. Our original FIt-SNE code is based on the repository here: https://github.com/KlugerLab/FIt-SNE which supports 3D, but we didn't pull in much of the 3D code.
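To make the indexing point concrete, here is a minimal sketch in plain Python rather than CUDA. It assumes the points live in one flat, coordinate-major array (all x values, then all y values, then all z values); that layout is an assumption for illustration and should be checked against the actual kernels. The helper names `coord_index` and `get_point` are hypothetical.

```python
def coord_index(point, dim, num_points):
    """Flat index of coordinate `dim` of `point` in a coordinate-major array.

    Assumed layout (hypothetical): [x_0..x_{n-1}, y_0..y_{n-1}, z_0..z_{n-1}].
    """
    return dim * num_points + point


def get_point(flat, point, num_points, num_dims):
    """Gather all coordinates of one point from the flat array."""
    return [flat[coord_index(point, d, num_points)] for d in range(num_dims)]


# Three points in 3D, stored coordinate by coordinate.
num_points, num_dims = 3, 3
flat = [
    0.0, 1.0, 2.0,     # x coordinates of points 0, 1, 2
    10.0, 11.0, 12.0,  # y coordinates
    20.0, 21.0, 22.0,  # z coordinates
]

print(get_point(flat, 1, num_points, num_dims))
```

With the number of dimensions passed in as a parameter, the same indexing expression covers both 2D and 3D; code that hard-codes an offset of `num_points` for y (and nothing else) is the kind of spot that would need to change.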

In the end, it's not a crazy mathematical challenge, but there's a pretty large amount of code to be re-written. If you start a branch/fork we'd love to assist as needed!

shun-lin commented 5 years ago

Thanks for the pointers @DavidMChan will look into it!

LucaCappelletti94 commented 4 years ago

Did you have any luck with this? I would love to be able to use it for a package I made for 3D visualizations.

shun-lin commented 4 years ago

Hi @LucaCappelletti94, nothing yet; I haven't had much time to dig deeper.

DavidMChan commented 4 years ago

I'm actually just now starting to get back into active development for this code - and it's on my ToDo list (as one of the most requested features).

shun-lin commented 4 years ago

yay thanks @DavidMChan :) :) Very excited!

DavidMChan commented 4 years ago

Approximately what size of dataset are you thinking of for 3D visualization? I've been going through our code and the potential implementation, and the FIt-SNE algorithm doesn't scale very well to 3D. You might actually be better off using a tool like https://github.com/tensorflow/tfjs-tsne which is somewhat slower, but will likely (fundamentally) scale better to higher dimensions.
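One likely contributor to that scaling, offered as an assumption rather than something confirmed in this thread: interpolation-based methods such as FIt-SNE evaluate the repulsive forces on a regular grid over the embedding space, so the number of grid nodes grows as a power of the embedding dimension. The rough arithmetic below uses hypothetical parameter values (50 intervals per dimension, 3 interpolation points per interval), not the project's real settings.

```python
def grid_nodes(intervals_per_dim, interp_points, num_dims):
    """Total interpolation nodes on a regular grid in `num_dims` dimensions."""
    nodes_per_dim = intervals_per_dim * interp_points
    return nodes_per_dim ** num_dims


# Hypothetical settings, chosen only to illustrate the growth.
intervals_per_dim = 50
interp_points = 3

nodes_2d = grid_nodes(intervals_per_dim, interp_points, 2)
nodes_3d = grid_nodes(intervals_per_dim, interp_points, 3)

print(f"2D grid nodes: {nodes_2d}")
print(f"3D grid nodes: {nodes_3d}")
print(f"growth factor: {nodes_3d // nodes_2d}x")
```

Under these assumed values, going from 2D to 3D multiplies the grid size by the number of nodes per dimension, and any FFT or convolution over the grid grows with it.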

LucaCappelletti94 commented 4 years ago

It could still be useful when run on a significant subset of any given dataset, keeping the whole pipeline in Python.
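A minimal sketch of the subsetting step, assuming uniform random sampling without replacement is acceptable for the visualization. The helper name `subsample_indices` and the sizes used are hypothetical; the selected rows would then be passed to whichever t-SNE implementation is in use.

```python
import random


def subsample_indices(num_rows, subset_size, seed=0):
    """Pick a reproducible, sorted set of unique row indices."""
    rng = random.Random(seed)  # fixed seed so the same subset is drawn each run
    k = min(subset_size, num_rows)
    return sorted(rng.sample(range(num_rows), k))


# Hypothetical sizes: keep 50,000 rows out of 1,000,000.
indices = subsample_indices(1_000_000, 50_000)
print(len(indices), indices[:5])
```

Keeping the indices (rather than only the sampled rows) makes it possible to map the embedded points back to their original labels or metadata.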

shun-lin commented 4 years ago

@DavidMChan for my use case the data is roughly [128, 1M] (128 dimensions, ~1 million examples). Would tfjs-tsne perform better in this case? I also agree with @LucaCappelletti94 that it would be nicer to keep everything in Python (one of my main use cases is showing visualizations in Google Colab). Thanks so much :)