Open shun-lin opened 5 years ago
Thanks for your interest in developing for TSNE-CUDA! I don't think there's a huge amount of complexity, at least mathematically, in extending it to 3D visualization. There is, however, a huge amount of generally annoying development work.
There are a few things that need to be handled. The main function is in src/fit_tsne.cu. First, all of the vectors that are designed to handle X-Y points need to be expanded so they can handle an arbitrary number of (or at least 2 or 3) dimensions (for example, the vector on line 164). Next, the CUDA kernels in src/kernels need to be re-written so that they correctly index the arrays for 3D points. This is mostly systematic, but can be tricky to get right. Finally, the repulsive force calculation needs to be re-adapted to handle 3 dimensions. Our original FIt-SNE code is based on the repository here: https://github.com/KlugerLab/FIt-SNE which supports 3D, but we didn't pull in much of the 3D code.
In the end, it's not a crazy mathematical challenge, but there's a pretty large amount of code to be re-written. If you start a branch/fork we'd love to assist as needed!
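As a rough illustration of the indexing change described above, here is a minimal sketch in plain Python (not the project's CUDA code). It assumes a flat, dimension-major buffer in which coordinate `dim` of point `point` lives at `dim * num_points + point`. That layout and the helper names `flat_index`, `pack_points`, and `get_point` are hypothetical, chosen only to show how hard-coded X/Y offsets could generalize to `num_dims`.

```python
# Hypothetical sketch: generalizing 2D point indexing to num_dims dimensions.
# The dimension-major layout is an assumption for illustration and is not
# necessarily the layout used by the kernels in src/kernels.


def flat_index(point, dim, num_points):
    """Offset of coordinate `dim` of point `point` in a dimension-major buffer."""
    return dim * num_points + point


def pack_points(points):
    """Pack a list of equal-length coordinate tuples into one flat list."""
    num_points = len(points)
    num_dims = len(points[0])
    buf = [0.0] * (num_points * num_dims)
    for i, coords in enumerate(points):
        for d, value in enumerate(coords):
            buf[flat_index(i, d, num_points)] = value
    return buf, num_dims


def get_point(buf, point, num_points, num_dims):
    """Read one point back; the same code path serves 2D and 3D."""
    return tuple(buf[flat_index(point, d, num_points)] for d in range(num_dims))


points_3d = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
buf, num_dims = pack_points(points_3d)
second_point = get_point(buf, 1, len(points_3d), num_dims)
```

The point of the sketch is that once every access goes through one index helper that takes the dimension count as a parameter, moving from 2 to 3 components is a change of argument rather than a rewrite of each access site.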
Thanks for the pointers @DavidMChan, will look into it!
Did you have any luck with this? I would love to be able to use it for a package I made for 3D visualizations.
Hi @LucaCappelletti94, nothing yet. I haven't had much time to dig deeper.
I'm actually just now starting to get back into active development for this code - and it's on my ToDo list (as one of the most requested features).
yay thanks @DavidMChan :) :) Very excited!
Approximately what size of datasets are you thinking of for 3D visualization? I've been going through our code and the potential implementation, and the scaling for 3D isn't very good when it comes to the FIt-SNE algorithm. You might actually be better off using a tool like https://github.com/tensorflow/tfjs-tsne which is somewhat slower, but will likely (fundamentally) scale better to higher dimensions.
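One way to picture the scaling concern, as a back-of-the-envelope sketch only: my understanding is that FIt-SNE's repulsive-force step interpolates onto a regular grid, so the number of grid nodes grows as the per-axis node count raised to the embedding dimension. The per-axis count of 150 below is a made-up figure for illustration, not a value taken from this repository, and `grid_nodes` is a hypothetical helper.

```python
# Hypothetical arithmetic: total grid nodes for a regular interpolation grid.
# nodes_per_axis = 150 is an assumed figure, chosen only to show the growth.


def grid_nodes(nodes_per_axis, num_dims):
    """Total nodes in a regular grid with the same node count on every axis."""
    return nodes_per_axis ** num_dims


nodes_2d = grid_nodes(150, 2)
nodes_3d = grid_nodes(150, 3)
growth = nodes_3d // nodes_2d

print(nodes_2d, nodes_3d, growth)  # prints: 22500 3375000 150
```

Under that assumption, going from 2 to 3 components multiplies the grid work by the per-axis node count itself, which is why the cost rises much faster than the extra coordinate alone would suggest.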
It could still be useful to run it on a significant subset of any given dataset, keeping the whole pipeline in Python.
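A minimal sketch of that subset idea, using only the Python standard library. The helper name `subsample_indices`, the fixed seed, and the 100,000-row sample size are assumptions for illustration; the selected rows would then be passed to whichever t-SNE implementation is in use.

```python
import random


def subsample_indices(num_rows, sample_size, seed=0):
    """Return sorted row indices for a uniform random subset without replacement."""
    if not 0 < sample_size <= num_rows:
        raise ValueError("sample_size must be between 1 and num_rows")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return sorted(rng.sample(range(num_rows), sample_size))


# Hypothetical sizes: about 1M examples, keep 100k of them for embedding.
idx = subsample_indices(1_000_000, 100_000)

# With the data held as a list of rows, the subset would be:
#   subset = [rows[i] for i in idx]
```

Keeping the indices rather than only the sampled rows makes it easy to map the embedded points back to labels or metadata in the full dataset.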
@DavidMChan for my use case my data is roughly [128, 1M] (128 dimensions, ~1 million examples). Would tfjs-tsne perform better in this case? And I agree with @LucaCappelletti94 that it would be nicer to keep everything in Python as well (one of my main use cases is to show visualizations on Google Colab). Thanks so much :)
Hi!
I recently needed to use tsne / umap to visualize the embeddings generated from some tf models I am testing, and I found this repository. It's super useful and super fast, thanks so much! I was wondering if I could help contribute support for n_components=3, as I would also like to visualize the embeddings in 3D, if it's feasible to do so. If so, could you give me a few pointers on where to start? Thanks!