This is a "forward" from https://github.com/RGLab/Rtsne.multicore/issues/7
Part 1
I observed that the results differ based on the number of threads specified.
In my application, which uses BH-SNE to create a 2D embedding followed by automated clustering using DBSCAN, I have replaced the single-threaded Rtsne call with a call to your multi-threaded Rtsne.multicore. This was nice and easy thanks to the similarity of both interfaces.
However, when I run the application, the results differ ever so slightly, as indicated below (just the first couple of points each time):
Using 1 thread
Using 2 threads
Using 3 threads
Using 4 threads
The results using the same number of threads seem to be consistent between different runs, though, which is good at least :)
Using 1 thread - a second run
And for all the points, computing the MD5SUM:
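For illustration, a checksum of this kind could be computed along the following lines. This is only a Python sketch, not the code used for the runs reported here; the function name `embedding_md5` and the sample coordinates are made up. It packs every coordinate as a raw 64-bit double before hashing, so two embeddings share a digest only if all values are bit-identical.

```python
import hashlib
import struct


def embedding_md5(points):
    """Return the MD5 hex digest of a sequence of (x, y) coordinates.

    Each coordinate pair is packed as two little-endian IEEE 754 doubles,
    so the digest changes if even a single bit of any value differs.
    """
    digest = hashlib.md5()
    for x, y in points:
        digest.update(struct.pack("<dd", x, y))
    return digest.hexdigest()


# Made-up coordinates, not taken from the runs reported in this issue.
run_a = [(1.25, -3.5), (0.75, 2.0)]
run_b = [(1.25, -3.5), (0.75, 2.0)]          # bit-identical to run_a
run_c = [(1.25, -3.5), (0.75, 2.0 + 1e-12)]  # differs in the last digits

print(embedding_md5(run_a) == embedding_md5(run_b))  # True
print(embedding_md5(run_a) == embedding_md5(run_c))  # False
```

Hashing the raw bytes rather than printed values avoids hiding differences that fall below the printed precision.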
While the differences are hard to spot by eye in a 2D scatterplot, the automatic clustering is affected by them.
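A rough sketch of why such small shifts can matter for DBSCAN: its neighbourhood test is a hard threshold (a point counts as a neighbour if its distance is at most eps), so a point sitting right at that radius can flip in or out. The Python snippet below is only an illustration with made-up values for eps and the coordinates; the helper `within_eps` is hypothetical and not part of any DBSCAN library.

```python
import math


def within_eps(p, q, eps):
    """DBSCAN-style neighbourhood test: q is a neighbour of p if dist(p, q) <= eps."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) <= eps


eps = 1.0                    # made-up neighbourhood radius
p = (0.0, 0.0)
q_run1 = (1.0, 0.0)          # exactly eps away from p
q_run2 = (1.0 + 1e-9, 0.0)   # same point, shifted by a tiny amount

print(within_eps(p, q_run1, eps))  # True
print(within_eps(p, q_run2, eps))  # False
```

If such a flip changes whether a point reaches the minimum number of neighbours, it can turn a core point into a border or noise point and thereby change the cluster assignment.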
Your input is greatly appreciated!
Part 2
I explored this further, and here is a minimal working example:
and some demo output from Rstudio:
As you can see, the results are consistent between different runs using the same number of threads (here for 1 or 2 threads) yet differ when using different numbers of threads. Moreover, I am confused as to why the results for 3 threads and 4 threads are different between two runs, i.e., behave differently than 1 or 2 threads.
This is quite puzzling to me and your input is highly appreciated!
Best,
Cedric