Closed by yeuyeuh 6 years ago
Hi Inaki, in our manuscript we plot the "distance matrix t-sne". In a conventional t-sne run (Rtsne library), a PCA step precedes the actual t-SNE algorithm. It is meant to amplify the "interesting" signal in the data, on the assumption that this signal lies in the first (let's say 30) PCs. For batch-corrected data, however, these first 30 PCs capture a residual batch effect that has not been completely removed (even though the plot of PC1 against PC2 still looks fine). Using the distance matrix t-sne skips the preceding PCA step and thus avoids this unwanted partial amplification of the residual batch effect.
Hi Laleh,
Thanks for your clear answer.
If I understand correctly: you chose the "distance matrix t-sne" because you are working on a cosine-normalized matrix, so when you compute the Euclidean distance, the resulting distance matrix (cosine distances) is robust to residual batch effects. But when you are working on a corrected matrix normalized on the log scale, you don't use the "distance matrix t-sne"; you use the "conventional t-sne" instead. Is that right?
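As a side note on the relationship mentioned here, a minimal R sketch (toy data only, not taken from the manuscript code) shows that, once each cell is scaled to unit L2 norm, the squared Euclidean distance between two cells equals twice their cosine distance. The matrix `x`, its size, and the choice of cells in rows (convenient for `dist()`) are assumptions made for illustration.

```r
# Toy data: 5 cells (rows) x 20 genes, purely illustrative.
set.seed(2)
x <- matrix(rexp(5 * 20), nrow = 5)

# Cosine normalization: scale each cell (row) to unit L2 norm.
x_cos <- x / sqrt(rowSums(x^2))

# Squared Euclidean distance between the normalized cells ...
d_euc_sq <- as.matrix(dist(x_cos))^2

# ... matches 2 * (1 - cosine similarity), i.e. twice the cosine distance.
cos_sim <- tcrossprod(x_cos)
d_cos <- 2 * (1 - cos_sim)

print(all.equal(d_euc_sq, d_cos, check.attributes = FALSE))
```

This only illustrates the geometric identity; it says nothing by itself about how well batch effects are removed.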
No, we never use the "conventional t-sne". It is only included in the code for comparison and out of curiosity; it is not used anywhere in the manuscript. With the "distance matrix t-sne" we compute the distance matrix on the whole data, whereas the "conventional t-sne" computes the distance matrix on only the first 30 PCs.
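The difference between the two routes can be sketched in R as follows. This is an illustration on simulated data, not the code from PancreasCorrectionComparison.R: the matrix `corrected`, its dimensions, and the perplexity value are assumptions. The distance computations use base R only, and the Rtsne calls run only if that package is installed.

```r
# Simulated stand-in for a batch-corrected matrix (cells in rows, genes in columns).
set.seed(1)
n_cells <- 60
n_genes <- 100
corrected <- matrix(rnorm(n_cells * n_genes), nrow = n_cells)

# "Distance matrix" route: Euclidean distances computed on all genes.
d_full <- dist(corrected)

# "Conventional" route: distances computed on the first 30 PCs only.
pcs <- prcomp(corrected, center = TRUE, scale. = FALSE)$x[, 1:30]
d_pca <- dist(pcs)

# Keeping only 30 PCs discards variance, so the two distance matrices differ.
max_diff <- max(abs(as.matrix(d_full) - as.matrix(d_pca)))

if (requireNamespace("Rtsne", quietly = TRUE)) {
  # t-SNE on the precomputed distance matrix (no PCA step).
  tsne_dist <- Rtsne::Rtsne(d_full, is_distance = TRUE, perplexity = 10)
  # t-SNE with the preceding PCA step, reduced to 30 dimensions.
  tsne_conv <- Rtsne::Rtsne(corrected, pca = TRUE, initial_dims = 30,
                            perplexity = 10)
}
```

Because the PCA step is a projection, the distances in `d_pca` can only be smaller than or equal to those in `d_full`; whatever structure dominates the retained PCs is what the conventional run sees.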
Ok, thanks for the information.
Sure, thanks for the question.
Hi,
Thanks for providing the R code for your manuscript. I'm using the latest version of the scran package (version 1.7.11).
In PancreasCorrectionComparison.R, you used two different methods to generate the t-sne of the corrected matrix:
- conventional method (gene-cell matrix as input, with PCA calculation)
- distance matrix as input
In the figures of your manuscript, do you plot the "distance matrix t-sne"?
For my data, I provide a UMI count matrix to mnnCorrect() and I use cos.norm.in=TRUE, cos.norm.out=TRUE. The "distance matrix tsne" of the corrected data looks pretty good (the batches are merged together), but the "conventional tsne" doesn't merge the different batches. However, when I run a PCA on the corrected matrix, the batch effect seems to be removed.
Can you explain why there is such a difference between the two methods used by t-sne? Which one shall we choose?
Thanks, Inaki Cervera-Marzal