antonioggsousa closed this issue 2 years ago
Hi @antonioggsousa, this analysis (https://www.nature.com/articles/s41592-021-01336-8) reports that Scanorama works best with log normalization and scaling (they use Scanpy).
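To make "log normalization and scaling" concrete, here is a minimal pure-Python sketch of the two steps on a toy count matrix (cells as rows, genes as columns). It is an illustration only, not Scanorama's or Scanpy's code: the function names, the `target_sum` of 10,000, and the use of the sample standard deviation are assumptions. In Scanpy these steps are usually done with `sc.pp.normalize_total`, `sc.pp.log1p`, and `sc.pp.scale`.

```python
import math


def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        factor = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(x * factor) for x in cell])
    return normalized


def scale_genes(matrix):
    """Z-score each gene (column) across cells: zero mean, unit sample variance."""
    n_cells = len(matrix)
    scaled = [row[:] for row in matrix]
    for j in range(len(matrix[0])):
        column = [row[j] for row in matrix]
        mean = sum(column) / n_cells
        variance = sum((x - mean) ** 2 for x in column) / (n_cells - 1)
        std = math.sqrt(variance)
        for i in range(n_cells):
            # Genes with no variation are set to 0 instead of dividing by zero.
            scaled[i][j] = (column[i] - mean) / std if std > 0 else 0.0
    return scaled


# Hypothetical toy data: 3 cells x 3 genes of raw counts.
counts = [[10, 0, 5], [20, 2, 8], [0, 7, 3]]
log_counts = log_normalize(counts)
scaled = scale_genes(log_counts)
```

After these two steps every gene has mean 0 and unit variance across cells, so highly expressed genes no longer dominate distance computations.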
Yes, the output of Scanorama is a low-dimensional embedding. It is used to compute the k-nearest-neighbors graph, which is then used for visualization and clustering.
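As a rough illustration of that step, the sketch below builds a k-nearest-neighbors graph from a small, made-up two-dimensional embedding using only the standard library. The `knn_graph` helper and the toy coordinates are hypothetical. In practice Scanpy does this for you, for example with `sc.pp.neighbors(adata, use_rep="X_scanorama")` followed by `sc.tl.umap(adata)`.

```python
import math


def knn_graph(embedding, k):
    """For each cell, return the indices of its k nearest cells (Euclidean distance)."""
    neighbors = []
    for i, point in enumerate(embedding):
        distances = [
            (math.dist(point, other), j)
            for j, other in enumerate(embedding)
            if j != i
        ]
        distances.sort()
        neighbors.append([j for _, j in distances[:k]])
    return neighbors


# Hypothetical embedding: two well-separated groups of three cells each.
embedding = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.2],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.3],
]
graph = knn_graph(embedding, k=2)
```

Each cell's neighbors come from its own group, which is the structure that UMAP/t-SNE layouts and graph-based clustering then build on. The embedding is not plotted directly; the neighbor graph is the intermediate step.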
Dear @brianhie,
Thank you and your colleagues for developing `scanorama`! I'm testing it on a few "dummy" examples and I'm delighted with the results. I read the paper as well as one of the tutorials mentioned in the GitHub README.md file.
To test `scanorama`, I ran it on a few toy data sets in addition to one example data set highlighted in the `scanorama` repository. When I started with the toy data sets, I provided scaled counts to `scanorama` by mistake, because I am not very familiar with `scanpy`, `anndata`, and `python` in general. I therefore checked the paper and the tutorial again to find out which input `scanorama` requires. The tutorial mentions log-normalized gene expression counts at some point, whereas the paper mentions that `l2 normalization` is performed internally. If I understood correctly, this aims to bring the cells to the same scale, i.e., to unit norm, so its application does not necessarily depend on previous normalization. My question is then: which should ideally be the input to `scanorama`, log-normalized or raw counts?
Regarding the tests that I've performed, the results obtained with raw counts seem slightly better than the ones obtained with log-normalized counts.
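A small sketch may help separate the two normalizations. It assumes that L2 normalization means dividing each cell's expression vector by its Euclidean norm. The helper name and toy values are hypothetical, and this is not Scanorama's internal code. It shows that L2 normalization removes a per-cell scale factor such as sequencing depth, while a log transform changes the direction of the vector, so the two are not interchangeable.

```python
import math


def l2_normalize(cell):
    """Divide a cell's expression vector by its Euclidean norm (unit length)."""
    norm = math.sqrt(sum(x * x for x in cell))
    return [x / norm for x in cell] if norm > 0 else list(cell)


# Hypothetical cell, and the same cell "sequenced" 10x deeper.
raw = [10.0, 0.0, 90.0]
deeper = [x * 10 for x in raw]

unit_raw = l2_normalize(raw)
unit_deeper = l2_normalize(deeper)  # matches unit_raw up to rounding: scale is removed

# A log transform is nonlinear, so the unit vector points in a different direction.
unit_logged = l2_normalize([math.log1p(x) for x in raw])
```

Under this assumption the L2 step makes the input insensitive to a constant per-cell factor, but raw and log-transformed counts can still give different embeddings.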
Another small question concerns the integration result that `scanorama` provides, i.e., `X_scanorama`. My understanding, based on the tutorial mentioned above and the paper, is that this low-dimensional embedding is intended to be used for UMAP/t-SNE estimation and visualization (among other downstream tasks). For instance, the tutorial calculates a neighborhood graph and UMAP from this result.
If `X_scanorama` is a low-dimensional embedding, should we plot it directly?
Thank you and sorry for the off-topic question!
Best regards,
António