Hi @s18692001,
I understand how our choice of terminology can be confusing. We use "integration" to refer to learning a common low-dimensional embedding and "batch correction" to refer to modifying the gene expression values to remove batch-specific differences.
So integration will give you a low-dimensional representation of the data that can be used for visualization (e.g., t-SNE or UMAP). Batch correction will give you modified gene expression values. Integration is usually faster than batch correction, since it works only in a low-dimensional space rather than in the potentially high-dimensional gene expression space.
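To make the difference in outputs concrete, here is a toy sketch that only illustrates the *shapes* involved. The helpers `toy_correct` and `toy_integrate` are hypothetical: per-batch mean-centering and a truncated SVD stand in for the two steps, and this is not Scanorama's mutual-nearest-neighbour-based method.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, k = 500, 10  # hypothetical sizes

# Two toy batches: same distribution plus a constant per-batch offset
batch_a = rng.poisson(2.0, size=(60, n_genes)).astype(float)
batch_b = rng.poisson(2.0, size=(40, n_genes)).astype(float) + 1.5


def toy_correct(batches):
    """Toy 'batch correction' (NOT Scanorama's algorithm): subtract each
    batch's per-gene mean. Output keeps every gene (cells x genes)."""
    return [b - b.mean(axis=0, keepdims=True) for b in batches]


def toy_integrate(batches, k):
    """Toy 'integration' (NOT Scanorama's algorithm): a shared k-dimensional
    embedding (cells x k) from a truncated SVD of the stacked, centred data."""
    stacked = np.vstack(toy_correct(batches))
    u, s, _ = np.linalg.svd(stacked, full_matrices=False)
    emb = u[:, :k] * s[:k]
    # Split back into one embedding per input batch
    sizes = [b.shape[0] for b in batches]
    return np.split(emb, np.cumsum(sizes)[:-1])


corrected = toy_correct([batch_a, batch_b])
integrated = toy_integrate([batch_a, batch_b], k)
print([c.shape for c in corrected])   # [(60, 500), (40, 500)]
print([e.shape for e in integrated])  # [(60, 10), (40, 10)]
```

The corrected matrices still have one column per gene, while the integrated embeddings have only `k` columns, which is what makes the latter convenient to feed straight into t-SNE or UMAP.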
What should I do if I want batch-corrected embeddings for drawing t-SNE or UMAP plots? Should I just use the corrected output as the input to scanorama.integrate_scanpy?
@s18692001 you'd just pass the output from integrate_scanpy() into a function that further reduces it to a 2-dimensional visualization such as t-SNE or UMAP.
For example:
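import numpy as np
import scanorama
import scanpy as sc  # `scanpy.api` is deprecated; import scanpy directly
from anndata import AnnData
...
# One low-dimensional, batch-corrected embedding per input dataset
integrated = scanorama.integrate_scanpy(adatas)
# Stack the per-dataset embeddings into a single AnnData for visualization
viz_adata = AnnData(X=np.vstack(integrated))
# Build the kNN graph directly on the integrated embedding (no extra PCA)
sc.pp.neighbors(viz_adata, use_rep='X')
sc.tl.umap(viz_adata)
sc.pl.scatter(viz_adata, basis='umap')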
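One practical detail when stacking the per-dataset embeddings with `np.vstack`: the rows lose track of which dataset they came from. A minimal sketch that builds a parallel array of batch names in the same row order (the helper `stack_with_batch_labels` is hypothetical, not part of Scanorama, and the inputs below are dummy arrays standing in for the output of integrate_scanpy()):

```python
import numpy as np


def stack_with_batch_labels(embeddings, names):
    """Stack per-dataset embeddings (as np.vstack does) and return a
    parallel array of batch names, one per cell, in the same row order."""
    if len(embeddings) != len(names):
        raise ValueError("need exactly one name per embedding")
    X = np.vstack(embeddings)
    labels = np.repeat(names, [e.shape[0] for e in embeddings])
    return X, labels


# Dummy embeddings standing in for the integrate_scanpy() output
embs = [np.zeros((3, 5)), np.ones((2, 5))]
X, batch = stack_with_batch_labels(embs, ["batch1", "batch2"])
print(X.shape)         # (5, 5)
print(batch.tolist())  # ['batch1', 'batch1', 'batch1', 'batch2', 'batch2']
```

The labels could then be attached with `viz_adata.obs['batch'] = batch` and used for coloring, e.g. `sc.pl.scatter(viz_adata, basis='umap', color='batch')`, to eyeball whether the batches overlap after integration.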
Hi, thanks for your fast reply! Based on your answer, is "integrated" already batch corrected?
@s18692001 yup! Sorry for any confusion -- you don't need to call integrate() before or after correct(); those are just alternative commands that return different things (the first returns a low-dimensional embedding, the second returns all of the genes). The output of both methods should be the data with batch effects removed.
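If you want a rough sanity check that batches actually mix in either output, one simple heuristic is the average fraction of each cell's nearest neighbours that come from a different batch. This is a sketch, not a Scanorama function: `knn_batch_mixing` is a hypothetical helper, it uses brute-force Euclidean distances (fine for small data only), and the toy data below is made up for illustration.

```python
import numpy as np


def knn_batch_mixing(X, batches, k=10):
    """Mean fraction of each cell's k nearest neighbours (Euclidean,
    excluding the cell itself) that belong to a different batch."""
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]              # indices of k nearest cells
    return float((batches[nn] != batches[:, None]).mean())


# Two well-separated toy clusters of four cells each
X = np.array([[0.0], [0.1], [0.2], [0.3], [10.0], [10.1], [10.2], [10.3]])
separated = ["a"] * 4 + ["b"] * 4                 # each cluster is one batch
mixed = ["a", "b", "a", "b", "a", "b", "a", "b"]  # batches interleaved
print(knn_batch_mixing(X, separated, k=3))            # 0.0
print(round(knn_batch_mixing(X, mixed, k=3), 3))      # 0.667
```

Values near 0 mean each batch sits in its own region; higher values mean neighbourhoods contain cells from several batches. For genome-wide corrected output you would normally reduce dimensionality first, since distances over all genes are slow and noisy.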
The README mentions using the following commands to integrate and correct the AnnData objects. What is the difference between these commands? They seem to do the same thing for integration and batch correction. How can I get a single integrated and corrected AnnData object and then do plotting and clustering with Scanorama? Thanks!