brianhie / scanorama

Panoramic stitching of single cell data
http://scanorama.csail.mit.edu
MIT License

Command usage #31

Closed: shengyongniu closed this issue 5 years ago

shengyongniu commented 5 years ago

The README lists the following commands for integrating and correcting AnnData objects. What is the difference between them? They seem to do the same thing for integration and batch correction. How can I get a single integrated and corrected AnnData object, and then plot and cluster it with Scanorama? Thanks!

# Integration.
integrated = scanorama.integrate_scanpy(adatas)

# Batch correction.
corrected = scanorama.correct_scanpy(adatas)

# Integration and batch correction.
integrated, corrected = scanorama.correct_scanpy(adatas, return_dimred=True)
brianhie commented 5 years ago

Hi @s18692001,

I understand how our choice of terminology can be confusing. We use "integration" to refer to learning a common low-dimensional embedding and "batch correction" to refer to modifying the gene expression values to remove batch-specific differences.

So integration gives you a low-dimensional representation of the data that can be used for visualization (e.g., t-SNE or UMAP). Batch correction gives you modified gene expression values. Integration is usually faster than batch correction, since it operates only in the low-dimensional space rather than in the potentially high-dimensional gene expression space.
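The two kinds of output can be contrasted with a toy NumPy sketch. It only illustrates the shape of each result, not how Scanorama computes it: toy_integrate and toy_correct are hypothetical helpers, the first using plain PCA (via SVD) as a stand-in for integration and the second using per-batch mean-centering as a stand-in for batch correction. The data are random, and k=50 is an arbitrary choice.

```python
import numpy as np

def toy_integrate(batches, k):
    """Hypothetical stand-in for "integration": a shared k-dim embedding.

    Uses plain PCA (via SVD) on the stacked, per-batch-centered data.
    This is NOT Scanorama's algorithm; it only mimics the output shape.
    """
    centered = [b - b.mean(axis=0) for b in batches]
    stacked = np.vstack(centered)
    # The top-k right singular vectors are the principal axes (genes x k).
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    components = vt[:k].T
    # Project each batch into the shared k-dimensional space.
    return [c @ components for c in centered]

def toy_correct(batches):
    """Hypothetical stand-in for "batch correction": same genes, shifted.

    Moves each batch so its per-gene mean equals the mean over all cells.
    Again, this is NOT Scanorama's method.
    """
    grand_mean = np.vstack(batches).mean(axis=0)
    return [b - b.mean(axis=0) + grand_mean for b in batches]

rng = np.random.default_rng(0)
n_genes = 2000
# Two toy batches with different cell counts and a batch-specific offset.
batches = [rng.normal(size=(300, n_genes)),
           rng.normal(loc=0.5, size=(200, n_genes))]

integrated = toy_integrate(batches, k=50)
corrected = toy_correct(batches)
print([x.shape for x in integrated])  # [(300, 50), (200, 50)]
print([x.shape for x in corrected])   # [(300, 2000), (200, 2000)]
```

Whatever the method, the integrated output has k columns per cell, while the corrected output keeps one column per gene; that is the distinction the reply above draws.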

shengyongniu commented 5 years ago

What should I do if I want batch-effect-corrected embeddings for drawing t-SNE or UMAP plots? Should I just use the corrected output as the input to scanorama.integrate_scanpy?

brianhie commented 5 years ago

@s18692001 you'd just pass the output of integrate_scanpy() into a function that further reduces it to a two-dimensional visualization, such as t-SNE or UMAP.

For example:

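import numpy as np
import scanorama
import scanpy as sc
from anndata import AnnData

...  # elided: build "adatas", a list of AnnData objects (one per dataset)

# Integration: one low-dimensional embedding per dataset in "adatas".
integrated = scanorama.integrate_scanpy(adatas)
# Stack the per-dataset embeddings into one cells x dimensions AnnData.
viz_adata = AnnData(X=np.vstack(integrated))
# Build the kNN graph directly on the Scanorama embedding (no extra PCA).
sc.pp.neighbors(viz_adata, use_rep='X')
sc.tl.umap(viz_adata)
sc.pl.umap(viz_adata)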
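To check that the batches actually mix in the UMAP, it can help to keep a per-cell batch label alongside the stacked embedding. The sketch below is a hypothetical NumPy helper (stack_with_batch_labels and the dataset names are illustrative, not part of Scanorama). It assumes "integrated" is a list of per-dataset arrays, as in the example above, and relies only on np.vstack preserving the input order.

```python
import numpy as np

def stack_with_batch_labels(embeddings, names):
    """Stack per-dataset embeddings and build a matching per-cell label array.

    embeddings: list of (cells x dims) arrays, one per dataset.
    names: hypothetical dataset names, one per array, in the same order.
    """
    if len(names) != len(embeddings):
        raise ValueError("need exactly one name per embedding")
    X = np.vstack(embeddings)
    # Repeat each dataset name once per cell, in the same order as vstack.
    labels = np.repeat(names, [e.shape[0] for e in embeddings])
    return X, labels

# Toy example with random "embeddings" of 3 and 2 cells.
rng = np.random.default_rng(0)
embs = [rng.normal(size=(3, 100)), rng.normal(size=(2, 100))]
X, labels = stack_with_batch_labels(embs, ["batch_a", "batch_b"])
print(X.shape)          # (5, 100)
print(labels.tolist())  # ['batch_a', 'batch_a', 'batch_a', 'batch_b', 'batch_b']
```

Building viz_adata from X and setting viz_adata.obs['batch'] = labels should then let sc.pl.umap(viz_adata, color='batch') color the cells by dataset.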
shengyongniu commented 5 years ago

Hi, thanks for your fast reply! Based on your answer, is "integrated" already batch corrected?

brianhie commented 5 years ago

@s18692001 yup! Sorry for any confusion. You don't need to call integrate() before or after correct(); they are just alternative commands that return different things (the first returns a low-dimensional embedding, the second returns all of the genes). The output of both methods should be the data with batch effects removed.