Open ruiqiizhou opened 1 year ago
Ah, so using layers in anndata is a great trick, and it's easy to do. One way would be to use this function
You can use this code from a Jupyter notebook by doing
from cellbender.remove_background.downstream import load_anndata_from_input_and_output
adata = load_anndata_from_input_and_output(input_file='raw.h5', output_file='cellbender_output.h5')
or you could manually do the same thing yourself, kind of like this:
adata_raw # your raw data
adata # cellbender output
adata.layers['cellbender'] = adata.X.copy()
adata.layers['raw'] = adata_raw[adata.obs_names].layers['spliced'].copy()
Then you can use the different layers to make different UMAPs like this
# preprocess raw data
adata.X = adata.layers['raw'].copy()
sc.tl.highly_variable_genes(adata, n_top_genes=2000, flavor='seurat_v3')
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.tl.pca(adata, n_comps=20, use_highly_variable=True)
sc.pp.neighbors(adata, use_rep='X_pca')
# UMAP of raw data
sc.tl.umap(adata)
adata.obsm['X_umap_raw'] = adata.obsm['X_umap'].copy()
# preprocess cellbender data
adata.X = adata.layers['cellbender'].copy()
sc.tl.highly_variable_genes(adata, n_top_genes=2000, flavor='seurat_v3')
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.tl.pca(adata, n_comps=20, use_highly_variable=True)
sc.pp.neighbors(adata, use_rep='X_pca')
# UMAP of raw data
sc.tl.umap(adata)
adata.obsm['X_umap_cellbender'] = adata.obsm['X_umap'].copy()
del adata.obsm['X_umap']
Then you can do things like
sc.pl.embedding(adata, basis='X_umap_raw')
sc.pl.embedding(adata, basis='X_umap_cellbender')
I was trying to follow the tutorial: ‘Create some validation plots of various analyses with and without cellbender remove-background. One convenient way to do this is in scanpy and storing the raw count matrix and the background-removed count matrix as separate “layers”. UMAPs with and without (on the same set of cell barcodes) Marker gene dotplots and violin plots (you should see less background)’'
I am using anndata, I have raw anndata and after cellbender anndata. But I do not know how to 'storing the raw count matrix and the background-removed count matrix as separate “layers”. and plot UMAPs with and without (on the same set of cell barcodes)'
after hellbender, anndata like this : AnnData object with n_obs × n_vars = 1483 × 32227 obs: 'latent_RT_efficiency', 'latent_cell_probability', 'latent_scale' var: 'ambient_expression', 'feature_type', 'gene_id' uns: 'contamination_fraction_params', 'fraction_data_used_for_testing', 'lambda_multiplier', 'overdispersion_mean_and_scale', 'target_false_positive_rate', 'test_elbo', 'test_epoch', 'training_elbo_per_epoch' obsm: 'latent_gene_encoding'
This is my raw dataset:
AnnData object with n_obs × n_vars = 404099 × 56984 var: 'gene_ids', 'feature_types' layers: 'ambiguous', 'spliced', 'unspliced'