broadinstitute / CellBender

CellBender is a software package for eliminating technical artifacts from high-throughput single-cell RNA sequencing (scRNA-seq) data.
https://cellbender.rtfd.io
BSD 3-Clause "New" or "Revised" License

Fixed Single cell RNA seq #372

Open archana433 opened 4 months ago

archana433 commented 4 months ago

Hi, I just want to ask: can we use CellBender on Single Cell Gene Expression Flex Fixed RNA Profiling (FRP) samples to remove background noise, ambient RNA, and empty droplets, given that this is probe-based sequencing?

Thank you

LinearParadox commented 4 months ago

I've used it and it worked well! You have to do some manual preprocessing because of duplicated features.

I've discussed it here: https://github.com/broadinstitute/CellBender/issues/234

My most recent code for summing features without using too much memory is:

import sys

import anndata
import numpy as np
import scanpy as sc
from scipy.sparse import csr_matrix

path = sys.argv[1]  # directory that holds the raw and filtered .h5 files
adata = sc.read_10x_h5(path + "/sample_raw_feature_bc_matrix.h5")

# One .var row for each feature name that occurs more than once
dup_var = adata.var[adata.var_names.duplicated()]
var = dup_var[~dup_var.index.duplicated(keep="first")]

# Sum the counts of all columns that share a duplicated name into a single column per name
adata_x = csr_matrix(np.concatenate([adata[:, n].X.sum(axis=1) for n in var.index], axis=1))
double_probes = anndata.AnnData(X=adata_x, obs=adata.obs, var=var)

# Replace the duplicated columns with their summed versions
final = anndata.concat([adata[:, ~adata.var.index.isin(double_probes.var.index)], double_probes], axis=1)
del adata

# Keep only the features that also appear in the filtered matrix, then save
adata_filtered = sc.read_10x_h5(path + "/sample_filtered_feature_bc_matrix.h5")
adata_filtered_feature = final[:, final.var.gene_ids.isin(adata_filtered.var.gene_ids)].copy()
adata_filtered_feature.write(path + "/my_feature_filtered_file.h5ad")
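
For readers who want to see the "summing duplicated features" step in isolation, here is a minimal sketch on a toy matrix. It is not the code above and not part of CellBender: it swaps in a different technique (multiplying by a sparse indicator matrix so the result never becomes dense), and the helper name `sum_duplicate_columns` and the toy feature names are made up for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix


def sum_duplicate_columns(X, names):
    """Sum the columns of sparse matrix X that share a name (hypothetical helper).

    Returns (X_summed, unique_names); unique_names is in sorted order,
    because np.unique sorts its output.
    """
    unique_names, codes = np.unique(np.asarray(names), return_inverse=True)
    n_features, n_unique = len(codes), len(unique_names)
    # Indicator matrix G: G[j, k] = 1 when original column j carries unique name k
    G = csr_matrix(
        (np.ones(n_features, dtype=X.dtype), (np.arange(n_features), codes)),
        shape=(n_features, n_unique),
    )
    # Sparse @ sparse keeps the result sparse; each output column is a per-name sum
    return csr_matrix(X @ G), unique_names.tolist()


# Toy data (made up): 3 barcodes x 4 features, where "GENE_A" appears twice
X = csr_matrix(np.array([[1, 0, 2, 0],
                         [0, 3, 0, 0],
                         [4, 0, 5, 6]]))
names = ["GENE_A", "GENE_B", "GENE_A", "GENE_C"]

X_summed, unique_names = sum_duplicate_columns(X, names)
print(unique_names)
print(X_summed.toarray())
```

With real data, `X` would be `adata.X` and `names` would be `adata.var_names`; the matching `.var` rows would still need to be deduplicated separately, as in the script above.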