MGI-tech-bioinformatics / DNBelab_C_Series_HT_scRNA-analysis-software

An open-source and flexible pipeline to analyze high-throughput DNBelab C Series single-cell RNA datasets
MIT License

Can SoupX be used to remove ambient RNA contamination from the raw_matrix and filter_matrix produced by dnbc4tools? #68

Closed. kjHuoo closed this issue 1 month ago

kjHuoo commented 2 months ago

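Thanks to the authors for providing dnbc4tools!

When we used the filter_matrix output by dnbc4tools for downstream clustering and differential expression analysis, we found that the same highly differentially expressed genes (genes whose names begin with Hbb, Hba, Igkc, and so on) showed up in the results for several clusters, even though these results have no biological meaning. We believe this is ambient RNA contamination introduced during sequencing, and we would like to try SoupX to remove it. However, with DNBelab C4 a single droplet can contain multiple beads, and when dnbc4tools turns the raw_matrix into the filter_matrix it merges cell barcodes (it computes the similarity between beads and merges the beads that come from the same droplet). Based on the official SoupX documentation, we therefore see two possible inputs for our data:

  1. Use raw_matrix as tod (the unfiltered table of droplets) and filter_matrix as toc (the table of counts to analyze)
  2. Use only filter_matrix as the input to SoupX

Which approach would you recommend? Looking forward to your reply!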
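
Option 1 only makes sense if the two matrices identify cells in the same way, which is exactly what the bead merging puts in doubt. A minimal sketch of a sanity check in base R follows; `check_cell_ids` is a hypothetical helper, the IDs are made-up toy values, and the `barcodes.tsv.gz` file name in the commented lines is an assumption about a 10x-style output directory, not something confirmed in this thread.

```r
# Hypothetical helper: report whether every cell ID in the filtered matrix
# also appears in the raw matrix.
check_cell_ids <- function(filter_ids, raw_ids) {
  missing <- setdiff(filter_ids, raw_ids)
  list(
    n_filter  = length(filter_ids),
    n_raw     = length(raw_ids),
    n_missing = length(missing),
    ok        = length(missing) == 0
  )
}

# Toy data standing in for the cell IDs of each matrix (made-up values).
raw_ids    <- c("CELL1", "CELL2", "CELL3", "CELL4", "CELL5")
filter_ids <- c("CELL2", "CELL4")

res <- check_cell_ids(filter_ids, raw_ids)
print(res$ok)

# With real output, the IDs could be read from the barcode files, ASSUMING
# each directory contains a 10x-style barcodes.tsv.gz:
# raw_ids    <- readLines(gzfile("raw_matrix/barcodes.tsv.gz"))
# filter_ids <- readLines(gzfile("filter_matrix/barcodes.tsv.gz"))
```

If `ok` is FALSE, the filtered cells cannot be matched to columns of the raw matrix, and using the two matrices together would need a closer look first.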

lishuangshuang0616 commented 2 months ago

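Yes, SoupX can be used for decontamination. In dnbc4tools version 2.1.1, the raw_matrix uses the same merged cell IDs as the filter_matrix, so we recommend using the matrices produced by that version. We recommend option 1; for reference: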

library(Seurat)        # Read10X, CreateSeuratObject and the clustering steps
library(SoupX)
library(DropletUtils)  # write10xCounts

# toc: directory of the filtered matrix (cells only)
# tod: directory of the raw matrix (all droplets)
# rho: optional fixed contamination fraction; estimated automatically if NULL
run_soupx <- function(toc, tod, outdir, rho = NULL) {
  toc <- Read10X(toc, gene.column = 1)
  tod <- Read10X(tod, gene.column = 1)

  # Keep the same genes, in the same order, in both matrices.
  tod <- tod[rownames(toc), ]

  # Cluster the filtered cells with Seurat; SoupX uses the cluster labels.
  all <- CreateSeuratObject(toc)
  all <- NormalizeData(all, normalization.method = "LogNormalize", scale.factor = 10000)
  all <- FindVariableFeatures(all, selection.method = "vst", nfeatures = 3000)
  all <- ScaleData(all, features = rownames(all))
  all <- RunPCA(all, features = VariableFeatures(all), npcs = 40, verbose = FALSE)
  all <- FindNeighbors(all, dims = 1:30)
  all <- FindClusters(all, resolution = 0.5)
  all <- RunUMAP(all, dims = 1:30)
  matx <- all@meta.data

  sc <- SoupChannel(tod, toc)
  sc <- setClusters(sc, setNames(matx$seurat_clusters, rownames(matx)))
  if (is.null(rho)) {
    # Estimate rho automatically and save the diagnostic plot.
    # tryCatch returns its value, so the fallback (rho = 0.2) actually
    # updates sc here; `finally` closes the PNG device even on error.
    png(filename = "soupX.png", width = 480, height = 480, units = "px", bg = "white", res = 72)
    sc <- tryCatch(
      autoEstCont(sc),
      error = function(e) {
        message("autoEstCont failed: ", conditionMessage(e), "; using rho = 0.2")
        setContaminationFraction(sc, 0.2)
      },
      finally = dev.off()
    )
  } else {
    sc <- setContaminationFraction(sc, rho)
  }
  out <- adjustCounts(sc)
  DropletUtils::write10xCounts(outdir, out, version = "3")
}

setwd('demo/output')
run_soupx('filter_matrix', 'raw_matrix', 'soupX_matrix')
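
To make the meaning of `rho` (and of the 0.2 fallback) more concrete, here is a toy base-R illustration. It is a deliberately simplified sketch with made-up numbers: it subtracts the expected soup counts `rho * nUMI * soup_profile` from each cell, which is not SoupX's actual `adjustCounts` procedure, only the idea behind the contamination fraction. The gene names and the soup profile are assumptions chosen for the example.

```r
# Toy counts: 3 genes x 2 cells (made-up numbers; filled column by column).
toc <- matrix(c(50,  0, 50,
                10, 80, 10),
              nrow = 3,
              dimnames = list(c("Hbb", "GeneA", "GeneB"), c("cell1", "cell2")))

# Assumed soup profile: fraction of ambient RNA contributed by each gene.
# In SoupX this is estimated from the raw droplets rather than set by hand.
soup_profile <- c(Hbb = 0.8, GeneA = 0.1, GeneB = 0.1)

rho   <- 0.2            # assumed fraction of each cell's UMIs that is soup
n_umi <- colSums(toc)   # total UMIs per cell

# Expected soup counts per gene and cell.
expected_soup <- rho * outer(soup_profile, n_umi)

# Simplified correction: subtract the expected soup and clip at zero.
corrected <- pmax(toc - expected_soup, 0)
print(corrected)
```

In this toy case most of the Hbb signal in cell2 is attributed to the soup, while the cell's dominant gene is barely changed, which is the behaviour one hopes for with genes such as Hbb or Igkc appearing across unrelated clusters.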

kjHuoo commented 2 months ago

OK, I'll give it a try. Thanks for the reply!