biomystery / snATAC_pipeline_scanpy_based

A single-cell ATAC pipeline based on scanpy and clustering on most variable genomic bins. Originated from Josh@Kyle's lab.
0 stars 0 forks source link

round 2 clustering on base of peaks #10

Open biomystery opened 4 years ago

biomystery commented 4 years ago
  1. [x] group reads based on cluster (make bed files) https://github.com/biomystery/snATAC_pipeline_scanpy_based/commit/33d1d48b0f0357c8a91da4dfbc03a18ab17955ec
  2. [x] call peaks on grouped reads https://github.com/biomystery/snATAC_pipeline_scanpy_based/commit/33d1d48b0f0357c8a91da4dfbc03a18ab17955ec
  3. [x] get peak cell matrix https://github.com/biomystery/snATAC_pipeline_scanpy_based/commit/33d1d48b0f0357c8a91da4dfbc03a18ab17955ec
  4. [ ] clustering on base on the peak-cell matrix
  5. [ ] get gene activity score
  6. [ ] runChromVar
biomystery commented 4 years ago

https://github.com/GreenleafLab/10x-scATAC-2019/blob/0a702b3a6e9050e55b6bde6f75a00736ec6bf343/code/02_Get_Peak_Set_hg19_v2.R#L332

#----------------------------
# Get Clusters in Peaks
#----------------------------
nTop <- 25000
nPCs1 <- 1:50
nPCs2 <- 1:50

message("Making Seurat LSI Object...")
obj <- seuratLSI(assay(se), nComponents = max(nPCs1), nFeatures = NULL)
stopifnot(identical(rownames(obj@meta.data), colnames(se)))
obj@meta.data <- as.data.frame(cbind(obj@meta.data, colData(se)))

message("Adding Graph Clusters...")
obj <- FindClusters(object = obj, reduction.type = "pca", dims.use = nPCs1, print.output = TRUE, n.start = 10)

#Make Pseudo Bulk Library
mat <- assay(se)
mat@x[mat@x > 0] <- 1
clusterSums <- groupSums(mat = mat, groups = paste0("C",obj@meta.data$res.0.8), sparse = TRUE)
logMat <- edgeR::cpm(clusterSums, log = TRUE, prior.count = 3)
varPeaks <- head(order(matrixStats::rowVars(logMat), decreasing = TRUE), nTop)

#Re-run Seurat LSI
message("Making Seurat LSI Object...")
obj2 <- seuratLSI(assay(se)[varPeaks,], nComponents = max(nPCs2), nFeatures = NULL)
stopifnot(identical(rownames(obj2@meta.data), colnames(se)))
obj2@meta.data <- as.data.frame(cbind(obj2@meta.data, colData(se)))

message("Adding Graph Clusters...")
obj2 <- FindClusters(object = obj2, reduction.type = "pca", dims.use = nPCs2, print.output = TRUE, n.start = 10)

#Plot uMAP
message("Running UMAP")
obj2 <- RunUMAP(object = obj2, reduction.use = "pca", dims.use = nPCs2)
plotUMAP <- data.frame(GetCellEmbeddings(obj2,reduction.type="umap"), obj2@meta.data)
colnames(plotUMAP) <- c("x","y",colnames(plotUMAP)[3:ncol(plotUMAP)])
clustCol <- colnames(plotUMAP)[grep("res",colnames(plotUMAP))]
colData(se)$Clusters <- paste0("Cluster",as.integer(plotUMAP[,clustCol]) + 1)
colData(se)$UMAP1 <- plotUMAP$x
colData(se)$UMAP2 <- plotUMAP$y

pdf("results/LSI-Clustering-Peaks.pdf")
ggplot(plotUMAP, aes(x=x,y=y,color=res.0.8)) + geom_point(size = 0.5) + 
  theme_bw() + xlab("UMAP1") + ylab("UMAP2")
dev.off()

saveRDS(se, "results/scATAC-Summarized-Experiment.rds")