broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Help!!! Too many General_HCL_all_observations files and my infercnv.png is black #616

Open smukeji opened 10 months ago

smukeji commented 10 months ago

Hey all! I have some questions about the results of running infercnv.

First, my output directory contains about 3000 General_HCL_all_observations files. Why do these files exist?

[image] [image]

Second, my infercnv.png is black. How can I solve this?

[image]

Here is my code:

options(scipen = 100)
gbm <- readRDS("GBMC+N+P")
oligo <- readRDS("obligo.rds")
oligo[["Cell_annotation"]] <- "Normal cell"
gbm[["Cell_annotation"]] <- "Malignant_N2"
DefaultAssay(oligo) <- "RNA"
DefaultAssay(gbm) <- "RNA"
cnvtest <- merge(oligo, gbm, project = "cnvtest")

counts_matrix <- cnvtest@assays$RNA@counts

cellannotation <- cnvtest@meta.data$Cell_annotation
cellannotation <- as.data.frame(cellannotation)
rownames(cellannotation) <- colnames(cnvtest@assays$RNA@counts)

infercnv_obj = CreateInfercnvObject(raw_counts_matrix=counts_matrix,
                                    annotations_file=cellannotation,
                                    delim="\t",
                                    gene_order_file="geneLocate.txt",
                                    ref_group_names=c("Normal cell"))

infercnv_obj = infercnv::run(infercnv_obj,
                             cutoff=0.1,        # use 1 for smart-seq, 0.1 for 10x-genomics
                             out_dir='cnv1/',   # dir is auto-created for storing outputs
                             cluster_by_groups=F,
                             plot_steps=F,
                             write_phylo=TRUE,
                             denoise=T,
                             HMM=F,
                             num_threads=14,
                             noise_logistic=T,
                             sd_amplifier=3,
                             leiden_resolution=0.01)

GeorgescuC commented 10 months ago

Hi @smukeji ,

1) Those files are generated, one per subcluster, because you specified the "write_phylo=TRUE" option. Each file contains the list of cells in the associated subcluster. There are so many of them for the reason described in 2). The prefix contains "all_observations" as part of the subcluster name because you used the "cluster_by_groups=F" argument, so the cells of every annotation that is not defined as a reference are combined into the "all observations" pool for subclustering.

2) This happens because there are too many subclusters that are too small, so the black bars that separate them end up covering the whole heatmap. You can see from the tree structure on the left side that essentially every branch is a single-cell branch. To fix this, you need to lower the leiden_resolution value further. If you only modify this argument, infercnv should restart the run at the subclustering step, so iterating over a couple of different values should be fast enough. While iterating, I would also use the "up_to_step=15" option so that the run stops after the subclustering and lets you inspect the subclusters. Once you are happy with them, remove that option to finish the rest of the run.

Regards, Christophe.
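
For readers following along, the iteration suggested above could be sketched roughly as follows. This is only an illustration built from the arguments already shown in this thread: the helper names `build_run_args` and `run_up_to_subclustering` are hypothetical, and the resolution value in the usage comment is an arbitrary example, not a recommendation. The same `out_dir` is reused on purpose so that infercnv can resume from the subclustering step, as described in the reply.

```r
# Hypothetical helper: collects the arguments from the original call, with
# leiden_resolution made adjustable and up_to_step = 15 added so that the run
# stops after subclustering.
build_run_args <- function(infercnv_obj, resolution, out_dir = "cnv1/") {
  list(
    infercnv_obj      = infercnv_obj,
    cutoff            = 0.1,        # 0.1 for 10x-genomics, as in the question
    out_dir           = out_dir,    # same directory, so earlier steps are reused
    cluster_by_groups = FALSE,
    plot_steps        = FALSE,
    write_phylo       = TRUE,
    denoise           = TRUE,
    HMM               = FALSE,
    num_threads       = 14,
    noise_logistic    = TRUE,
    sd_amplifier      = 3,
    leiden_resolution = resolution, # the value being tuned
    up_to_step        = 15          # stop after the subclustering step
  )
}

# Hypothetical wrapper: runs infercnv with the arguments above.
# Requires the infercnv package and an existing infercnv object.
run_up_to_subclustering <- function(infercnv_obj, resolution, out_dir = "cnv1/") {
  do.call(infercnv::run, build_run_args(infercnv_obj, resolution, out_dir))
}

# Example usage (illustrative value only; try one value, inspect, then adjust):
# infercnv_obj_sub <- run_up_to_subclustering(infercnv_obj, resolution = 0.001)
```

Once a resolution gives reasonably sized subclusters, the final run would be the same call without `up_to_step`.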