broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq
Other
556 stars 164 forks source link

output files "infercnv.references.txt" and "infercnv.observations.txt " not found #520

Open wcl2260 opened 1 year ago

wcl2260 commented 1 year ago

Dear team, Thanks for the great package!

I have several problems with running infercnv::run() function.

1-first, it has been running for over 1 week for about 8000 observations and 9221 refernces (Observation data size: Cells= 8389 Genes= 6045, Reference data size: Cells= 9221 Genes= 6045 ). Is this normal?

2-second, the run step jumped directly from 20 (STEP 20: Converting HMM-based CNV states to repr expr vals) to 22 (STEP 22: Denoising step) while the official website shows only 21 steps.

This is the script: infercnv_obj = CreateInfercnvObject(raw_counts_matrix="D:/TBNC/12.inferCNV_scCNV/expFile.txt", annotations_file="D:/TBNC/12.inferCNV_scCNV/metaFiles.txt", delim="\t", gene_order_file="D:/TBNC/12.inferCNV_scCNV/geneFile.txt", ref_group_names=c("T_cells") infercnv_obj = infercnv::run(infercnv_obj, cutoff=0.1, out_dir='infercnv_out_subclusters_mode_20230301/', cluster_by_groups=TRUE, analysis_mode="subclusters", denoise=TRUE, HMM=TRUE, output_format = "pdf")

3-third, output files "infercnv.references.txt" and "infercnv.observations.txt " were not found. What might be the cause of the failure to output these files?

4-fourth, is it possible to discriminate which are malignant and which are not from this graph. Also, why is the graph did not show anysubcluster as option "cluster_by_group" was "TRUE"? Is any parameter missed?(please see image) 无标题

GeorgescuC commented 1 year ago

Hi @wcl2260 ,

1- This is most likely related to 4-, the number of subclusters generated is too high, and because CNVs are called per subcluster, there are a lot more CNVs called and they are of lower confidence. This makes the Bayesian network step take a long time to run.

2- They used to be only 21 steps, but one of them has been split in two to enable changing the Bayesian filtering threshold without having to rerun the whole model, so the number increased to 22. There are some more changes coming so I was planning to update the wiki when those are released.

3- The output of those files when plotting has been disabled by default, but you can easily enable it again by adding the option "write_expr_matrix=TRUE" to run().

4- There are a few things here. First the "cluster_by_groups" option means that cells will be clustered first by the annotations provided as input before subclustering as opposed to combining all of them together before doing the subclustering. In your case, there is only 1 annotation for observation cells, "Epithelial_cells", so the option does not effectively change anything. To discriminate the cells based on CNVs and call malignancy, you should be able to use the HMM results (steps 17 through 20). However based on the dendrogram on the left, it is clear that the subclustering is too sensitive, and as mentioned in 1-, the HMM calls CNVs per subclusters, so it is important that they are good. You can improve the subclustering by tweaking the settings relating to Leiden, mainly the leiden_resolution parameter, which by default is 0.05 but you will need to reduce to 0.01 or lower it seems. If you are able to install the latest version of infercnv from Github's master branch, I added an extra plot after the subclustering that shows the subclusters so you can verify their quality. The feature should be in the stable version you can download from BioConductor in a few days.

Regards, Christophe.

wcl2260 commented 1 year ago

@GeorgescuC Hi Christophe, I have rerun the inferCNV with the leiden_resolution parameter setting as 0.01, 0.005 and 0.001. It was expecting to indentify several subclusters in observation cells it seems (seems not right). However, the number of the subclusters seems unsatisfactory (It was supposed to identyfy more than 2 subclusters, but it still had only "Epithelial_cells"). Is the leiden_resolution parameter needed to reduce to lower? (please see image, leiden_resolution = 0.001) 无标题

And, I was upset to tell you that I do something wrong when trying to update the  inferCNV (version 1.15.1) package to the latest version (version 1.15.2). I removed inferCNV (version 1.15.1) and reinstalled inferCNV using the following script:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager") BiocManager::install("infercnv")

The newly installed package was turned out to be inferCNV (version 1.14.2). Could you please tell me how to update inferCNV to the latest version of inferCNV? Thank you very much!

wcl2260 commented 1 year ago

@GeorgescuC Hi Christophe, Thanku for your attention to this issue.

  I would try running infercnv again using the leiden_resolution option set to 0.00005 after installing the latest version of infercnv (1.15.2). I got an error in STEP 15: 

Error in seq_len(max(obs_annotations_groups)) : argument must be coercible to non-negative integer In addition: Warning message: In max(nchar(obs_annotations_names)) : no non-missing arguments to max; returning -Inf

How to fix it? Here is my code: Sys.setenv(JAGS_HOME="C:/Program Files/JAGS/JAGS-4.3.1") library("coda") library("rjags") library("devtools") library(infercnv) library(Seurat) library(ggplot2)

library(parallel) infercnv_obj = CreateInfercnvObject(raw_counts_matrix="D:/TBNC/12.inferCNV_scCNV_forTcells_and_Epi_only/expFile.txt", annotations_file="D:/TBNC/12.inferCNV_scCNV_forTcells_and_Epi_only/metaFiles.txt", delim="\t", gene_order_file="D:/TBNC/12.inferCNV_scCNV_forTcells_and_Epi_only/geneFile.txt", ref_group_names=c("T_cells"))

future::plan("multicore",workers=40) infercnv_obj = infercnv::run(infercnv_obj, cutoff=0.1, out_dir='infercnv_out_20230311/', cluster_by_groups=FALSE,
analysis_mode="subclusters", leiden_resolution=0.0005, denoise=TRUE, HMM=FALSE, save_final_rds = TRUE, write_expr_matrix=TRUE, output_format = "pdf") 无标题

wcl2260 commented 1 year ago

@GeorgescuC Hi Christophe, I encounter the following error when I rerun the inferCNV: Error in if (runif(1) <= padj) { : missing value where TRUE/FALSE needed

This is my infercnv code: infercnv_obj = CreateInfercnvObject(raw_counts_matrix="D:/TBNC/12.inferCNV_scCNV_forTcells_and_Epi_only/expFile.txt", annotations_file="D:/TBNC/12.inferCNV_scCNV_forTcells_and_Epi_only/metaFiles.txt", delim="\t", gene_order_file="D:/TBNC/12.inferCNV_scCNV_forTcells_and_Epi_only/geneFile.txt", ref_group_names=c("T_cells"))

future::plan("multicore",workers=40) infercnv_obj = infercnv::run(infercnv_obj, cutoff=0.1, out_dir='infercnv_out_0.0005_20230312/', cluster_by_groups=TRUE,
k_obs_groups=8, analysis_mode="subclusters", leiden_resolution=0.0005, denoise=TRUE, HMM=TRUE, save_final_rds = TRUE, write_expr_matrix=TRUE, output_format = "pdf")

Any ideas why this might be the case? Thank you!

GeorgescuC commented 1 year ago

Hi @wcl2260 ,

First, for the install issue, version 1.15.2 is on BioConductor's devel so you need to specify "BiocManager::install(version='devel')" before the install line. If you are unable to install from there, it might be due to the devel version using R version 4.3 as well, in which case you can install infercnv from this Github directly instead.

For the subclustering, the results using "leiden_resolution=0.001" seem good based on the branches of the dendrogram, where the top level splits form a descending stair (like can be on the figure in your first post). The legend you see at the bottom with only "Epithelial_cells" will always display the annotations you provided as input, not the subclustering, except if specifically using the "plot_subclusters()" method.

For the first error ("Error in seq_len(max(obs_annotations_groups)) :"), I encountered it while looking at a different post and fixed it in the latest commit. The issue happens when running plot_subclusters() on an object that was analyzed with cluster_by_groups=FALSE, but does not affect the rest of the run, and does not rerun if the subclustering finished, which is why when you rerun infercnv, a different error appeared.

For the second issue, could you post the full log?

Regards, Christophe.