broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Computing nearest neighbor graph: Error in m[match(oldnodes,m)] <-1:(N-1): NAs are not allowed in subscripted assignments. #510

Open krigia opened 1 year ago

krigia commented 1 year ago

@dinvlad @bistline @coreone @Puriney @eweitz @GeorgescuC Hello inferCNV team,

I am running the inferCNV code and it runs properly until I get the following error:

Computing nearest neighbor graph
Computing SNN
Error in m[match(oldnodes,m)] <-1:(N-1): NAs are not allowed in subscripted assignments.

Attached is the whole run, with the error at the end: infer-error.docx

I would greatly appreciate your feedback. Thank you!

GeorgescuC commented 1 year ago

Hi @krigia ,

How many cells does the "Tumor cell" annotation group have?

Can you also try to rerun infercnv after executing options(error = function() traceback(2))? The specific line of code where the error happens seems to be in a method from igraph but I am not sure exactly where.
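As a minimal sketch of the suggestion above: the error handler is set once per session, before the failing call. The `infercnv::run(...)` line is left as a comment because its arguments depend on your own object; saving and restoring the previous options is an optional extra, not something the thread asks for.

```r
# Install an error handler that prints the call stack when an error occurs.
# traceback(2) skips the handler's own frame in the printed stack.
old_opts <- options(error = function() traceback(2))

# Rerun the failing step here, for example:
# infercnv::run(infercnv_obj, ...)

# Keep a reference to the handler, then restore the previous setting
# once debugging is done.
handler <- getOption("error")
options(old_opts)
```

Note that in a non-interactive script (Rscript), a custom error handler can let the script continue past the error, so this is best done in an interactive session.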

Regards, Christophe.

krigia commented 1 year ago

The "Tumor cell" annotation group has several thousand cells. I still get errors. Please see below:

PC_ 5
Positive:  ISCA2, YLPM1, NPC2, ABCD4, DLST, ELMSAN1, PNMA1, PGF, NUMB, PSEN1
           EIF2B2, RBM25, MLH3, MED6, ACYP1, SYNJ2BP, TMED10, COX16, FOS, IFT43
           GPATCH2L, SRSF5, IRF2BPL, GSTZ1, SLC39A9, TMED8, AHSA1, SPTLC2, SLIRP, ERH
Negative:  PIK3R3, POMGNT1, LRRC41, TMEM69, ATPAF1, UQCRH, GPBP1L1, CMPK1, BEND5, NASP
           AKR1A1, ELAVL4, PRDX1, UROD, FAF1, EIF2B3, PLK3, CDKN2C, KIF2C, RNF220
           RNF11, ERI3, EPS15, DMAP1, B4GALT2, OSBPL9, ATP6V0B, NRDC, PTPRF, HYI
Computing nearest neighbor graph
Computing SNN
Error in parallelDist(t(tumor_expr_data[, which(partition == i), drop = FALSE]), :
  lazy-load database '/usr/local/Cellar/r/4.2.2/lib/R/library/RcppParallel/R/RcppParallel.rdb' is corrupt
In addition: Warning message:
In parallelDist(t(tumor_expr_data[, which(partition == i), drop = FALSE]), :
  internal error -3 in R_decompress1
7: (function () traceback(2))()
6: parallelDist(t(tumor_expr_data[, which(partition == i), drop = FALSE]),
       threads = infercnv.env$GLOBAL_NUM_THREADS)
5: hclust(parallelDist(t(tumor_expr_data[, which(partition == i), drop = FALSE]),
       threads = infercnv.env$GLOBAL_NUM_THREADS), method = hclust_method)
4: as.phylo(hclust(parallelDist(t(tumor_expr_data[, which(partition == i), drop = FALSE]),
       threads = infercnv.env$GLOBAL_NUM_THREADS), method = hclust_method))
3: .single_tumor_leiden_subclustering(tumor_group = tumor_group,
       tumor_group_idx = tumor_group_idx, tumor_expr_data = tumor_expr_data,
       k_nn = k_nn, leiden_resolution = leiden_resolution,
       leiden_method = leiden_method, leiden_function = leiden_function,
       hclust_method = hclust_method)
2: define_signif_tumor_subclusters(infercnv_obj = infercnv_obj,
       p_val = tumor_subcluster_pval, k_nn = k_nn, leiden_resolution = leiden_resolution,
       leiden_method = leiden_method, leiden_function = leiden_function,
       leiden_method_per_chr = leiden_method_per_chr,
       leiden_function_per_chr = leiden_function_per_chr,
       leiden_resolution_per_chr = leiden_resolution_per_chr,
       hclust_method = hclust_method, cluster_by_groups = cluster_by_groups,
       partition_method = tumor_subcluster_partition_method,
       per_chr_hmm_subclusters = per_chr_hmm_subclusters,
       z_score_filter = z_score_filter)
1: infercnv::run(infercnv_obj_dong, cutoff = 0.1, out_dir = out_dir,
       cluster_by_groups = TRUE, plot_steps = FALSE, denoise = TRUE,
       HMM = FALSE, no_prelim_plot = TRUE, png_res = 60)

I would greatly appreciate your input. Thank you

GeorgescuC commented 1 year ago

Hi @krigia ,

The error indicates a problem with the installation or state of the parallelDist package, which infercnv uses. First, I would start a fresh R session and try to load the packages again. If the issue persists, I would run install.packages("parallelDist") again and restart R. The message in question is: "Error in parallelDist(t(tumor_expr_data[, which(partition == i), drop = FALSE]), : lazy-load database '/usr/local/Cellar/r/4.2.2/lib/R/library/RcppParallel/R/RcppParallel.rdb' is corrupt"
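One way to check which packages are affected before reinstalling is sketched below. The helper `package_loads_cleanly` is hypothetical (it is not part of infercnv or R); it assumes that touching every object in a namespace is enough to surface a corrupt lazy-load database. It also returns FALSE for packages that are simply not installed.

```r
# Hypothetical helper: TRUE if the package namespace loads and every object
# in it can be read; FALSE if loading or reading fails for any reason
# (including the package not being installed).
package_loads_cleanly <- function(pkg) {
  tryCatch({
    ns <- loadNamespace(pkg)
    # Touch each object so lazy-load promises are actually read from disk.
    for (name in ls(ns, all.names = TRUE)) {
      get(name, envir = ns)
    }
    TRUE
  }, error = function(e) FALSE)
}

# Packages named in the error messages in this thread.
pkgs <- c("RcppParallel", "parallelDist", "infercnv")
ok <- vapply(pkgs, package_loads_cleanly, logical(1))
broken <- pkgs[!ok]
print(broken)

# Reinstall in a fresh session, for example:
# install.packages(c("RcppParallel", "parallelDist"))  # CRAN packages
# BiocManager::install("infercnv")                     # Bioconductor package
```

Run this in a freshly started session, since a package that was reinstalled while it was still loaded is a common cause of the "lazy-load database is corrupt" message.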

Looking at the path, it also seems you might be on macOS. In the past I have had more issues with R installed from Brew than with the CRAN installer, so I stopped using the Brew version. If the issue persists, switching may be an option.

Regards, Christophe.

krigia commented 1 year ago

Hi @GeorgescuC Yes, I am running it on macOS. I had multiple issues, which I solved, but I am still getting errors. I restarted R and reloaded all R packages, but see below:

INFO [2023-03-01 17:51:28] Parsing matrix: /Volumes/Pegasus_R4i/NB-scRNA-seq/Public-available-scRNA-seq-datasets/Kameneva-GSE147821_RAW/dong_sc_count_matrix-1.txt
INFO [2023-03-01 18:22:51] Parsing gene order file: /Volumes/Pegasus_R4i/gencode_v27.txt
Error in infercnv::CreateInfercnvObject(raw_counts_matrix = "/Volumes/Pegasus_R4i/NB-scRNA-seq/Public-available-scRNA-seq-datasets/Kameneva-GSE147821_RAW/dong_sc_count_matrix-1.txt", :
  lazy-load database '/usr/local/Cellar/r/4.2.2/lib/R/library/infercnv/R/infercnv.rdb' is corrupt
In addition: Warning message:
In infercnv::CreateInfercnvObject(raw_counts_matrix = "/Volumes/Pegasus_R4i/NB-scRNA-seq/Public-available-scRNA-seq-datasets/Kameneva-GSE147821_RAW/dong_sc_count_matrix-1.txt", :
  internal error -3 in R_decompress1
2: (function () traceback(2))()
1: infercnv::CreateInfercnvObject(raw_counts_matrix = "/Volumes/Pegasus_R4i/NB-scRNA-seq/Public-available-scRNA-seq-datasets/Kameneva-GSE147821_RAW/dong_sc_count_matrix-1.txt",
       annotations_file = "/Volumes/Pegasus_R4i/NB-scRNA-seq/Public-available-scRNA-seq-datasets/Kameneva-GSE147821_RAW/newannot.txt",
       delim = "\t", gene_order_file = "/Volumes/Pegasus_R4i/gencode_v27.txt",
       ref_group_names = c("T cell"))

Your advice is much appreciated. Thank you

GeorgescuC commented 1 year ago

Hi @krigia ,

The issue seems to be in the same vein as the previous one. "lazy-load database [...] is corrupt" is an error with package installation/loading. Restarting R or reinstalling the package may work, but there could be another instance of the error in a different package. Is trying to switch to the CRAN release of R an option?

Otherwise you can try using the docker image which should not have such issues as long as you don't link your home directory inside the image. I updated the docker images just a few days ago so they are up to date.

Regards, Christophe.

krigia commented 1 year ago

@GeorgescuC

I use CRAN but I still get errors.

Error in value[[3L]] (cond) : Package 'dplyr' version 1.1.0 cannot be uploaded: Error in unloadNamespace(package): namespace 'dplyr' is imported by 'sctransform', 'plotly', 'tidyr', 'infercnv' so cannot be unloaded.

I ran remove.packages on "seurat", "ggplot2", and "plotly", but got the same error again. I would greatly appreciate your feedback.

Thank you,

GeorgescuC commented 1 year ago

Hi @krigia ,

When you switched between the Brew and CRAN versions, were the installed packages kept? If yes, it might be easier to remove all the packages and reinstall them fresh.

As for the issue itself, the error message says "cannot be uploaded", but I can only find similar errors that say "cannot be unloaded" (which fits with the next error line). If that is your case, there are a couple of posts about similar issues in R here and here. You removed "seurat", "ggplot2" and "plotly", but "dplyr" is also used by "sctransform", "tidyr" and "infercnv", so you would need to unload those as well, along with any other packages that use them, which can get tedious. If you restart R, none of them should be loaded, which makes things easier. If, however, they are already loaded when you restart your R session, you might have an .RData file in either your working directory or your home directory that gets reloaded with R and brings these issues back every time. Deleting it and then restarting R should solve that.
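The checks described above can be sketched as follows. The helper `find_rdata_files` is hypothetical and only looks in the two places mentioned (working directory and home directory); deleting is left commented out so nothing is removed without review.

```r
# Hypothetical helper: return the .RData files that exist in the given directories.
find_rdata_files <- function(dirs = c(getwd(), path.expand("~"))) {
  candidates <- file.path(dirs, ".RData")
  candidates[file.exists(candidates)]
}

stale <- find_rdata_files()
print(stale)
# After checking that nothing in them is needed:
# file.remove(stale)

# List the loaded namespaces that import dplyr, i.e. the ones that block
# unloading it. Guarded so that dplyr is not loaded just by asking.
importers <- if (isNamespaceLoaded("dplyr")) getNamespaceUsers("dplyr") else character(0)
print(importers)
```

In a clean session started without a restored workspace, `importers` should be empty, which is the state in which dplyr can be updated.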

Regards, Christophe.

krigia commented 1 year ago

@GeorgescuC
Thank you, I fixed it. However, infercnv still gives errors:

Computing nearest neighbor graph
Computing SNN
Error in m[match(oldnodes, m)] <- 1:(N - 1) :
  NAs are not allowed in subscripted assignments
6: (function ()
   traceback(2))()
5: as.hclust.phylo(tmp_full_phylo)
4: as.hclust(tmp_full_phylo)
3: .single_tumor_leiden_subclustering(tumor_group = tumor_group,
       tumor_group_idx = tumor_group_idx, tumor_expr_data = tumor_expr_data,
       chrs = chrs, k_nn = k_nn, leiden_resolution = leiden_resolution,
       leiden_method = leiden_method, leiden_function = leiden_function,
       hclust_method = hclust_method)
2: define_signif_tumor_subclusters(infercnv_obj = infercnv_obj,
       p_val = tumor_subcluster_pval, k_nn = k_nn, leiden_resolution = leiden_resolution,
       leiden_method = leiden_method, leiden_function = leiden_function,
       hclust_method = hclust_method, cluster_by_groups = cluster_by_groups,
       partition_method = tumor_subcluster_partition_method, per_chr_hmm_subclusters = per_chr_hmm_subclusters,
       z_score_filter = z_score_filter)
1: infercnv::run(infercnv_obj_dong, cutoff = 0.1, out_dir = out_dir,
       cluster_by_groups = TRUE, plot_steps = FALSE, denoise = TRUE,
       HMM = FALSE, no_prelim_plot = TRUE, png_res = 60)

Your input is much appreciated. Thank you

GeorgescuC commented 1 year ago

Hi @krigia ,

Is there anything particular about the group of cells annotated "Tumor cell" compared to the others? Would you be able to privately share the data so I can look into the issue?

Regards, Christophe.

GeorgescuC commented 1 year ago

After looking into the issue with the data provided, the problem was that the "as.hclust(phylo)" conversion from ape does not handle cases where the number of branches is large enough to be written in scientific notation. Workaround: run options(scipen = 100) in R before running infercnv.
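To illustrate why the workaround helps, here is a small sketch of the general R behavior involved. It shows how the scipen option changes the way a large round number is formatted as text; it does not reproduce ape's internal code, and the value 1e5 is only an example.

```r
# With R's default penalty (scipen = 0), a large round number is formatted
# in scientific notation because that form is shorter.
old_opts <- options(scipen = 0)
default_label <- format(1e5)

# A large positive penalty makes R prefer fixed notation.
options(scipen = 100)
fixed_label <- format(1e5)

# Restore whatever was set before.
options(old_opts)

print(default_label)  # "1e+05"
print(fixed_label)    # "100000"

# Applying the workaround from this thread:
# options(scipen = 100)
# infercnv::run(infercnv_obj, ...)
```

Labels such as "1e+05" no longer match the integer node numbers they stand for, which is consistent with the NA values reported by the match() call in the error.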