broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions #485

Closed: xliu38 closed this issue 1 year ago.

xliu38 commented 1 year ago

Hi @GeorgescuC,

Thank you for building this nice tool; I have been running it successfully on multiple datasets for my project. The error below occurred on a subset of my data. I have read several posts about this same error, and as far as I can tell, every reference cell type in my subset has more than one cell (in fact, B_cell, the group that triggers the error, has 7 cells).

The script is as follows:

```r
library(infercnv)

infercnvobj = CreateInfercnvObject(raw_counts_matrix = NB_DBHiCre_counts,
                                   annotations_file = NB_DBHiCre_ann_df,
                                   delim = "\t",
                                   gene_order_file = gene_order,
                                   ref_group_names = c("B_cell", "T_cells", "Macrophage", "Endothelial_cells"))

infercnvres = infercnv::run(infercnvobj,
                            cutoff = 0.1,  # use 1 for smart-seq, 0.1 for 10x-genomics
                            out_dir = paste0("mouse_data/NB_DBHiCre_redo/infercnv_NB_DBHiCre_", name, "_0.3"),
                            cluster_by_groups = TRUE,  # cluster cells separately within each annotation group
                            denoise = TRUE,
                            HMM = TRUE,
                            BayesMaxPNormal = 0.3,
                            num_threads = 32,
                            analysis_mode = "subclusters",
                            resume_mode = TRUE)
```

And here is the output, including the error message:

```
INFO [2022-12-05 10:32:19] ::process_data:Start
INFO [2022-12-05 10:32:19] Checking for saved results.
INFO [2022-12-05 10:32:19] Trying to reload from step 2
INFO [2022-12-05 10:32:20] Trying to reload from step 1
INFO [2022-12-05 10:32:20] Using backup from step 1
INFO [2022-12-05 10:32:20]

STEP 1: incoming data

INFO [2022-12-05 10:32:20]

STEP 02: Removing lowly expressed genes

INFO [2022-12-05 10:32:20] ::above_min_mean_expr_cutoff:Start
INFO [2022-12-05 10:32:20] Removing 117 genes from matrix as below mean expr threshold: 1
INFO [2022-12-05 10:32:20] validating infercnv_obj
INFO [2022-12-05 10:32:20] There are 18 genes and 258 cells remaining in the expr matrix.
INFO [2022-12-05 10:32:20] no genes removed due to min cells/gene filter
INFO [2022-12-05 10:32:20]

STEP 03: normalization by sequencing depth

INFO [2022-12-05 10:32:20] normalizing counts matrix by depth
INFO [2022-12-05 10:32:20] Computed total sum normalization factor as median libsize: 112.000000
INFO [2022-12-05 10:32:20] Adding h-spike
INFO [2022-12-05 10:32:20] -hspike modeling of B_cell
Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) :
  'x' must be an array of at least two dimensions
```

Each of these 7 B cells has some read counts on the 18 genes that remain after step 2, so I am not sure what went wrong here.

Thanks in advance!
Xueying

xliu38 commented 1 year ago

Hi @GeorgescuC,

To provide more information: I traced the nested function calls down to inferCNV_hidden_spike.R, and the following lines appear to be the ones relevant to my question:

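```r
# Column indices of the reference cells, grouped by annotation
normal_cells_idx_lists = infercnvobj@reference_grouped_cell_indices
normal_cells_idx = normal_cells_idx_lists[["B_cell"]]
# Expression of all remaining genes in the B_cell reference cells (genes x cells)
normal_cells_expr = infercnvobj@expr.data[,normal_cells_idx]
# Per-gene mean expression across those cells
gene_means_orig = rowMeans(normal_cells_expr)
```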

Running these lines on their own does not produce the error, but calling infercnv::run() does.
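The error message itself comes from base R: `rowMeans()` rejects any input that does not have at least two dimensions. A common way to end up with such input is subsetting a matrix down to a single row or column, which by default drops it to a plain vector. Whether this is exactly what happens inside infercnv's hidden-spike step is not confirmed in this thread; the minimal sketch below, on a small hypothetical matrix (`expr` and the gene/cell names are made up), only reproduces the message and shows the `drop = FALSE` idiom that avoids it.

```r
# Hypothetical 3-gene x 4-cell count matrix (not real data)
expr <- matrix(c(1, 0, 2, 3,
                 0, 1, 1, 0,
                 4, 2, 0, 1),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("GeneA", "GeneB", "GeneC"),
                               paste0("cell", 1:4)))

# Selecting a single gene (row) silently drops the result to a named vector
one_gene <- expr["GeneA", ]
is.null(dim(one_gene))   # TRUE: no dimensions left

# rowMeans() on that vector fails with the same message as in this issue
res <- tryCatch(rowMeans(one_gene), error = function(e) conditionMessage(e))
print(res)

# drop = FALSE keeps a 1 x 4 matrix, so rowMeans() works
one_gene_mat <- expr["GeneA", , drop = FALSE]
rowMeans(one_gene_mat)   # GeneA: 1.5
```

With 18 genes and 7 B cells, the subset in the trace above stays a full matrix, which is consistent with those lines running without error on their own.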

Hope this is useful, Xueying

xliu38 commented 1 year ago

Problem solved: the error was actually caused by the gene names in the gene expression file (all upper case) not matching those in the annotation file.
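Because the cell annotations contain no gene names, the file that has to agree with the expression matrix here is presumably the gene order file passed as `gene_order_file`. A quick overlap check before calling `CreateInfercnvObject()` can catch this kind of case mismatch. The sketch below is an assumption-laden illustration: `counts`, `gene_order`, and all gene names and coordinates are hypothetical stand-ins, and it assumes gene names are the row names of the gene order table (as with a data frame passed to `gene_order_file`). Mapping by upper-casing is one possible fix and assumes no two genes in the gene order table differ only by case.

```r
# Hypothetical stand-ins: counts with upper-case gene names,
# and a gene order table (genes as row names) using mixed case
counts <- matrix(1L, nrow = 3, ncol = 2,
                 dimnames = list(c("GENE1", "GENE2", "GENE3"), c("c1", "c2")))
gene_order <- data.frame(chr = c("chr1", "chr1", "chr2"),
                         start = c(100, 200, 300),
                         stop = c(150, 250, 350),
                         row.names = c("Gene1", "Gene2", "Gene3"))

# Exact (case-sensitive) overlap: what a plain name match sees
exact <- intersect(rownames(counts), rownames(gene_order))
length(exact)   # 0

# Case-insensitive overlap shows the genes are in fact shared
ci <- intersect(toupper(rownames(counts)), toupper(rownames(gene_order)))
length(ci)      # 3

# One possible fix: rename counts rows to the gene order spelling,
# leaving unmatched names unchanged
idx <- match(toupper(rownames(counts)), toupper(rownames(gene_order)))
rownames(counts) <- ifelse(is.na(idx), rownames(counts), rownames(gene_order)[idx])
all(rownames(counts) %in% rownames(gene_order))   # TRUE
```

A very small gene count surviving the early filtering steps (here, 117 removed plus 18 remaining) is one hint that many genes failed to match in the first place.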