-hspike modeling of normalsToUse throws an error

jachou1 commented 1 year ago

Hello, I'm running inferCNV on 10X ST data wherein a spot represents a cell. I've gotten to step 3, but it throws an error about x not having at least 2 dimensions. Please see screenshot.

I saw a similar question on GitHub from January 4th, but in my case I do not have a reference group, so my ref_group_names argument is NULL. When I look at @reference_grouped_cell_indices, it is empty. Please kindly advise what could be going wrong. Thank you.

GeorgescuC commented 1 year ago

Hi @jachou1 ,

"@reference_grouped_cell_indices" being empty is fine if you set "ref_group_names" to NULL.

Which version of infercnv are you using? One reason why this issue used to happen in older versions was when using a sparse matrix as input. The error message indicates that base::rowMeans() is used rather than Matrix::rowMeans(), with the first one not supporting sparse matrices. The current version of infercnv imports the second so there should be no issues when using sparse matrices.
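As an illustration of the difference described above, here is a minimal sketch, assuming only that the Matrix package is installed. The toy matrix and variable names are made up for the example and are not taken from infercnv.

```r
library(Matrix)

# Small sparse count matrix (3 genes x 4 cells); the values are made up.
counts <- Matrix(c(0, 2, 0, 1,
                   3, 0, 0, 0,
                   0, 0, 5, 1),
                 nrow = 3, byrow = TRUE, sparse = TRUE)

# base::rowMeans() expects a base array, so a sparse matrix is rejected.
base_result <- tryCatch(base::rowMeans(counts),
                        error = function(e) conditionMessage(e))

# Matrix::rowMeans() dispatches on the sparse class and returns per-gene means.
sparse_result <- Matrix::rowMeans(counts)
```

Here base_result holds the error message rather than a result, while sparse_result holds one mean per gene.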

Regards, Christophe.

jachou1 commented 1 year ago

Hi @GeorgescuC ,

My version of infercnv is ‘1.15.0’. But my count input is not a dgCMatrix, it's actually just a plain document. Would you recommend I upgrade to the current version? Thank you.

GeorgescuC commented 1 year ago

Hi @jachou1 ,

There is a more recent version you can update to, but I do not think any of the changes will solve this specific issue. Can you check that the 3629 cells that should all be set as observations are indeed there with length(unlist(infercnv_obj@observation_grouped_cell_indices))?

If they are, could you share the object with me privately so I can look into what is happening and debug the issue?
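The total count alone does not show how the cells are distributed across annotations. A rough sketch of a per-group check follows; the list below is a made-up stand-in for infercnv_obj@observation_grouped_cell_indices, assumed here to be a named list of cell index vectors.

```r
# Stand-in for infercnv_obj@observation_grouped_cell_indices:
# a named list of cell index vectors, one entry per annotation (made-up data).
observation_indices <- list(tumor_a = 1:5, tumor_b = 6:7, lone_cell = 8L)

# Total number of observation cells (the check quoted above).
total_cells <- length(unlist(observation_indices))

# Number of cells in each annotation group.
cells_per_group <- lengths(observation_indices)

# Names of annotation groups that contain a single cell.
single_cell_groups <- names(cells_per_group)[cells_per_group < 2]
```

With a real object, the slot would be passed in place of observation_indices.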

Regards, Christophe.

jachou1 commented 1 year ago

Yes, that command does return 3629. I have shared the object with you through email. Thank you.

sxz-ivan commented 1 year ago

Hi, Christophe! Here's how I solved the issue in my case.

First, I set the ref_group_names to those having more than 10 cells. No luck. #156 #276
Also, my gene_names in the matrix were consistent with the gene_order_file. #485

Next, I pulled the infercnv package from the repo (version 1.15.3) and, before installation, used sed to replace every rowMeans and rowSums in R/ and scripts/ with Matrix::rowMeans and Matrix::rowSums, but I still saw base::rowMeans in the error message. So I assumed it came from some other package that infercnv loaded. #519

Finally, the problem was solved after I merged all the cell populations in the annotation file with fewer than 10 cells (not only the ref_groups) into one combined group (2 cells might just be fine though).

Here's the code:

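library(magrittr)  # provides %>%

# Count cells per label and keep the labels with more than 10 cells
cells.tab = table(seurat.meta$SingleR.labels)
labels_with_enough_cells = names(cells.tab[cells.tab > 10])

# Restrict the reference groups to labels that have enough cells
ref_group_names = c("Macrophages", "Monocytes", "CD8+ T-cells", "NK cells", "T_cells", "CD4+ T-cells") %>%
  intersect(labels_with_enough_cells)

# Pool every remaining label into one combined group
seurat.meta = within(seurat.meta, {
  SingleR.labels = ifelse(!SingleR.labels %in% labels_with_enough_cells, "less_than10cells", SingleR.labels)
})

# Write the annotation file (tab-separated, no header, no row names)
seurat.meta %>% write.table(file = annot, col.names = FALSE, row.names = FALSE, sep = "\t", quote = FALSE)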

GeorgescuC commented 1 year ago

Hi @sxz-ivan ,

Thank you for sharing what worked in your case.

For reference if others have the same issue, in the case of the original post, the problem was that each cell was defined as a part of its own unique annotation. When doing the hspike modeling, among other things, the per gene mean is calculated on each reference annotation, or in the case where there are no references defined, on all (observation) annotations. If one of the annotations on which means are supposed to be calculated consists of a unique cell, the error "x must be an array of at least 2 dimensions" occurs as the matrix is of "genes x 1" dimensions. A fix should work by having the data be subset with "drop=FALSE" added, but it is also probably good to have those cases error.
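The effect of "drop=FALSE" can be reproduced with base R alone. This is a minimal sketch on a made-up matrix with hypothetical group names, not the actual infercnv code path.

```r
# Toy expression matrix: 3 genes x 4 cells (values are made up).
expr <- matrix(1:12, nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical annotation groups; "groupB" holds a single cell.
groups <- list(groupA = c(1, 2, 3), groupB = 4)

# Default subsetting drops the column dimension for a one-cell group,
# so rowMeans() receives a plain vector and fails.
dropped <- expr[, groups$groupB]
dropped_result <- tryCatch(rowMeans(dropped),
                           error = function(e) conditionMessage(e))

# drop = FALSE keeps a "genes x 1" matrix, so the per-gene mean is defined.
kept <- expr[, groups$groupB, drop = FALSE]
kept_means <- rowMeans(kept)
```

In this sketch dropped_result holds the error message, while kept_means is simply the single cell's values, one per gene.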

Regards, Christophe.

BaluPai commented 1 year ago

Hi, I have been using inferCNV for a few weeks, but I am now running into the same error after changing the way I use references. Until now I merged my reference data with the patient data and specified the reference clusters by cluster number, since the reference cells mostly clustered separately. With a new set of references, however, a few of the reference clusters merge with the patient clusters. I now specify the cells within each cluster by modifying the metadata to carry this information as "cluster_patient" (e.g., 12_patient or 12_normal for cells in the same cluster).

After reading through this issue report, I removed the cells belonging to groups with very few members (e.g., if 12_normal had only 5 or 10 cells). Is it possible that using only selected cells from a particular cluster as the reference is not allowed? Please let me know your thoughts/suggestions. Thanks!

JFanbio commented 7 months ago

Hi, the error also occurred with my datasets. I wrote a for loop to run infercnv on each sample, with non-tumor cells as the reference, but the error occurs only for some samples. I don't know why. Thanks!

broadinstitute / infercnv

-hspike modeling of normalsToUse throws an error #519