broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

hspike modeling in step 3 fails #493

Closed by avdl25 1 year ago

avdl25 commented 1 year ago

Hello,

I'm running into an issue with several of my samples in step 3 where the program seems unable to read my annotation file properly:

STEP 03: normalization by sequencing depth

INFO [2022-12-05 21:23:54] normalizing counts matrix by depth
INFO [2022-12-05 21:23:58] Computed total sum normalization factor as median libsize: 3228.000000
INFO [2022-12-05 21:23:58] Adding h-spike
INFO [2022-12-05 21:23:58] -hspike modeling of T cell
Error in base::rowMeans(x, na.rm = na.rm, dims = dims, ...) : 'x' must be an array of at least two dimensions

All of my reference groups have more than 100 cells, and the gene position file is correct: it matches the gene names and formatting of my samples. When I set HMM=FALSE in infercnv::run, the whole program runs; however, based on the wiki, the HMM step seems to be what enables CNV predictions, so I'm unclear whether the output produced without it can be used to identify tumor cells.

I've been able to run other samples from the same cohort successfully and all the files were formatted the same way, so I am at a loss for what to do.

Any suggestions on how to fix this issue? Thanks!

GeorgescuC commented 1 year ago

Hi @avdl25 ,

Which version of infercnv are you using? If you are using a sparse matrix as input and an older version of infercnv, that might be the issue.

Regards, Christophe.

avdl25 commented 1 year ago

Hi @GeorgescuC,

Thank you for the quick response! I'm using inferCNV version 1.12.0 and a txt file as my counts matrix. I'm wondering if a better understanding of what the -hspike modeling step is doing might help me to debug the issue. The error message makes it seem like the program isn't able to read my annotation file properly in order to determine which cells in the counts matrix are annotated as reference.

Thanks again!

GeorgescuC commented 1 year ago

Hi @avdl25 ,

Based on the log, there is at least one reference group, but to verify what the infercnv object has marked as references, you can check the contents of infercnv_obj@reference_grouped_cell_indices, which contains a list of the reference annotations with the cell indices (in the matrix) that match each of them.

Looking at your log, the median library size looks rather small, so the issue might be the cutoff you used, or a mismatch of genes in the gene order file and your matrix. What type of data are you using, and with what cutoff? The log messages from the object creation and the earlier steps of run() should indicate how many cells/genes are filtered and why, so investigating those may point to an issue.

Regards, Christophe.
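
A possible mismatch between the gene order file and the matrix can be checked outside R before running infercnv. The following is a minimal Python sketch, not part of infercnv. It assumes a tab-delimited counts matrix whose first line holds the cell names and whose remaining lines each start with a gene name, and a tab-delimited, header-less gene order file with the gene name in the first column. The function names `first_column` and `gene_overlap` and the file names in the comments are hypothetical.

```python
import csv
from typing import Iterable, Set, Tuple


def first_column(lines: Iterable[str]) -> Set[str]:
    """Collect the first tab-separated field of every non-empty row."""
    return {row[0] for row in csv.reader(lines, delimiter="\t") if row and row[0]}


def gene_overlap(matrix_lines: Iterable[str],
                 gene_order_lines: Iterable[str]) -> Tuple[int, int, int]:
    """Return (genes in matrix, genes in gene order file, genes in both).

    Assumption: the first line of the matrix is a header of cell names,
    so it is skipped before gene names are collected.
    """
    matrix_iter = iter(matrix_lines)
    next(matrix_iter, None)  # skip the header row of cell names
    matrix_genes = first_column(matrix_iter)
    ordered_genes = first_column(gene_order_lines)
    return len(matrix_genes), len(ordered_genes), len(matrix_genes & ordered_genes)


# Example use with hypothetical file names:
# with open("counts.txt") as m, open("gene_order.txt") as g:
#     print(gene_overlap(m, g))
```

A shared count that is small relative to the number of genes in the matrix would suggest that the two files use different gene identifiers or naming conventions.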

chatterjee89 commented 1 year ago

Remove clusters with very few cells in the query dataset(s) and rerun; that usually does it for me.

avdl25 commented 1 year ago

I was able to fix this issue by removing clusters with low cell counts!
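
The fix reported here, removing clusters with low cell counts, could be applied to the annotations file before creating the infercnv object. The sketch below is in Python and is not part of infercnv. It assumes a tab-delimited, header-less annotations file with the cell name in the first column and the group label in the second. The threshold `min_cells` and all function names are hypothetical, since the thread does not say how few cells counted as too few.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Annotation = Tuple[str, str]  # (cell name, group label)


def read_annotations(lines: Iterable[str]) -> List[Annotation]:
    """Parse 'cell<TAB>group' lines, skipping blank or malformed ones."""
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) >= 2 and fields[0]:
            rows.append((fields[0], fields[1]))
    return rows


def group_sizes(rows: Iterable[Annotation]) -> Counter:
    """Count how many cells carry each group label."""
    return Counter(group for _cell, group in rows)


def drop_small_groups(rows: List[Annotation], min_cells: int) -> List[Annotation]:
    """Keep only cells whose group has at least min_cells members."""
    sizes = group_sizes(rows)
    return [(cell, group) for cell, group in rows if sizes[group] >= min_cells]


# Example use with a hypothetical file name and threshold:
# with open("annotations.txt") as handle:
#     rows = read_annotations(handle)
# print(group_sizes(rows))              # inspect group sizes first
# kept = drop_small_groups(rows, min_cells=10)
```

Printing the group sizes first shows which clusters would be dropped at a given threshold. Whether the counts matrix also has to be subset to the remaining cells is not stated in the thread, so check that before rerunning.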