broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Different number of clusters in '17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.cell_groupings' and '17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_genes.dat' #551

Open parkjooyoung99 opened 1 year ago

parkjooyoung99 commented 1 year ago

Dear developer, I am currently using inferCNV with random trees and found that the number of clusters differs between '17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.cell_groupings' and '17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_genes.dat'.

In 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.cell_groupings there are 7 clusters, but in 17_HMM_predHMMi6.rand_trees.hmm_mode-subclusters.pred_cnv_genes.dat there are only 6.

Is there a reason for the decrease in the number of clusters?

Thank you!

GeorgescuC commented 1 year ago

Hi @parkjooyoung99,

The cell_groupings file contains a list of all clusters and the cells that belong to them, meaning every cell appears exactly once and each cluster appears as many times as there are cells in it. The pred_cnv_genes.dat file contains a list of all genes that are part of each identified CNV region, together with the cluster in which the region was found, which means that a cluster with no CNV regions does not appear in that file at all. A gene appears as many times as there are clusters in which it is part of a CNV, and each cluster appears as many times as there are genes in the CNVs identified in it. If you look at the "state" column, you will notice that the neutral state (state 3 for the i6 model) never appears. That would explain seeing 7 clusters in one file and only 6 in the other: one cluster has no predicted CNV regions.
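To check which cluster is missing, the cluster names in the two files can be compared directly. The following is a minimal sketch, not part of inferCNV: it assumes both files are tab-separated with a header row and that the cluster column is named `cell_group_name` in both (verify this against your own output). The function names and the toy data are made up for illustration.

```python
import csv
import io


def cluster_names(handle, column="cell_group_name"):
    """Return the distinct values of `column` in a tab-separated file with a header row.

    The default column name is an assumption; check the header of your own files.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return {row[column] for row in reader}


def clusters_without_cnv(groupings_handle, pred_genes_handle, column="cell_group_name"):
    """List clusters present in the cell groupings but absent from the predicted CNV genes."""
    all_clusters = cluster_names(groupings_handle, column)
    cnv_clusters = cluster_names(pred_genes_handle, column)
    return sorted(all_clusters - cnv_clusters)


# Toy data, invented for this example: two clusters, only one has a CNV region.
toy_groupings = io.StringIO(
    "cell_group_name\tcell\n"
    "tumor_s1\tcellA\n"
    "tumor_s1\tcellB\n"
    "tumor_s2\tcellC\n"
)
toy_pred_genes = io.StringIO(
    "cell_group_name\tgene_region_name\tstate\tgene\tchr\tstart\tend\n"
    "tumor_s1\tchr1-region_1\t4\tGENE1\tchr1\t100\t200\n"
)

missing = clusters_without_cnv(toy_groupings, toy_pred_genes)
print(missing)  # ['tumor_s2']
```

For real output, pass file handles instead of the toy strings, for example `open(path, newline="")` for each of the two files. Any cluster the function returns has no predicted CNV region and is therefore absent from pred_cnv_genes.dat.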

Regards, Christophe.