broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

Issue with k_obs_groups #613

Open deeKal opened 11 months ago

deeKal commented 11 months ago

Hi @GeorgescuC!

I'm running inferCNV in the "subclusters" analysis mode with a predetermined k_obs_groups. My command is the following:

    infercnv_obj = infercnv::run(
        infercnv_obj = infercnv_obj,
        cutoff = 0.1,
        k_obs_groups = 4,
        cluster_by_groups = FALSE,
        min_cells_per_gene = 10,
        out_dir = sample_out_dir,
        num_threads = 8,
        analysis_mode = "subclusters",
        leiden_resolution = 0.001,  # The lower it is - the lower the number of subclusters
        per_chr_hmm_subclusters = TRUE,
        denoise = TRUE,
        HMM = TRUE,
        HMM_type = "i3",  # change to i6 for HMM6
        BayesMaxPNormal = 0.3,
        plot_steps = FALSE,
        no_prelim_plot = TRUE,
        png_res = 300
    )

At first I ran it without setting the number of clusters. My cells were split into only 2 clusters, which was too few, since I could eyeball 4 distinct clusters, so I reran it with k_obs_groups = 4. My issue is that although the final infercnv plot shows 4 clusters, as it is supposed to, the subclusters determined after step 15 are still only 2. As a result, the HMM predicts CNVs on only those two subclusters. It's worth noting that my plots after steps 17, 19 and 20 all have 4 clusters, with the same CNVs in three of them (because after step 15 those three clusters were a single subcluster). You can find the plots here. How can I predict CNVs on the 4 clusters? Thank you!

GeorgescuC commented 10 months ago

Hi @deeKal ,

I can see the residual expression plot, but not the HMM figure to compare.

The k_obs_groups option only affects the plotting, not the data. It applies cutree() to the dendrogram on the left side of the heatmap, but it does not affect the subclustering, which is what the HMM uses for its predictions. The subclustering is mainly controlled by the leiden_resolution parameter, and since you seem to get only 2 subclusters based on your HMM output, I would increase it to get more subclusters. If you are using the latest version of infercnv, a plot named "infercnv_subclusters.png" should be generated automatically after the subclustering step. It shows how your cells are divided into subclusters, so you can quickly evaluate whether the subclustering resolution looks appropriate. You can iterate quickly over leiden_resolution values by adding the up_to_step=15 option to your run() call so that infercnv stops after the subclustering. Once the subclustering looks good, remove this option so that the run can finish.
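To illustrate why changing k_obs_groups cannot change what the HMM sees, here is a minimal base R sketch of cutting one dendrogram at two different values of k. The toy matrix and the variable names are assumptions made for illustration; this is not infercnv's internal code, only the general behaviour of cutree().

```r
set.seed(1)

# Toy matrix: 12 "cells" x 5 "genes", standing in for residual expression values.
# (Hypothetical data, not produced by infercnv.)
expr <- matrix(
  rnorm(60),
  nrow = 12,
  dimnames = list(paste0("cell", 1:12), paste0("gene", 1:5))
)
expr_before <- expr

# Build a single dendrogram over the cells.
tree <- hclust(dist(expr))

# Cut the same tree into 2 and then 4 groups: only the labels differ.
groups_k2 <- cutree(tree, k = 2)
groups_k4 <- cutree(tree, k = 4)

print(table(groups_k2))
print(table(groups_k4))

# The underlying matrix is untouched by either cut.
print(identical(expr, expr_before))
```

In the same way, k_obs_groups only decides how many groups the plotted dendrogram is split into, while the subclusters used by the HMM are computed separately.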

Of note, I would start without using "per_chr_hmm_subclusters=TRUE". Besides running the regular subclustering, this option runs an extra round of independent subclustering over each chromosome. These per-chromosome subclusters are then used for the HMM predictions on each chromosome. Once this is done, the overall subclusters are revisited and a consensus is taken of the potentially different predictions within each chromosome. This option is meant for cases where you have a larger number of subclusters but many of them share most of their signal, such as clones from different stages of CNV accumulation in your tumor, or "parallel leaves" that share most of their CNVs. Looking at each chromosome independently lets the HMM prediction use a smaller set of subclusters, with more cells per subcluster and thus higher predictive power.
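Putting these suggestions together, a subclustering-only trial run could be sketched as follows. The wrapper function run_subclustering_only(), the candidate resolution values, and the one-output-directory-per-resolution layout are assumptions made for illustration; the remaining arguments are taken from the original command, with per_chr_hmm_subclusters and k_obs_groups left out.

```r
# Hypothetical helper: runs infercnv only up to the subclustering step (step 15)
# for one leiden_resolution value. Requires the infercnv package when called.
run_subclustering_only <- function(infercnv_obj, out_dir, resolution) {
  infercnv::run(
    infercnv_obj = infercnv_obj,
    cutoff = 0.1,
    cluster_by_groups = FALSE,
    min_cells_per_gene = 10,
    out_dir = out_dir,
    num_threads = 8,
    analysis_mode = "subclusters",
    leiden_resolution = resolution,  # higher values give more subclusters
    up_to_step = 15,                 # stop after the subclustering step
    denoise = TRUE,
    HMM = TRUE,
    HMM_type = "i3",
    BayesMaxPNormal = 0.3,
    plot_steps = FALSE,
    no_prelim_plot = TRUE,
    png_res = 300
  )
}

# Example values only; suitable resolutions depend on the dataset.
candidate_resolutions <- c(0.002, 0.005, 0.01)

# Usage (commented out because it needs a real infercnv object and output path):
# for (res in candidate_resolutions) {
#   run_subclustering_only(
#     infercnv_obj,
#     out_dir = file.path(sample_out_dir, paste0("leiden_", res)),
#     resolution = res
#   )
# }
```

After each trial, checking "infercnv_subclusters.png" in the corresponding output directory shows whether the cells split into the expected groups. Once a resolution looks right, the same call without up_to_step lets the run finish.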

Regards, Christophe.