Open Yurany2908 opened 1 year ago
Hi @Yurany2908 ,
Which version of infercnv are you running? The subclustering has gone through iterations and some changes to the default settings, so the quality can vary depending on the version you are using. If you look at the file names in the output folder, you should have the subclustering method used in the names of some of the files, which should be either random_trees or leiden.
In particular, the newest version available through Github and with the next Bioconductor release (scheduled for next week) has improved Leiden clustering, with more options available for tuning it. One of the difficulties with subclustering is that the optimal settings can vary based on the size of the dataset and its diversity, so some iteration with different settings can be useful (for that, set the up_to_step=15
option so you do not run all the steps past the subclustering until you are satisfied with its quality). To clearly visualize the subclustering done at step 15 without having to continue the run, you can run this snippet:
infercnv_obj_step15 = readRDS("15_tumor_subclustersHMMi6.leiden.infercnv_obj")
subcluster_obj = infercnv_obj_step15
subcluster_obj@reference_grouped_cell_indices = list()
for (grp in names(infercnv_obj_step15@reference_grouped_cell_indices)) {
for (grp2 in names(infercnv_obj_step15@tumor_subclusters$subclusters[[grp]])) {
subcluster_obj@reference_grouped_cell_indices[[grp2]] = infercnv_obj_step15@tumor_subclusters$subclusters[[grp]][[grp2]]
}
}
subcluster_obj@observation_grouped_cell_indices = infercnv_obj_step15@tumor_subclusters$subclusters$all_observations
subcluster_obj@tumor_subclusters = NULL
plot_cnv(subcluster_obj,
cluster_by_groups=TRUE,
output_filename="leiden_test",
out_dir=out_dir,
write_expr_matrix=FALSE)
For some detail on the leiden_function
and leiden_resolution
settings, you can check the igraph documentation for them directly..
The other options that affect the Leiden subclustering are leiden_method
which by default runs a PCA first, but can be set not to ("simple"), and the z_score_filter
that calculate a z-score for genes based on the reference to ignore genes with too much variation in the references (such as MHC genes) in the subclustering.
One thing I notice is that the plot shows 20 ref groups while the code you wrote has the option set to 10, are those the settings for this run?
Regards, Christophe.
Dear all,
I have run inferCNV individually on multiple tumor samples (~1 to 15K cells). As reference, I used a mixture of healthy plasma cells from different samples. I have run each using the following script:
infercnv_obj = CreateInfercnvObject(raw_counts_matrix = seurat_oj[["RNA"]]@counts, annotations_file="annotation_file.tab", delim="\t", gene_order_file="gene_order_file_b.tab", ref_group_names=c("C2", "C3", "C4"))
infercnv_obj = infercnv::run(infercnv_obj, cutoff=0.1, out_dir="output", num_ref_groups=10, denoise=TRUE, HMM=TRUE, analysis_mode = "subclusters")
I am sending the final plot and the 19.HMM prediction plot of a representative case. In this, C1 represents the healthy population in the sample and 57 is the malignant population.
I would like to evaluate intra-tumor heterogeneity based on the inferred CNVs detected in different subclusters detected in each patient sample. However, I am a little confused about the output that I need to used to define the number/identity of the subclusters and the cell IDs for each one. I found a few post post about this, but the information is not very clear to me. Two ideas came up fromt he reading and undertanding the inferCNV output:
If you could help me to define the best way to identify the number and the identity of the subclusters in each sample, it would be extremely helpful.
Many thanks in advance for your help!
Regards,
Yurany