Plot_cnv for each individual after inferring CNV using integrated count matrix

mlin2017 commented 3 years ago

Hi,

Thank you for the great tool!

I ran infercnv successfully on an integrated count matrix for multiple individuals (integrated using Seurat). The final CNV plot (including cells of all samples) looks pretty convincing. My first question: is this the right way to infer CNVs for single cells from multiple samples, or should I run infercnv for each individual separately?

I am also thinking about plotting a separate CNV plot for each sample (if what I have done is correct) to visualize the differences among samples. I found that plot_per_group() "Takes an infercnv object and subdivides it into one object per group of cells to allow plotting of each group on a seperate plot.", but I think "group" here means cell type, not sample identity. Is there a similar function that can subdivide and plot by sample? Could you give me some guidance on this?

Thanks, Mao

GeorgescuC commented 3 years ago

Hi @mlin2017 ,

You can definitely run infercnv with multiple samples at once as long as they are of the same data type. The important part is which cells are used as the reference. As long as the reference is consistent and representative of the cell types, the results should be very similar whether you run the samples separately or together. Some slight variation in the exact residual expression values will still occur, because the normalization varies somewhat with the number of cells used, and the basic filtering of very low or non-expressed genes can flip depending on whether the average over all cells passes the threshold. Splitting up the runs can be useful if you have a wide range of different cell types and want to avoid filtering out too many genes that are expressed only in the less common types.
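To illustrate how a mean-based gene filter can flip between a per-sample run and a combined run, here is a toy sketch in base R. The cutoff value and the counts are made up for illustration, and this does not reproduce infercnv's actual filtering code; it only shows the arithmetic behind the point above.

```r
# Hypothetical threshold on the mean count per gene (illustrative only).
cutoff <- 0.1

# Toy counts for one gene: expressed in sample A, absent in sample B.
gene_counts_A <- c(1, 0, 1, 0)  # 4 cells from sample A
gene_counts_B <- rep(0, 36)     # 36 cells from sample B

# Running sample A alone: mean = 2/4 = 0.5, so the gene passes the filter.
keep_alone <- mean(gene_counts_A) >= cutoff

# Running both samples together: mean = 2/40 = 0.05, so the gene is dropped.
keep_together <- mean(c(gene_counts_A, gene_counts_B)) >= cutoff
```

The same gene is kept in one run and filtered in the other, which is why genes specific to rare cell types are the ones most at risk in a combined run.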

The "group" refers to the names provided in the input annotation file, which are also used to define the set of cells to use as the reference. There is no other method to split the plots. However, if your annotations are cell types and you want to plot per sample, you can make use of this one by doing the following:

1) Create a second, new infercnv object with CreateInfercnvObject() using the exact same input matrix and gene table, but change the annotation file to have the sample information rather than the cell types for the non-reference cells. Let's call this object infercnv_obj_samples. Do NOT run infercnv using it.
2) Load the infercnv results from your first run using: infercnv_obj_results = readRDS("/path/to/output/run.final.infercnv_obj")
3) Swap the grouping information with: infercnv_obj_results@observation_grouped_cell_indices = infercnv_obj_samples@observation_grouped_cell_indices
4) Reset the stored hclust/clustering information so that it does not generate errors because of mismatching annotations by running: infercnv_obj_results@tumor_subclusters = NULL
5) Call the plotting method on this edited object with the additional options you want: plot_per_group(infercnv_obj_results, OPTIONS)
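Step 1 needs an annotation file in which the non-reference cells are labelled by sample. A minimal sketch in base R follows, assuming a hypothetical per-cell table with columns cell, cell_type and sample (these column names, the toy values and the output file name are illustrative, not from this thread). Reference cells keep their cell-type label so that the same ref_group_names remain valid.

```r
# Toy per-cell metadata; column names and values are hypothetical.
meta <- data.frame(
  cell      = c("c1", "c2", "c3", "c4"),
  cell_type = c("Macro", "Epithelial", "Epithelial", "Fibro"),
  sample    = c("S1", "S1", "S2", "S2"),
  stringsAsFactors = FALSE
)
ref_groups <- c("Macro", "Fibro")  # same names you pass as ref_group_names

# Reference cells keep their cell type; all other cells get their sample ID.
is_ref <- meta$cell_type %in% ref_groups
sample_annot <- data.frame(
  cell  = meta$cell,
  group = ifelse(is_ref, meta$cell_type, meta$sample),
  stringsAsFactors = FALSE
)

# Write two tab-separated columns (cell name, group) without a header.
annot_path <- file.path(tempdir(), "annotations_by_sample.txt")
write.table(sample_annot, annot_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

The resulting file can then be passed as annotations_file to CreateInfercnvObject() to build infercnv_obj_samples.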

If you have questions or encounter issues with this process let me know.

Regards, Christophe.

mlin2017 commented 3 years ago

Hi Christophe,

Thank you for your suggestions. I like the idea of regenerating the annotation file with sample identity rather than cell types. Similarly, it would also be helpful to combine the sample information and the cell type annotation in the annotation file, and then run plot_per_group, where "group" could be sample identity or sample + specific cell type.
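A combined label of this kind could be built as in the following base R sketch. The table, its column names and the "_" separator are hypothetical choices for illustration; reference cells are left with their original cell-type label so the reference groups do not change.

```r
# Toy per-cell metadata; column names and values are hypothetical.
meta <- data.frame(
  cell      = c("c1", "c2", "c3", "c4"),
  cell_type = c("Macro", "Epithelial", "Epithelial", "Fibro"),
  sample    = c("S1", "S1", "S2", "S2"),
  stringsAsFactors = FALSE
)
ref_groups <- c("Macro", "Fibro")  # reference cell types stay unchanged

# Non-reference cells get a "<sample>_<cell type>" label.
is_ref <- meta$cell_type %in% ref_groups
combined_group <- ifelse(is_ref,
                         meta$cell_type,
                         paste(meta$sample, meta$cell_type, sep = "_"))
```

Each distinct combined label then becomes its own group in the annotation file, and so its own plot.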

Thanks, Mao

hchintalapudi commented 3 years ago

Hi, I saw this thread and tried plotting the obs groups per "Sample" rather than by "celltype". I followed all the steps and am still getting an error:

infercnv_obj_samples<- CreateInfercnvObject(raw_counts_matrix=exp.rawdata,
                                    annotations_file="/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/InferCNV/annotations_integrated_sample.txt",
                                    delim="\t",
                                    gene_order_file="/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/InferCNV/GRCz11.101_gene_pos.txt",
                                    ref_group_names = c("Macro","T/NK-T","7","Fibro", "Dendritic cells","11","Ductal", "14", "15", "12"))
infercnv_obj_results = readRDS("/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/InferCNV/run3 (mode=subclusters)/run.final.infercnv_obj")
#Swap the grouping information with:
infercnv_obj_results@observation_grouped_cell_indices = infercnv_obj_samples@observation_grouped_cell_indices

infercnv_obj_results@tumor_subclusters = NULL

plot_per_group(infercnv_obj_results,
               out_dir = "/Users/hchintalapudi/Desktop/work/ScRNA/10x_GEX_10-6-20/InferCNV/run3 (mode=subclusters)/plot_per_group(Sample)/",
               base_filename = "infercnv_per_Sample")
INFO [2021-09-01 19:25:10] ::plot_cnv:Start
INFO [2021-09-01 19:25:10] ::plot_cnv:Current data dimensions (r,c)=3537,1403 Total=4970001.87727545 Min=0.690458358493645 Max=1.59016190596341.
INFO [2021-09-01 19:25:10] ::plot_cnv:Depending on the size of the matrix this may take a moment.
INFO [2021-09-01 19:25:14] plot_cnv_observation:Start
INFO [2021-09-01 19:25:14] Observation data size: Cells= 1403 Genes= 3537
ERROR [2021-09-01 19:25:14] Unexpected error, should not happen.
Error in .plot_cnv_observations(infercnv_obj = infercnv_obj, obs_data = obs_data,  : 
  Error

Looks like it's because of this infercnv_obj_results@tumor_subclusters = NULL? Any tips on how I can solve this, @GeorgescuC @mlin2017 ?

Thanks, Christophe for the amazing tool!

Himanshu

GeorgescuC commented 3 years ago

Hi @hchintalapudi ,

I think a change to the main plotting function, which made it always use the hclust/subcluster information when it exists, broke plot_per_group when the subclustering information is set to NULL. I have pushed a fix to the master branch, so if you update your installation from the GitHub code, it should now work.

Regards, Christophe.

broadinstitute / infercnv

Plot_cnv for each individual after inferring CNV using integrated count matrix #329