broadinstitute / infercnv

Inferring CNV from Single-Cell RNA-Seq

inferCNV reference cells for Integrated Seurat Object & heatmap interpretation #468

Open garrett-lam opened 2 years ago

garrett-lam commented 2 years ago

Hi,

I would like to run inferCNV on my integrated Seurat object, which contains both a healthy liver sample and a cancerous liver sample. I am able to split the integrated Seurat object back into the individual Seurat objects (healthy and cancer). However, I am not sure what to set the ref_group_names parameter to when running CreateInfercnvObject(). I read that ref_group_names should be a vector and should "be set to various normal-cell types as defined in the sample annotation file".

I want to set the reference cells to be the cells in the Healthy Seurat object. How can I go about doing this via the sample annotation file?

Moreover, I would like to know how the inferCNV heatmaps can be interpreted to confirm which cells in my scRNA-seq data are cancer cells.

GeorgescuC commented 2 years ago

Hi @garrett-lam ,

Since you already have the Seurat objects in R, rather than using an intermediate file, you can directly provide infercnv with a matrix or dataframe of 2 columns where the first column contains the cell names and the second column the annotation/cell type. Something like this should be enough:


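# ref_annots mirrors the obs_annots line; the healthy_* names are placeholders for your own objects
ref_annots = data.frame(v1=healthy_seurat_cell_names_list, v2=rep("healthy", healthy_cell_count))
obs_annots = data.frame(v1=cancer_seurat_cell_names_list, v2=rep("cancer", cancer_cell_count))
all_annots = rbind(ref_annots, obs_annots)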

You can then simply provide `annotations_file=all_annots` to the CreateInfercnvObject method along with the rest of the arguments.
You can also write the data to file and specify the file as input if you prefer.
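
For the file-based route, a minimal sketch in base R might look like the following. The barcodes, the "healthy"/"cancer" labels, and the output file name are hypothetical stand-ins; in practice the cell names would come from the two Seurat objects. It assumes the annotation file layout is tab-delimited with one cell name and one group label per line and no header.

```r
# Hypothetical cell barcodes standing in for the cell names of the two Seurat objects.
healthy_cells <- c("H_AAAC", "H_AAAG", "H_AACT")
cancer_cells  <- c("C_TTGA", "C_TTGC")

# One row per cell: cell name in the first column, group label in the second.
ref_annots <- data.frame(cell = healthy_cells, group = "healthy",
                         stringsAsFactors = FALSE)
obs_annots <- data.frame(cell = cancer_cells, group = "cancer",
                         stringsAsFactors = FALSE)
all_annots <- rbind(ref_annots, obs_annots)

# Each cell should be annotated exactly once.
stopifnot(!anyDuplicated(all_annots$cell))

# Write "cell<TAB>group" lines without header, quotes, or row names.
annot_path <- file.path(tempdir(), "cell_annotations.txt")
write.table(all_annots, file = annot_path, sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

With a file like this, the path would be given as `annotations_file` and `ref_group_names` would be set to the label used for the healthy cells ("healthy" in this sketch).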

Regards,
Christophe.

garrett-lam commented 2 years ago

Hi @GeorgescuC,

I split my integrated Seurat object and ran inferCNV on the counts from the cancerous liver sample. In my sample, there are 21 cell groups, 2 of which I believe could be cancer cell groups (labeled cancer and M_MDSC). I therefore created an inferCNV object using CreateInfercnvObject() and passed a vector containing the names of the 19 other cell groups as ref_group_names (the "normal" cells).

This is the vector: c("CD4_Tcell", "Bcells", "Th2helper", "G_MDSC", "M1_inflammatory", "CD8_Central_memory", "M2_non-inflammatory", "NKT_cells", "NK", "Monocytes", "Tregs", "cDCs", "Cytotoxic_CD8_Tcells", "pDCs", "Stellate_cells", "Granulocytes", "DN_Tcells", "Cholangiocytes", "Hepatocytes")
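
A vector like this can also be derived from the annotations rather than typed by hand. The sketch below uses a short, hypothetical set of per-cell labels; `cell_groups` and `suspected_tumor` are illustrative names, not infercnv arguments.

```r
# Hypothetical per-cell group labels; in practice this would be the
# annotation column of the sample (one label per cell).
cell_groups <- c("cancer", "Hepatocytes", "M_MDSC", "NK",
                 "Hepatocytes", "cancer", "Tregs")

# Groups suspected to contain tumor cells are left out of the reference.
suspected_tumor <- c("cancer", "M_MDSC")

# Every remaining group name becomes a reference group.
ref_group_names <- setdiff(unique(cell_groups), suspected_tumor)

# Sanity check: no suspected tumor group ended up in the reference.
stopifnot(length(intersect(ref_group_names, suspected_tumor)) == 0)
print(ref_group_names)
```

Deriving the vector this way keeps it in sync with the annotation labels, so a renamed or misspelled group does not silently drop out of the reference.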

InferCNV successfully ran and I got this as my heatmap. However, I am not sure whether the heatmap looks fine. Does this heatmap look okay?

Are my reference cells supposed to be that vector? If not, what should my reference cells be?

[Heatmap image: infercnv NormalProbabilities PreFiltering]

[Heatmap image: infercnv NormalProbabilities PostFiltering]

GeorgescuC commented 1 year ago

Hi @garrett-lam ,

When you provide a set of cells as reference, other non-reference cells with similar expression patterns will appear normal, so having more cells as references is usually not an issue as long as they are properly clustered. There is, however, one risk when using too many different cell types in the same run: the initial filtering of genes expressed above the threshold may not be optimal in all regions. For example, if genes from a region are expressed in cell type A but not in any of the others, the average expression we filter on will be much lower, and some of those expressed genes can get filtered out. Conversely, regions highly expressed in some cell types might cause genes that are not expressed in other cell types to be kept through the filtering, diluting the signal when smoothing the expression. There is no single solution to this, as it depends on multiple parameters such as how diverse the cell types are (in expression profiles), how many different cell types are used, their relative mix, and the sequencing depth. What we can usually recommend, however, is to limit the analysis to similar cell types, so if you know what type(s) the potential cancer cells are derived from, you can use the matching healthy cells as reference.
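
To make the filtering risk concrete, here is a toy calculation in base R. It is a simplified sketch of a mean-expression filter, not infercnv's actual code: the cell counts, the gene's counts, and the 0.1 threshold (a value commonly used for 10x data) are assumptions chosen for illustration.

```r
# Assumed threshold on mean counts per gene.
cutoff <- 0.1

# Hypothetical run: 100 cells of type A and 900 cells of other types.
n_a     <- 100
n_other <- 900
cell_type <- c(rep("A", n_a), rep("other", n_other))

# One gene, detected in half of the type A cells and in no other cell.
gene_counts <- c(rep(c(0, 1), n_a / 2), rep(0, n_other))

# Mean over all cells in the run: 50 / 1000 = 0.05.
mean_all <- mean(gene_counts)
# Mean over type A cells only: 50 / 100 = 0.5.
mean_a <- mean(gene_counts[cell_type == "A"])

# The same gene fails the filter in the mixed run but passes when the
# analysis is limited to type A cells.
kept_all <- mean_all > cutoff
kept_a   <- mean_a > cutoff
print(c(mixed_run = kept_all, type_a_only = kept_a))
```

The gene is clearly expressed in type A, yet averaging over many non-expressing cell types pushes it under the threshold, which is why limiting the run to similar cell types helps.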

Assuming the point above is not an issue, the heatmap appears to show results that match your expectations. I would, however, also look at the residual expression heatmap, as that is an important control to verify that the HMM was run with appropriate settings. Since the heatmap suggests the HMM was run per sample/cell type, there may be diversity/sub-clonality that is missed in the HMM results.

Best, Christophe.