Extracting/Generating overall CNV scores for each cell and plotting onto UMAP

Gibbatron commented 2 years ago

Hi,

I have seen this sort of question come up on a few threads but none seem to truly answer this question. Apologies if this has indeed been answered.

I've run the subcluster analysis (in RStudio) on my sample and have used add_to_seurat() to transfer the data onto my Seurat object, but I notice that this gives me the CNV status of every chromosome rather than an overall CNV status for each cell. I also notice that the infercnv object contains subclusters. How would I plot these visually?

I've found a tutorial using Python (https://icbi-lab.github.io/infercnvpy/tutorials/tutorial_3k.html) which uses cnv.tl.cnv_score(). According to the tutorial, this "computes a summary score that quantifies the amount of copy number variation per cluster. It is simply defined as the mean of the absolute values of the CNV matrix for each cluster". Does this mean that they have taken the clusters generated by the subcluster analysis and then computed a CNV score for each of them? If so, how can this be done in RStudio? They have also plotted the CNV score on top of their transcriptomics-based UMAP and then labelled cells as 'tumor' or 'normal'.

Ultimately what I want to do is take my UMAP and overlay an overall CNV score for each cell and use that to predict which cells/clusters in the UMAP are 'cancer' cells - how do I do this? Is there a tutorial/post that has the script/coding to be able to do this?

Thanks so much in advance!

Alex

GeorgescuC commented 2 years ago

Hi @Gibbatron ,

For the overall CNV status of a cell, we don't calculate a value. You could calculate one either from the residual expression values or from the HMM results as exported per chromosome (using either weighted scores or not). Infercnvpy is not developed or maintained by us, so I am not sure what their method does, but from the description it seems to be mean(abs(cells_in_subcluster)) computed on the residual expression rather than on the HMM predictions (which is what add_to_seurat uses). You can access the subcluster information in the infercnv_obj@tumor_subclusters$subclusters slot and use it to subset the residual expression and calculate the score. Something like this should do what you are looking for (assuming the order of cells is the same in the seurat_obj and infercnv_obj matrices):

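# One slot per cell for the score and for the subcluster label
seurat_obj@meta.data[["cnv_score"]] = vector(mode="double", length=ncol(infercnv_obj@expr.data))
seurat_obj@meta.data[["subcluster_identity"]] = vector(mode="double", length=ncol(infercnv_obj@expr.data))
i = 1
for (subclust in infercnv_obj@tumor_subclusters$subclusters) {
    # unlist(subclust) gives the column indices of the cells in this entry
    cells = unlist(subclust)
    # Mean absolute residual expression over all genes and cells of the entry
    seurat_obj@meta.data[["cnv_score"]][cells] = mean(abs(infercnv_obj@expr.data[ , cells, drop=FALSE]))
    # Number the entries 1, 2, 3, ... in the order they appear in the slot
    seurat_obj@meta.data[["subcluster_identity"]][cells] = rep(i, length(cells))
    i = i + 1
}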
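As far as I can tell, the `subclusters` slot is nested: one entry per annotation group, each holding a list of subclusters whose elements are named vectors of column indices. If that is the case, the loop above scores each group as a whole. The following is a minimal sketch, under that assumption, that scores each subcluster separately. It uses a toy matrix in place of `infercnv_obj@expr.data` and a toy list in place of `infercnv_obj@tumor_subclusters$subclusters`; both are made-up stand-ins, so check the real structure with `str()` first. The `center` value is also an assumption: I believe the residual expression is usually centered near 1 rather than 0, in which case the baseline should be subtracted before taking absolute values.

```r
# Toy stand-ins (hypothetical data, not real infercnv output)
expr <- matrix(1, nrow = 5, ncol = 6,
               dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))
expr[, 4:6] <- 1.2  # cells 4-6 carry a uniform "gain"

# Assumed layout: group -> subcluster -> named vector of column indices
subclusters <- list(
  normal = list(normal_s1 = c(cell1 = 1L, cell2 = 2L, cell3 = 3L)),
  tumor  = list(tumor_s1 = c(cell4 = 4L, cell5 = 5L),
                tumor_s2 = c(cell6 = 6L))
)
center <- 1  # assumed baseline of the residual expression

cnv_score <- setNames(numeric(ncol(expr)), colnames(expr))
subcluster_identity <- setNames(character(ncol(expr)), colnames(expr))

for (group in names(subclusters)) {
  for (sub in names(subclusters[[group]])) {
    idx <- subclusters[[group]][[sub]]
    # Mean absolute deviation from the baseline over all genes and cells
    cnv_score[idx] <- mean(abs(expr[, idx, drop = FALSE] - center))
    subcluster_identity[idx] <- sub
  }
}
```

Both vectors are in the column order of the expression matrix, so they can be assigned to `seurat_obj@meta.data` directly only if the Seurat object uses the same cell order.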

If the two matrices are not sorted in the same cell order, you will also need to reorder the values before assigning them, using something like this: cell_ordering_match = match(colnames(seurat_obj@assays[[assay_name]]), colnames(infercnv_obj@expr.data))
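To make the reordering concrete, here is a small sketch with made-up barcodes that aligns per-cell values computed in infercnv column order to the Seurat cell order by name rather than by position. `seurat_cells` stands in for the Seurat column names and `infercnv_cells` for `colnames(infercnv_obj@expr.data)`; the scores are invented for illustration.

```r
# Hypothetical barcodes; "GGGG" exists in Seurat but not in the infercnv run
infercnv_cells <- c("AAAC", "AAAG", "AAAT", "AACA")
seurat_cells   <- c("AAAT", "AAAC", "GGGG", "AAAG")

# Per-cell scores in infercnv column order (invented values)
score_infercnv_order <- setNames(c(0.01, 0.02, 0.15, 0.20), infercnv_cells)

# Position of each Seurat cell within the infercnv matrix (NA if absent)
cell_ordering_match <- match(seurat_cells, infercnv_cells)

# Reorder the scores so that they line up with the Seurat cells
score_seurat_order <- setNames(score_infercnv_order[cell_ordering_match],
                               seurat_cells)
```

Cells that were not part of the infercnv run end up as `NA`, which makes them easy to spot before the vector is written to the metadata.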

We don't currently export the subcluster information with add_to_seurat, but that is something we could do. The help for how to plot currently exported values can be found on the wiki.
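For a score per cell rather than per subcluster, the same idea can be applied column by column. This is only a sketch on a toy matrix standing in for `infercnv_obj@expr.data`, again assuming a baseline (`center`) of 1; it is not something infercnv computes for you. The commented Seurat calls at the end are how I would expect the result to be attached and plotted, and are not run here.

```r
# Toy stand-in for infercnv_obj@expr.data (genes x cells); hypothetical values
expr <- matrix(c(1.00, 1.00, 1.00,
                 1.00, 1.10, 0.80,
                 1.00, 0.90, 1.30),
               nrow = 3, byrow = TRUE,
               dimnames = list(paste0("gene", 1:3),
                               c("cellA", "cellB", "cellC")))
center <- 1  # assumed baseline of the residual expression

# Mean absolute deviation from the baseline across genes, one value per cell
cell_cnv_score <- colMeans(abs(expr - center))

# With a real Seurat object, something along these lines (not run here):
# seurat_obj <- Seurat::AddMetaData(seurat_obj, metadata = cell_cnv_score,
#                                   col.name = "cnv_score")
# Seurat::FeaturePlot(seurat_obj, features = "cnv_score")
```

Because `cell_cnv_score` is named by cell, `AddMetaData` should be able to match it to the Seurat cells by name. Any cutoff used to call cells 'tumor' or 'normal' from this score would be a judgment call, for example one based on the range seen in the reference cells.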

Regards, Christophe.

jgarces02 commented 1 year ago

> We don't currently export the subcluster information with add_to_seurat, but that is something we could do.

Would be nice!

broadinstitute / infercnv

Extracting/Generating overall CNV scores for each cell and plotting onto UMAP #434