KrasnitzLab / CNVMetrics

R Package to compare copy number variant (CNV) results from multiple samples/methods
https://krasnitzlab.github.io/CNVMetrics/

Extracting dendrogram #62

Open gevro opened 1 month ago

gevro commented 1 month ago

Hi, how can I extract the dendrogram from the result of the plotMetric() function? Thanks

adeschen commented 1 month ago

Hi @gevro,

The output of the plotMetric() function is a grob, that is, an object that has already been transformed into a graphical object, so not all information can be extracted from it. The order of the samples in the graph can be extracted, but not the dendrogram itself (only the coordinates used to draw it).

library(CNVMetrics)

data.dir <- system.file("extdata", package="CNVMetrics")
cnv.file <- file.path(data.dir, "mousePairedOrganoids.txt")
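calls <- read.table(cnv.file, header=TRUE, sep="\t")

# Convert the data frame to a GRangesList, one GRanges per sample
# (assumes the sample identifier column is named "ID")
grlog <- GenomicRanges::makeGRangesListFromDataFrame(calls, 
    split.field="ID", keep.extra.columns=TRUE)

metricLog <- calculateLog2ratioMetric(segmentData=grlog, 
                                       method="weightedEuclideanDistance", nJobs=1)

gOut <- plotMetric(metricLog)

# Sample labels, in the order they appear in the graph
gOut$grobs[[1]]$grobs[[1]]$grob[[5]]$label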

At the moment, the easiest way to get access to the dendrogram is to generate your own heatmap using pheatmap or ComplexHeatmap.

library(pheatmap)
metricMat <- metricLog$LOG2RATIO
diag(metricMat) <- 1.0 
metricMat[lower.tri(metricMat) & is.na(metricMat)] <- 0.0
metricMat[upper.tri(metricMat)] <- t(metricMat)[upper.tri(metricMat)]
metricMat[is.na(metricMat)] <- 0.0
metricDist <- as.dist(1-metricMat)

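# hclust() renamed the "ward" method to "ward.D"
hc <- hclust(metricDist, method = "ward.D")
plot(hc)
# pheatmap expects a matrix, so convert the dist object explicitly
pheatmap(as.matrix(metricDist), cluster_rows = hc, cluster_cols = hc)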
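
Once an hclust object such as hc is available, the dendrogram can be pulled out with base R. A minimal sketch, using a small made-up similarity matrix in place of metricLog$LOG2RATIO (the sample names and values are assumptions for illustration only):

```r
# Toy similarity matrix standing in for metricLog$LOG2RATIO (hypothetical
# values); only the lower triangle is filled, as in the package output.
samples <- c("s1", "s2", "s3", "s4")
metricMat <- matrix(NA_real_, 4, 4, dimnames = list(samples, samples))
metricMat[lower.tri(metricMat)] <- c(0.9, 0.2, 0.3, 0.25, 0.35, 0.8)
diag(metricMat) <- 1
metricMat[upper.tri(metricMat)] <- t(metricMat)[upper.tri(metricMat)]

# Same clustering step as above
hc <- hclust(as.dist(1 - metricMat), method = "ward.D")

dend <- as.dendrogram(hc)          # dendrogram object, reusable in other plots
leafOrder <- hc$labels[hc$order]   # sample names in left-to-right leaf order
groups <- cutree(hc, k = 2)        # cluster membership when cutting into 2 groups
```

The dend object can be passed to plot() or to other packages that accept a dendrogram, and leafOrder gives the same kind of information as the label vector extracted from the grob.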

Best Regards, Astrid

gevro commented 1 month ago

Thanks! Also, is there a way for the metric calculation functions, which use only amplification or only deletion information, to produce a distance matrix that integrates both amplifications and deletions? Currently, each seems to be computed separately. I prefer the amplification/deletion metrics because the log2 ratio metric is too sensitive given the high variability of single-cell data.

adeschen commented 1 month ago

Hi @gevro ,

You would like to calculate how many events are shared between samples (metric) without taking the amplification/deletion status into account, only whether an event is present. Am I understanding the request correctly?

We originally planned to have a more generic function accepting terms defined by users (ex: LOH) but we did not implement it.

At the moment, the workaround would be to give every input event the same status (ex: AMPLIFICATION, even for deletions) and run the function with that input.
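
A minimal sketch of that relabelling, assuming the calls are held in a data frame with a state column (the table below, its column names and its values are made up for illustration):

```r
# Hypothetical call table; column names and values are assumptions.
calls <- data.frame(
    ID    = c("sampleA", "sampleA", "sampleB", "sampleB"),
    chr   = c("chr1", "chr2", "chr1", "chr2"),
    start = c(100, 500, 150, 480),
    end   = c(400, 900, 420, 950),
    state = c("AMPLIFICATION", "DELETION", "DELETION", "DELETION"),
    stringsAsFactors = FALSE
)

# Keep the original status in case it is needed later.
calls$originalState <- calls$state

# Give every amplified or deleted event the same status so that only the
# presence of an event is compared.
isEvent <- calls$state %in% c("AMPLIFICATION", "DELETION")
calls$state[isEvent] <- "AMPLIFICATION"

table(calls$state)
```

The relabelled table would then be converted to a GRangesList and passed to the metric function as usual; only the result reported for the AMPLIFICATION status would be meaningful.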

This is the first time I have heard of the metrics being used on single-cell data. I wish you the best. Don't hesitate to ask if you have other questions.

Best, Astrid

gevro commented 1 month ago

Thanks. What I meant was to take the information of clustering amplifications only, and information from clustering deletions only and somehow integrating it so that the final clustering uses both types of information. Is that possible?

adeschen commented 1 month ago

I will need to discuss this with my colleagues. I have seen cases where keeping AMP and DEL separate is the best approach. For example, when comparing bulk tissue to derived cultures, we can see high similarity in the deletion patterns but a total loss of the amplifications in the cultures.
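
For a single clustering that uses both types of information, one naive possibility is to average the two distance matrices before clustering. This is not a CNVMetrics feature and has not been validated; the sketch below uses made-up, already symmetric similarity matrices, and the equal weighting and the linkage method are arbitrary assumptions:

```r
samples <- c("s1", "s2", "s3")

# Hypothetical symmetric similarity matrices (values in [0, 1]), standing in
# for the amplification and deletion metrics once the matrices are filled in.
ampSim <- matrix(c(1.0, 0.8, 0.1,
                   0.8, 1.0, 0.2,
                   0.1, 0.2, 1.0), nrow = 3, dimnames = list(samples, samples))
delSim <- matrix(c(1.0, 0.6, 0.3,
                   0.6, 1.0, 0.4,
                   0.3, 0.4, 1.0), nrow = 3, dimnames = list(samples, samples))

# Weighted average of the two distances; the weight is an arbitrary choice.
wAmp <- 0.5
combinedDist <- as.dist(wAmp * (1 - ampSim) + (1 - wAmp) * (1 - delSim))

# Cluster on the combined distance
hcCombined <- hclust(combinedDist, method = "average")
```

Averaging can hide exactly the kind of disagreement described above (similar deletions, lost amplifications), so it is worth comparing the combined clustering with the two separate ones.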

gevro commented 1 month ago

Makes sense, thanks!