bernatgel / karyoploteR

karyoploteR - An R/Bioconductor package to plot arbitrary data along the genome
https://bernatgel.github.io/karyoploter_tutorial/
298 stars 42 forks source link

RFE: heatmap-like plot #64

Closed lbeltrame closed 4 years ago

lbeltrame commented 4 years ago

Example of what I mean:

https://clincancerres.aacrjournals.org/content/clincanres/23/9/2223/F1.large.jpg?width=800&height=600&carousel=1

it's useful, IMO, not only for quantitative data, but also if you want to plot discrete data over the genome (yes/no).

bernatgel commented 4 years ago

Hi Luca,

That functionality is already available in karyoploteR :) There is a kpHeatmap function that can do what you want. Unfortunately I have not yet reached the point of having it added to the karyoploteR tutorial and it's only mentioned briefly in the vignette. So it's understandable that you didn't find it.

In any case, I'll propose you another way to achieve what you are looking for: using simple rectangles (kpRect) and combining them with a function automatically computing their color (colByValue, again not yet added to the tutorial) and a function to atutomatically determine the vertical positioning (autotrack).

Since I don't have access to the actual data in the plot you linked, we'll start creating some random data with regioneR::createRandomRegions

library(karyoploteR)

nsamples <- 40

regs <- lapply(1:nsamples, function(i) {
  r <- createRandomRegions(length.mean=5e6, length.sd=5e6, non.overlapping=TRUE)
  r$val <- runif(length(r))
  return(r)
})

After that, we can create our "pseudoheatmap". We'll loop over the samples and for each one we'll plot its regions as rectangles. The color of the rectangle will be determined by its value using

col=colByValue(regs[[i]]$val, colors = c("red", "white", "blue"), min = 0, max = 1)

That function will return a vector of colors depending on the value associated to each rectangle, with 0 beign "red", 0.5 "white" and 1 "blue". You can change the colors and specify any color combination you want. That function uses colorRamp internally, so you can easily find more information on how these colors are selected.

In addition, the vertical position of each sample will be computed with autotrack in

r0=autotrack(i, nsamples)

So the sample will be positioned at position i out of nsamples (in this case, 40). autotrack is reeeeeeally handy, I recommend you to take a look at the autotrack tutorial.

So the complete code would be something like this

kp <- plotKaryotype(plot.type=4)
for(i in 1:nsamples) {
  kpPlotRegions(kp, regs[[i]], 
                col=colByValue(value = regs[[i]]$val, colors = c("red", "white", "blue"), min = 0, max = 1),
                r0=autotrack(i, nsamples))
}

And that will create an image like this one pseudoheatmap

This is a good start and demonstrate how to create heatmaps with karyoploteR, but you can go muuuuuch further than that.

Changing the parameters a bit, adding labels and colored rectangles, chromosome separators, etc... we can change the look and feel of the plot to resemble much more that of the original plot.

pp <- getDefaultPlotParams(4)
pp$bottommargin <- 10
pp$leftmargin <- 0.02
pp$data1inmargin <- 3
pp$topmargin <- 15
kp <- plotKaryotype(plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL, plot.params = pp)
kpDataBackground(kp, col=colByChr(kp$genome, colors = c("#EEEEEE", "#999999")), data.panel = "ideogram")
kpText(kp, chr=kp$chromosomes, x=mid(kp$genome), y=0.5, labels = gsub("chr", "", kp$chromosomes), data.panel = "ideogram")
kpAddChromosomeSeparators(kp)
chrY.len <- kp$chromosome.lengths["chrY"]
for(i in 1:nsamples) {
  at <- autotrack(i, nsamples)
  kpRect(kp, chr="chrY", x0=chrY.len+25e6, x1=chrY.len+50e6, y0=0.1, y1=0.9, r0=at, col=rainbow(nsamples)[i], border=NA)
  kpAbline(kp, h=0, col="lightgray", r0=at)
  kpAddLabels(kp, labels = paste0("S", i), side = "right", r0=at, cex=1.3, label.margin = 0.02)
  kpPlotRegions(kp, regs[[i]], 
                col=colByValue(value = regs[[i]]$val, colors = c("red", "white", "blue"), min = 0, max = 1),
                r0=at)
}

And you will get this image

pseudoheatmap_enhanced

Hope this helps! If you have any other request feel free to ask :)

Oh, and if you are plotting copy number data take a look at our companion package CopyNumberPlots (github repo: https://github.com/bernatgel/CopyNumberPlots) for some handy functions to plot copy-number related data!

lbeltrame commented 4 years ago

Thanks a bunch for the suggestions! I've in fact implemented something similar to that, but using kpHeatmap, although not as nice or as sophisticated as the plots you just provided. ;)


plot_cna_calls = function(sample_data, label.margin=0.01, label.cex=0.8,
    colors=c("blue", "gray60", "red"), ...) {

    require(karyoploteR)

    kp = plotKaryotype(plot.type=5, genome="hg38",
                       chromosomes=c(paste0("chr", 1:22), "chrX"), cex=0.6,...)
    r0 = 0
    r1=1
    for (i in seq_len(length(sample_data))) {
        label = names(sample_data)[i]
        rr <- karyoploteR::autotrack(current.track = i, total.tracks = length(sample_data),
                                     margin = 0.1, r0 = r0, r1 = r1)
        kpHeatmap(kp, sample_data[[i]], colors=colors, r0=rr$r0,
                  r1=rr$r1)
        kpAddLabels(kp, r0 = rr$r0, r1 = rr$r1,
                    labels=label, srt=0, pos=2, cex=label.cex,
                    label.margin=label.margin, data.panel=1)
    }
}

With rectangles it will likely look better (as it's binned copy number, there are some ugly white spots), so I'll incorporate that into what I'm doing now.

bernatgel commented 4 years ago

Hey! that looks pretty good too! glad you found what you needed!

Just one small "typing-reduction" suggestion, for the last few versions it's possible to give r0 a vector with the r0 and r1 values, so you can go from

r0=rr$r0, r1=rr$r1

to

r0=rr

Not a life-saving change, but I find it more readable.

lbeltrame commented 4 years ago

Hmm, doesn't seem to work if I want to use a specific genome (hg38):

kp <- plotKaryotype(plot.type=4, ideogram.plotter = NULL, labels.plotter = NULL, plot_params=plotpar, genome="hg38", chromosomes=c(paste0("chr", 1:22), "chrX"))
WARNING: There was an error when filtering the chromosomes. Falling back to using the unfiltered genome. 
Error: subscript is a NSBS object that is incompatible with the current
  subsetting operation

EDIT: Unless I need to update my BSgenome GenomeInfoDb packages too? I'm running karyoploteR from the latest master.

lbeltrame commented 4 years ago

It looks like a bug. I'll file a separate issue.