bernatgel / karyoploteR

karyoploteR - An R/Bioconductor package to plot arbitrary data along the genome
https://bernatgel.github.io/karyoploter_tutorial/
299 stars 41 forks source link

Bam coverage differences between chromosomes #119

Open diyasen2021 opened 2 years ago

diyasen2021 commented 2 years ago

Hello,

I am trying to plot read coverage using kpPlotBAMDensity but I find large differences between chromosomes using this method. For instance, I run the code below.

custom.genome <- toGRanges("mygenome.txt")
kp <- plotKaryotype(genome = custom.genome, plot.type = 4, cex =0.8, chromosomes = "Pag3770_4")
kp <- kpAddBaseNumbers(kp, tick.dist = 1e6, cex=0.5, units="Mb", tick.len = 3, add.units = "TRUE")
at <- autotrack(current.track = 1, total.tracks = 2)
kpDataBackground(kp, r0=at$r0, r1=at$r1)
kpAddLabels(kp, labels = "3770", r0=at$r0, r1=at$r1, cex=0.5, side="right")
kp<-kpPlotBAMDensity(kp, data="../gatk/3770_merged_coordsort.bam", window.size=1e4, r0=at$r0, r1=at$r1, col="darkorange")
#kpPlotBAMCoverage(kp, data="../gatk/3770_merged_coordsort.bam", max.valid.region.size=11e6, r0=at$r0, r1=at$r1, col="darkorange")
kpAxis(kp, ymax=50000, r0=at$r0, r1=at$r1, cex=0.5)
summary(kp$latest.plot$computed.values$density)

and I obtain the following image for chromosome 4.

Rplot-3770-chrom4

where the mean read density is ~10,000 summary(kp$latest.plot$computed.values$density) Min. 1st Qu. Median Mean 3rd Qu. Max. 38 9770 10064 10128 10358 31758

and by changing to chromosome 10, I obtain the following image

Rplot-3770-chrom10

where the mean read density is ~18,000 summary(kp$latest.plot$computed.values$density) Min. 1st Qu. Median Mean 3rd Qu. Max. 11412 17548 18337 29892 19102 946938

Although chromosome 4 seems okay, the plot for chromosome 10 is dominated by the single peak near the beginning of the chromosome and this seems to be suppressing the read density along the rest of the chromosome, thereby leading to an inaccurate representation. What could possibly be going wrong here? Any help is much appreciated.

Thanks for your time.

Diya

bernatgel commented 2 years ago

Hi Diya,

TL;DR: You should add ymax=50000 to your call to kpPlotBAMDensity. Full explanation below.

I see what your problem is. You have two problems here, actually, a biology/data problem and a karyoploteR one.

The biology/data problem is that you have a veeery high peak in your Pag3770_10 chromosome, about 45X higher than the surrounding regions. If you are using a custom genome (you have assembled or it's an imperfect genome for a non-model organism) and your data is whole-genome DNA-sequencing, your peak most probably arises from an erroneous genome assembly. It could be possible that a bunch of repeats located in different parts of the genome were converted into a single element here in your Pag3770_10 chromosome and you are seeing the sum of the coverage of all those repeats in this single spot. You can sometimes see similar things in the model organisms too near centromeres and telomeres. I don't think there's anything you can do here except make an even better genome draft, if possible, but repetitive regions are a tough nut to crack.

The other problem you have is a karyoploteR problem, a problem with the axis and actually a problem I've seen several times, so it's clear it my fault in designing a and documenting kpAxis and not yours in using it! Anyway, the problem here is that your axis scales are wrong and arbitrary. The thing is that karyoploteR executes the commands one by one, strictly one after the other and independently! So issuing a kpAxis([...], ymax=50000) will not affect in any way kpPlotBAMDensity. If you do not specify ymax in kpPlotBAMDensityit will autoadjust the ymax parameter to the maximum value in the plotted region (31578 for chromosome 4 and 946938 in chromosome 10, in your example). To correctly set your axis according to your plotting, you have two options:

Manually set your axis and data plotting ymax:

kp<-kpPlotBAMDensity(kp, data="../gatk/3770_merged_coordsort.bam", window.size=1e4, ymax=50000, r0=at$r0, r1=at$r1, col="darkorange")
kpAxis(kp, ymax=50000, r0=at$r0, r1=at$r1, cex=0.5)

or set your axis depending on your actual data

kp<-kpPlotBAMDensity(kp, data="../gatk/3770_merged_coordsort.bam", window.size=1e4, r0=at$r0, r1=at$r1, col="darkorange")
kpAxis(kp, ymax=max(kp$latest.plot$computed.values$density), r0=at$r0, r1=at$r1, cex=0.5)

These two approaches have their own problems. For the first one, you need to know the range of your data and you might end up with data out of range (more on that later). In your example, the peak with a density of 900K will be well over the maximum 50K limit.

The second approach will create correct representations also, but with different limits for each plot, making it more difficult to interpret and sometimes leading to incorrect interpretations as in your question.

In your case, since the high peak seems an error in your data, I would probably go for the first option (setting ymax=50000 in both commands). Since clipping is on by default in zoomed plots, the peak should get cropped at 50000 and you'd get what you need. The only problem, is that you'll not have any indication of the cropping, but you could add one with a kpSegments or something similar, plotting a red line over the regions with a density higher than 50000. I think it would make sense to add this directly to kpPlotBAMDensity (and kpPlotBigWig and kpPlotBAMCoverage and...) but I can not provide a timeframe for it to be done.

So... after this long message, I hope this all makes sense and hope it helps you get the images you need. If you need any further clarification, please ask! And if this is not much of a trouble, could you comment later telling me which route you took? It will help me understand how everybody is using karyoploteR and how to improve the package and its documentation!

Bernat

diyasen2021 commented 2 years ago

Hi Bernat,

Thank you very much for the detailed explanation. Your point about the possible assembly error on chrom 10 is noted. I will do some digging to find out what's happening there.

So I tried your suggestions on chromosome 10. I first tried setting ymax to 50000 in kpPlotBAMDensity;

kp<-kpPlotBAMDensity(kp, data="../gatk/3770_merged_coordsort.bam", window.size=1e4, ymax=50000, r0=at$r0, r1=at$r1, col="darkorange")
kpAxis(kp, ymax=50000, r0=at$r0, r1=at$r1, cex=0.5)

that produced the following plot;

Rplot-3770-chr10

Next, I tried option 2;

kp<-kpPlotBAMDensity(kp, data="../gatk/3770_merged_coordsort.bam", window.size=1e4, r0=at$r0, r1=at$r1, col="darkorange")
kpAxis(kp, ymax=max(kp$latest.plot$computed.values$density), r0=at$r0, r1=at$r1, cex=0.5)

resulting in;

Rplot-3770-chr10-autoymax

So of these 2, option 1 seems to have done the trick. Here is chromosome 4 plotted with option 1 code.

Rplot-3770-chr4

I think that chromosomes 4 and 10 are looking more even now. Just wanted to update you and also wanted your thoughts on the matter.

Thanks again for the help!

Diya