slowkow opened this issue 4 years ago
I have a feeling that I might need to modify this line in `kpPlotBAMCoverage()`:

```r
bam.cov <- bamsignals::bamCoverage(data, karyoplot$plot.region, verbose = FALSE)
```
Perhaps I should try setting `bamCoverage(paired.end = "ignore")`?
I'll update this issue if that seems to work.
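A minimal sketch for comparing `paired.end` settings, assuming `bamCoverage()` accepts `"ignore"` and `"extend"` for `paired.end` (as described in the bamsignals reference manual). The example BAM `ex1.bam` that ships with Rsamtools and the window `seq1:100-599` are stand-ins for the real data and region, not the data from this issue.

```r
library(bamsignals)
library(GenomicRanges)

# Stand-in data: example BAM (with index) shipped with Rsamtools.
bam_path <- system.file("extdata", "ex1.bam", package = "Rsamtools")
region <- GRanges("seq1", IRanges(100, 599))  # hypothetical 500 bp window

# "ignore": every read contributes on its own, mates are not combined.
cov_ignore <- bamCoverage(bam_path, region, paired.end = "ignore", verbose = FALSE)
# "extend": properly paired reads are extended over the whole fragment.
cov_extend <- bamCoverage(bam_path, region, paired.end = "extend", verbose = FALSE)

# One integer per base of the window.
ignore_vec <- as.list(cov_ignore)[[1]]
extend_vec <- as.list(cov_extend)[[1]]

summary(ignore_vec)
summary(extend_vec)
```

If the `"extend"` profile fills in stretches that the `"ignore"` profile leaves low, that would be consistent with fragment spans (rather than sequenced bases) being counted.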
Hi @slowkow,
When I implemented `kpPlotBAMCoverage`, I was working mainly with exome and genome data, and this was not an issue. I see your problem, and I think it's an important feature to add to the package.
As you have seen, the coverage data comes from bamsignals, so the key will be getting it to return the desired values. If you get it to work, I'll be more than happy to accept a pull request or some code :) Otherwise, I'll add it to the (not so short!) TODO list!
Thanks for reporting!
Bernat
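This code seems to give a coverage profile that is closer to IGV's:

```r
library(Rsamtools)    # also attaches GenomicRanges/IRanges (IRanges, coverage)
library(karyoploteR)  # kpArea, kpAxis

# Per-base coverage of `gr` built from each read's start (`pos`) and query
# width (`qwidth`). The CIGAR is not used, so every read is treated as one
# ungapped block of `qwidth` bases starting at `pos`.
summaryFunction <- function(gr, bamFile, ...) {
  param <- ScanBamParam(
    what = c("pos", "qwidth"),
    which = gr,
    flag = scanBamFlag(isUnmappedQuery = FALSE)
  )
  x <- scanBam(bamFile, ..., param = param)[[1]]
  coverage(IRanges(x[["pos"]], width = x[["qwidth"]]))
}

# `regions`, `this_gene`, `bam_file` and the karyoplot `kp` are defined
# earlier in the script (not shown here).
cvg <- summaryFunction(regions[[this_gene]], bam_file)

# The Rle is indexed by genomic position, so the row names are coordinates.
d_cvg <- as.data.frame(cvg)
d_cvg$x <- rownames(d_cvg)
d_cvg <- d_cvg[d_cvg[, 1] > 0, ]  # keep covered positions only
colnames(d_cvg) <- c("y", "x")
d_cvg$x <- as.integer(d_cvg$x)

# Coverage track in the upper 75% of the data panel, with a matching axis.
kpArea(
  kp,
  chr = as.character(seqnames(regions[[this_gene]])),
  x = d_cvg$x,
  y = d_cvg$y,
  ymin = 0,
  ymax = max(d_cvg$y),
  r0 = 0.25, r1 = 1, col = "grey40"
)
kpAxis(kp, ymin = 0, ymax = max(d_cvg$y), r0 = 0.25, r1 = 1)
```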
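Because `summaryFunction` ignores the CIGAR, a spliced read is still drawn as one solid block. A possible CIGAR-aware alternative, sketched here with GenomicAlignments (a swapped-in technique, not what kpPlotBAMCoverage does), is to read the alignments with `readGAlignments()` and call `coverage()` on them, which skips `N` (skipped/intron) operations. The helper name `cigar_coverage`, the example BAM `ex1.bam` from Rsamtools, and the window are illustrative assumptions.

```r
library(GenomicAlignments)  # also attaches Rsamtools and GenomicRanges

# Stand-in data: example BAM shipped with Rsamtools.
bam_path <- system.file("extdata", "ex1.bam", package = "Rsamtools")
region <- GRanges("seq1", IRanges(100, 599))  # hypothetical window

# Hypothetical helper: per-base coverage over `gr`, honouring the CIGAR.
cigar_coverage <- function(gr, bamFile) {
  param <- ScanBamParam(
    which = gr,
    flag = scanBamFlag(isUnmappedQuery = FALSE)
  )
  aln <- readGAlignments(bamFile, param = param)
  # coverage() on GAlignments splits each read at N operations, so introns
  # are not counted; by default deletions (D) are still counted.
  cvg <- coverage(aln)[[as.character(seqnames(gr))]]
  # One integer per base of the requested window.
  as.vector(cvg[start(gr):end(gr)])
}

v <- cigar_coverage(region, bam_path)
summary(v)
```

The result could then be plotted with `x = start(gr):end(gr)` and `y = v`, which also keeps the zero-coverage positions so the area does not bridge uncovered stretches.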
Here are the contents of `regions[["HLA-A"]]`:

```
> regions[["HLA-A"]]
GRanges object with 1 range and 0 metadata columns:
      seqnames            ranges strand
         <Rle>         <IRanges>  <Rle>
  [1]        6 29941160-29945984      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
```
The figure is still not identical to IGV's: the IGV coverage track has a data range of 0 to 4361, while the Rsamtools coverage ranges from 0 to 6666. My best guess is that IGV implicitly applies some flags to filter reads. It would be nice to find where IGV defines those flags.
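One guess worth testing: IGV's alignment preferences filter duplicate and vendor-failed (QC-fail) reads by default (this is an assumption about IGV's defaults, and it may not explain the whole gap). The sketch below builds an equivalent `scanBamFlag()` and counts how many reads it removes. The example BAM and window are stand-ins; the name `igv_like_flag` is hypothetical.

```r
library(Rsamtools)
library(GenomicRanges)

# Stand-in data: example BAM shipped with Rsamtools.
bam_path <- system.file("extdata", "ex1.bam", package = "Rsamtools")
region <- GRanges("seq1", IRanges(100, 599))  # hypothetical window

# Assumed IGV-like defaults: drop duplicates and QC-failed reads.
igv_like_flag <- scanBamFlag(
  isUnmappedQuery = FALSE,
  isDuplicate = FALSE,
  isNotPassingQualityControls = FALSE
)

# Number of records overlapping `gr` that pass `flag`.
count_reads <- function(bamFile, gr, flag) {
  param <- ScanBamParam(which = gr, flag = flag)
  countBam(bamFile, param = param)$records
}

n_all <- count_reads(bam_path, region, scanBamFlag(isUnmappedQuery = FALSE))
n_igv <- count_reads(bam_path, region, igv_like_flag)
c(all = n_all, igv_like = n_igv)
```

If the counts differ on the real BAM, passing `igv_like_flag` as the `flag` in `summaryFunction` would show whether these filters account for the higher maximum.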
It seems that karyoploteR computes coverage differently from IGV.
I think IGV ignores reads whose mates map outside the viewing region, whereas karyoploteR includes those reads as if they covered those bases.
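One way to test this hypothesis is to split the reads in a window by whether their mate maps inside the same window, then compute coverage from the "mate inside" subset only. The sketch uses the `rname`, `mrnm` and `mpos` fields from `scanBam()`; the example BAM, the window, and the names `mate_inside` and `cvg_inside` are illustrative assumptions.

```r
library(Rsamtools)
library(GenomicRanges)

# Stand-in data: example BAM shipped with Rsamtools.
bam_path <- system.file("extdata", "ex1.bam", package = "Rsamtools")
region <- GRanges("seq1", IRanges(100, 599))  # hypothetical window

param <- ScanBamParam(
  what = c("pos", "qwidth", "rname", "mrnm", "mpos"),
  which = region,
  flag = scanBamFlag(isPaired = TRUE, isUnmappedQuery = FALSE,
                     hasUnmappedMate = FALSE)
)
x <- scanBam(bam_path, param = param)[[1]]

# Mate on the same chromosome and starting inside the window.
# Factors are compared as character to avoid level-set mismatches.
mate_inside <- !is.na(x$mrnm) & !is.na(x$mpos) &
  as.character(x$mrnm) == as.character(x$rname) &
  x$mpos >= start(region) & x$mpos <= end(region)

table(mate_inside)

# Coverage (same ungapped model as summaryFunction) from those reads only.
cvg_inside <- coverage(IRanges(x$pos[mate_inside],
                               width = x$qwidth[mate_inside]))
```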
I'm only interested in the bases actually sequenced by real reads. I'm not interested in the gaps in the read alignments, since they tell me nothing about how well a particular position is covered.
Could you give me any hints on how to get this behavior?
Here's what I see from karyoploteR:
Here's what I see in IGV:
Here's my code: