Bioconductor / Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
https://bioconductor.org/packages/Rsamtools
Other
27 stars 27 forks source link

handle multiple calls to scanBam() on an open BamFile #6

Open mtmorgan opened 5 years ago

mtmorgan commented 5 years ago

https://github.com/LTLA/csaw/pull/4#issuecomment-480141778 and https://support.bioconductor.org/p/119631

> fl = system.file(package="Rsamtools", "extdata", "ex1.bam")
> header = scanBamHeader(fl)[[1]]$targets
> bf = open(BamFile(fl))
> which = GRanges(paste0(names(header), ":", 1, "-", header))
> length(scanBam(bf, param = ScanBamParam(what = "pos", which = which[1]))[[1]]$pos)
[1] 1501
> length(scanBam(bf, param = ScanBamParam(what = "pos", which = which[2]))[[1]]$pos)
[1] 0
jmacdon commented 5 years ago

Iterating through a bam file using a GRanges works for readGAlignments, but not readGAlignmentPairs

> b <- BamFile("aligned_nodups_20190130_acomys/1-D-1Aligned.sortedByCoord.out.bam", yieldSize = 1L, asMates = TRUE)
> open(b)
> repeat{
+ aln <- readGAlignmentPairs(b, param = param)
+ if(length(aln) == 0L)
+ break
+ print(head(table(seqnames(aln)), 2))
+ }

 LAS1  LAS2 
64280     0 
Warning message:
In .make_GAlignmentPairs_from_GAlignments(gal, strandMode = strandMode,  :
    26 alignments with ambiguous pairing were dumped.
    Use 'getDumpedAlignments()' to retrieve them from the dump environment.
LTLA commented 5 years ago

Do we have a time-frame for this @mtmorgan? I can merge LTLA/csaw#4 as it is now, but performance in @jmacdon's use case is going to be pretty bad...