Bioconductor / Rsamtools

Binary alignment (BAM), FASTA, variant call (BCF), and tabix file import
https://bioconductor.org/packages/Rsamtools

scanBam 'which' returns the same record multiple times #39

Closed gevro closed 2 years ago

gevro commented 2 years ago

Regarding this documented behavior: "When one record overlaps two ranges in which, the record is returned twice."

Is there any way to disable this? It doesn't match what 'samtools view' does, and it makes downstream processing difficult. I don't see the benefit if one simply wants to extract the reads that overlap specific regions.

Thanks.

mtmorgan commented 2 years ago

This behavior cannot be disabled; one could set ScanBamParam(what = "qname") and use that to 'de-duplicate' as desired...

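> library(GenomicAlignments)  # provides readGAlignments(); also attaches Rsamtools and GenomicRanges
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")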
> param = ScanBamParam(what = "qname", which = GRanges(c("seq1:5", "seq1:6")))
> aln = readGAlignments(fl, param = param)
> length(aln)
[1] 7
> length(aln[!duplicated(mcols(aln)$qname)])
[1] 4
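
Two variations on the workaround above may be useful; both are sketches under stated assumptions rather than a recommended recipe. The first merges overlapping or adjacent ranges with reduce() before the query, so a record overlapping several of the original ranges falls into a single merged range. The second keeps the original ranges but de-duplicates on a composite key, since mates of a paired-end read share a qname and filtering on qname alone could drop one of them. The variable names (regions, key, and so on) are illustrative, and the choice of key fields is an assumption about what identifies a record in your data.

```r
library(GenomicAlignments)  # attaches Rsamtools and GenomicRanges

fl <- system.file("extdata", "ex1.bam", package = "Rsamtools")
regions <- GRanges(c("seq1:5", "seq1:6"))

# Option 1: merge overlapping/adjacent ranges before querying.
# A read that spans two ranges which remain separate after reduce()
# would still be returned once per range.
param1 <- ScanBamParam(which = reduce(regions))
aln1 <- readGAlignments(fl, param = param1)

# Option 2: keep the original ranges and de-duplicate afterwards on a
# composite key (assumed here to identify a record uniquely).
param2 <- ScanBamParam(what = c("qname", "flag"), which = regions)
aln2 <- readGAlignments(fl, param = param2)
key <- paste(mcols(aln2)$qname, mcols(aln2)$flag,
             as.character(seqnames(aln2)), start(aln2), cigar(aln2),
             sep = "\t")
aln2_unique <- aln2[!duplicated(key)]
```

Option 1 discards the information about which original range each record matched; if that matters downstream, Option 2 is the safer choice.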