BioJulia / XAM.jl

Parse and process SAM and BAM formatted files
MIT License

Position of unmapped mate #40

Open CiaranOMara opened 3 years ago

CiaranOMara commented 3 years ago

Hi @CiaranOMara, I encountered this issue as well and put together a BAM file with two reads at the same position (one is the other's unmapped mate) to reproduce it. I find that the EOF check fix (e.g. on the feature/CodecBGZF-issue-31 branch) actually skips one of the reads. See https://gist.github.com/Marlin-Na/9cb7767dfc87b007500a6fff4a8f5b9a

Originally posted by @Marlin-Na in https://github.com/BioJulia/XAM.jl/issues/31#issuecomment-861296268

I think my issue was that if a read is unmapped, its rightposition(record) will become position(record) - 1, which is smaller than the leftposition. I think the check here https://github.com/BioJulia/XAM.jl/blob/develop/src/bam/overlap.jl#L87 should account for this special case. Probably change it to the following? It worked for me.

   if rid < interval[1] || (rid == interval[1] && rightposition(record) < first(interval[2]) && position(record) < first(interval[2]))

and

   if rid > interval[1] || (rid == interval[1] && position(record) > last(interval[2]) && rightposition(record) > last(interval[2]))

Originally posted by @Marlin-Na in https://github.com/BioJulia/XAM.jl/issues/31#issuecomment-861388135
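
To illustrate the special case described above, here is a minimal, self-contained sketch that does not use XAM.jl itself. `MockRecord`, `leftpos`, `rightpos`, `compare_original` and `compare_patched` are hypothetical names made up for this example, and the -1/0/+1 return values are an assumption about how such a comparison might report "left of", "overlapping" and "right of". It also assumes the existing check tests only the right position on the left side and only the left position on the right side, which is what the proposed change implies.

```julia
# Hypothetical stand-in for a BAM record; NOT XAM.jl's actual Record type.
struct MockRecord
    refid::Int        # 1-based reference index
    position::Int     # 1-based leftmost position
    alignlength::Int  # reference bases covered; 0 for an unmapped read
end

leftpos(r::MockRecord) = r.position
# For an unmapped read (alignlength == 0) this is position - 1,
# i.e. smaller than the left position, as described in the comment above.
rightpos(r::MockRecord) = r.position + r.alignlength - 1

# Assumed shape of the existing check: one bound tested on each side.
function compare_original(r::MockRecord, interval::Tuple{Int,UnitRange{Int}})
    rid = r.refid
    if rid < interval[1] || (rid == interval[1] && rightpos(r) < first(interval[2]))
        return -1   # strictly left of the interval
    elseif rid > interval[1] || (rid == interval[1] && leftpos(r) > last(interval[2]))
        return +1   # strictly right of the interval
    else
        return 0    # overlapping
    end
end

# Proposed check: both bounds must fall outside the interval.
function compare_patched(r::MockRecord, interval::Tuple{Int,UnitRange{Int}})
    rid = r.refid
    if rid < interval[1] || (rid == interval[1] && rightpos(r) < first(interval[2]) && leftpos(r) < first(interval[2]))
        return -1
    elseif rid > interval[1] || (rid == interval[1] && leftpos(r) > last(interval[2]) && rightpos(r) > last(interval[2]))
        return +1
    else
        return 0
    end
end

mapped   = MockRecord(1, 100, 50)  # covers 100..149
unmapped = MockRecord(1, 100, 0)   # placed at 100, right position is 99
query    = (1, 100:200)

println(compare_original(mapped, query))    # 0
println(compare_original(unmapped, query))  # -1: the unmapped mate is treated as left of the interval
println(compare_patched(unmapped, query))   # 0: the unmapped mate is kept
```

With the one-sided check, an unmapped read sitting exactly at the start of the query interval is classified as lying to its left; requiring both positions to be outside the interval keeps it, while reads that are entirely outside are still excluded.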