ChristopherWilks / megadepth

BigWig and BAM utilities
Other
91 stars 9 forks source link

Error when using --alts on chimeric read alignment #4

Closed ASLeonard closed 3 years ago

ASLeonard commented 3 years ago

Hello, Firstly, this tool has been pretty amazing in terms of speed and usability, so a big thanks for that!

I was trying to use the --alts feature, but have hit a runtime error, specifically

terminate called after throwing an instance of 'std::runtime_error'
  what():  No such CIGAR operation as "5"

There are no MD tags in my bam file, so running with --require_mdz correctly fails with what(): No MD:Z extra field for aligned read "1".

The only major difference I could find for the specific read causing the issue (using --ends), was an additional tag for chimeric alignments

SA:Z:NKLS02001106.1,2705752,+,17814M102I15242S,54,183;

Potentially it is the 5 in 54 after the CIGAR string which represents mapping quality causing this issue?

ChristopherWilks commented 3 years ago

Hi Alex,

Thanks for the bug report and for using Megadepth!

I'll have to dig into this a bit as it's one of the few areas of Megadepth I didn't write myself :)

Would it be possible for you to post a tiny SAM file with a few reads from your dataset (including the one you mentioned) for an example I could run?

Thanks, Chris

ChristopherWilks commented 3 years ago

For the record from my initial look at the code in light of this bug, in terms of the SA tag, we're only ever calling bam_aux_get for the MD tag, so unless the HTSLib reader is doing it automatically, I'm not sure why'd we be parsing the CIGAR string in an optional tag that isn't MD.

ASLeonard commented 3 years ago

I uploaded a small slice of the bam file (had to rename as txt for github, but it is bam). The last line is the one causing the issue. Megadepth works fine with the bam made through samtools view -h test.bam | head -n -1 | samtools view -b > test2.bam, producing the alt tsv.

test.txt

ASLeonard commented 3 years ago

I just tried with a different bam file, and it also threw the same error on the first chimeric read with the SA tag. So not sure what is causing it, but definitely seems tightly related to these types of alignments.

ChristopherWilks commented 3 years ago

Thanks for the test file, that really helped!

So it turns out, it was something really silly (but thankfully simple to fix). Some time ago, there was a refactoring of most of the cigar processing code for the --alts option.

We moved from checking for ops as chars (e.g. H for hardclipping) and started using htslib's names for them (e.g. BAM_CHARD_CLIP). However, in the non-mdz cigar processing function, I had never updated the hardclip and padding to use the correct names and we've never had a BAM that had them apparently.

I'll hold off on making a new release for just this bugfix until you get a chance to test it with your BAMs, the binaries are here: http://snaptron.cs.jhu.edu/data/temp/megadepth/cigarfix/

Please let me know if that works on your BAMs and/or if you run into anything else, Thanks

ASLeonard commented 3 years ago

The binary works great and no errors on either of my test bam files. Thanks for your help!

ChristopherWilks commented 3 years ago

Thanks Alex!

ChristopherWilks commented 3 years ago

FYI I've released this fix as 1.1.0c (latest as of today)