dincarnato / RNAFramework

RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis
http://www.rnaframework.com
GNU General Public License v3.0
31 stars 11 forks source link

parsecigar line #1158 in rf-count #58

Closed reamesb closed 3 months ago

reamesb commented 6 months ago

Hi Danny,

We encountered a problem with rf-count as follows:

We were using SAM files with no MD Tag column. However, instead of getting the error: [!] Warning: Missing MD tag encountered in sample "filename". Consider running "samtools calmd" to fix.

we were just getting : Sample "X": 0 transcripts covered.

Replacing the line with $truelen += $n if ($op =~ m/^[MIX=]$/); (adding X= to the things to count) allowed rf-count to compute our read lengths and get actual coverage. Only after this did we get the MD Tag error telling us to add that column (I think, we may have fixed both issues at once, but that's what seems to happen now when I try to reproduce).

I think this may be due to change in CIGAR format? It is tricky because no error message is produced, just no reads get included so we had to dig into the source.

Thanks, Benjamin

dincarnato commented 6 months ago

Hi Benjamin,

would you be able to share a sample BAM file with me? I have literally never encountered a BAM file having in the CIGAR operation X or =. What aligner did you use?

Cheers, Danny

reamesb commented 6 months ago

Hi Danny,

I think the new CIGAR standard allows for this. There is a discussion here: https://lh3.github.io/2018/03/27/the-history-the-cigar-x-operator-and-the-md-tag.

BBMap for instance by default outputs 'X' and '=' symbols in the cigar string. Below I put a small sam file output from running bbmap on some random stuff in the bbmap resources directory.

example.txt

[It seems to have a problem with me uploading a sam file....I just changed the extension to txt]

dincarnato commented 6 months ago

Hi Benjamin,

Not really a new standard. Those cigar operations have been there forever. What I meant is that no standard mapper actually uses them and all of them (STAR, Bowtie, BWA) use the standard "M" operation for both match and mismatch.

Anyway, your change to the code is correct. You will still need to compute the MD tag, but it will work. I will fix this and then you can git pull.

Cheers, Danny

reamesb commented 6 months ago

Thanks Danny.