FelixKrueger / SNPsplit

Allele-specific alignment sorting
http://felixkrueger.github.io/SNPsplit/
GNU General Public License v3.0

MD:A tag #18

Closed caleblareau closed 6 years ago

caleblareau commented 6 years ago

Hi @FelixKrueger,

Great tool. Quick question: I get some reads with an MD:A tag (see attached .bam file), which causes SNPsplit to fail. My understanding, though, is that one could still use this tag, since it specifies a single character instead of a string: https://samtools.github.io/hts-specs/SAMv1.pdf

Any thoughts on changing the regex to also accept MD:A: tags? https://github.com/FelixKrueger/SNPsplit/blob/7bed7b64877986d9de599244a27c87a3d66a146e/SNPsplit#L236

On this note, I think a more graceful way to bow out would be to leave reads without an MD:Z tag unassigned rather than killing the whole script. Do you have any thoughts on this?
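Both suggestions can be sketched roughly as follows. This is a minimal Python illustration, not SNPsplit's actual Perl code: the names `get_md_value` and `split_by_md` are hypothetical, and it assumes tab-separated SAM lines whose optional fields follow the spec's `TAG:TYPE:VALUE` layout after the 11 mandatory columns.

```python
import re

# Optional SAM fields have the form TAG:TYPE:VALUE. Accept MD with type Z
# (string) or type A (single printable character).
MD_FIELD = re.compile(r"^MD:([AZ]):(.*)$")


def get_md_value(sam_line):
    """Return the MD value of a SAM record, or None if it has no MD:Z/MD:A field."""
    fields = sam_line.rstrip("\n").split("\t")
    for field in fields[11:]:  # optional fields follow the 11 mandatory columns
        match = MD_FIELD.match(field)
        if match:
            return match.group(2)
    return None


def split_by_md(sam_lines):
    """Separate records with a usable MD tag from those without, instead of aborting."""
    usable, unassigned = [], []
    for line in sam_lines:
        if line.startswith("@"):  # header lines carry no alignment
            continue
        if get_md_value(line) is None:
            unassigned.append(line)  # hypothetical "unassigned" bucket
        else:
            usable.append(line)
    return usable, unassigned
```

Whether a single-character MD:A value can safely be interpreted like an MD:Z string is an assumption here; the sketch only shows that both can be read without stopping the run.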

Thanks, -Caleb

caleblareau commented 6 years ago

Sample .bam below

caleblareau commented 6 years ago

Difference between MD:A and MD:Z on page 7 of the samtools pdf

FelixKrueger commented 6 years ago

Hi Caleb,

That sounds like a reasonable idea, I can probably take a look at this next week. Which aligner are you using, out of interest?

I am afraid the test file you linked comes up with videos of ladies who don't seem to be appropriately dressed for this time of year, and lots of "access blocked" warnings. I could provide you with a link to an FTP site to upload the file if that would help? I would also need the SNP file you are using for the SNPsplit process.

caleblareau commented 6 years ago

Yeesh, sorry about the link... I didn't see any suspicious behavior from my end; that's quite embarrassing.

Here's a google drive link: https://drive.google.com/file/d/1pDl6seq6k_bIWyzKNVGz5r866FUWr5md/view?usp=sharing

I'm using bowtie2 with a -X 2000 flag, which I think is the only reason that I can align some small fragments (8 or 9 bp). These arise because I'm parsing out some scATAC-seq data where one read is heavily barcoded and can be quite small after trimming. However, the paired read is still 36 bp, so I suspect bowtie2 is able to figure out how to align these very short reads when they arise. Note, though, that these are the exception rather than the rule.

Thanks again!

FelixKrueger commented 6 years ago

Hi Caleb,

That link worked much better... I managed to download the BAM file (you didn't supply a SNP file though) and had a quick look. There was indeed one read which had an MD:A tag but no MD:Z: tag. I also noticed that you used Bowtie2 version 2.2.2, which is nearly 4 years old! Would you mind upgrading Bowtie2 to see if the problem still exists (which I doubt)? I would like to avoid chasing red herrings that might have been fixed on the Bowtie2 side years ago.

Cheers, Felix
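For reference, the aligner version is typically recorded in the `@PG` lines of the SAM/BAM header, which `samtools view -H` prints. A small Python sketch of reading it, assuming the header is already available as text; the helper name `program_versions` is hypothetical:

```python
def program_versions(header_text):
    """Map each program named in @PG header lines to its recorded version (VN)."""
    versions = {}
    for line in header_text.splitlines():
        if not line.startswith("@PG"):
            continue
        # Each header field after the record type is TAG:VALUE, tab-separated.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        name = tags.get("PN", tags.get("ID"))  # fall back to ID if PN is absent
        if name is not None:
            versions[name] = tags.get("VN")  # None if no version was recorded
    return versions
```

An outdated version showing up here is a quick hint that an odd tag may come from the aligner rather than from downstream tools.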

FelixKrueger commented 6 years ago

Is there any update on this?

FelixKrueger commented 6 years ago

Assuming this has been fixed.