bede / kindel

Indel-aware consensus from aligned BAMs
GNU General Public License v3.0

Review handling of SAM 1.4+ CIGAR ops #3

Open bede opened 7 years ago

bede commented 7 years ago

Review handling of N, H, P, X and = ops. X and = were introduced in SAM 1.4 and remove the need to concurrently examine the MD tag. Spec: https://samtools.github.io/hts-specs/SAMv1.pdf
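For reference while reviewing, here is a minimal sketch of how each CIGAR op consumes query and reference bases, following the table in the SAM spec. It is not kindel's or simplesam's code; the names `CONSUMES` and `cigar_spans` are made up for illustration.

```python
import re

# op -> (consumes query, consumes reference), per the SAM specification
CONSUMES = {
    "M": (True, True),    # alignment match (may be a match or a mismatch)
    "I": (True, False),   # insertion to the reference
    "D": (False, True),   # deletion from the reference
    "N": (False, True),   # skipped region of the reference
    "S": (True, False),   # soft clip (bases present in SEQ)
    "H": (False, False),  # hard clip (bases absent from SEQ)
    "P": (False, False),  # padding
    "=": (True, True),    # sequence match
    "X": (True, True),    # sequence mismatch
}


def cigar_spans(cigar):
    """Return (query_length, reference_length) implied by a CIGAR string."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"Unparseable CIGAR: {cigar!r}")
    query_len = ref_len = 0
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        uses_query, uses_ref = CONSUMES[op]
        query_len += int(length) * uses_query
        ref_len += int(length) * uses_ref
    return query_len, ref_len
```

H and P contribute to neither length, so a parser that skips them when walking the read or the reference stays in step with both.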

mdshw5 commented 6 years ago

Is this something that should be reviewed in mdshw5/simplesam as well?

bede commented 6 years ago

Hi @mdshw5 : ) Does Simplesam have operation-specific functionality, or does it simply parse CIGAR ops as single-character strings at face value? If the latter, it should be fine. AFAIK BBMap is the only popular aligner implementing the 1.4 spec at this time. So long as Simplesam handles X and = ops, it'll be fine.

mdshw5 commented 6 years ago

No, simplesam only uses the CIGAR operations to inspect the alignment of the sequence, but does not interpret the match/mismatch operators. It seems like the SAM 1.4 CIGAR operations don't completely replace the MD tag; they only do so when you don't care about the exact sequence of the reference.

Anyway, I cut a new release (0.1.2) which should be on PyPI in a few minutes.

bede commented 6 years ago

Sorry for not being clearer – yes it only means that the MD is redundant for some applications.
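To make the limitation concrete, here is a rough sketch (the function name `partial_reference` is hypothetical, not part of any library) that rebuilds as much of the reference as an =/X CIGAR plus the read sequence allows. Reference bases under X, D and N come out as "N" because the CIGAR alone does not say what they were; that is the information the MD tag still carries.

```python
import re


def partial_reference(seq, cigar):
    """Rebuild the aligned reference segment from SEQ and an =/X CIGAR.

    Reference bases that cannot be known without the MD tag are emitted as 'N'.
    """
    out = []
    pos = 0  # current offset into the read sequence
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "=":
            out.append(seq[pos:pos + n])  # read base equals reference base
            pos += n
        elif op == "X":
            out.append("N" * n)  # mismatch: reference base is unknown
            pos += n
        elif op in "IS":
            pos += n  # read-only bases, nothing added to the reference
        elif op in "DN":
            out.append("N" * n)  # reference-only bases, unknown here
        elif op == "M":
            raise ValueError("M does not distinguish matches from mismatches")
        # H and P consume neither the read nor the reference
    return "".join(out)
```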

mdshw5 commented 6 years ago

You got me excited, since it really bothers me that we have two strings (CIGAR and MD) that basically represent the same information in a slightly different way.

bede commented 6 years ago

Ah yes. At least we can now count indels and substitutions without the MD this way. Thanks again for making Simplesam.
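A minimal sketch of that counting, assuming the aligner emits = and X rather than M. The name `count_variants` and the choice to report both events and bases for indels are illustrative, not taken from kindel or Simplesam.

```python
import re


def count_variants(cigar):
    """Count substitutions and indels from an extended (=/X) CIGAR string."""
    if not re.fullmatch(r"(\d+[MIDNSHP=X])+", cigar):
        raise ValueError(f"Unparseable CIGAR: {cigar!r}")
    counts = {
        "substitutions": 0,   # mismatched bases (sum of X lengths)
        "insertions": 0,      # number of I operations
        "inserted_bases": 0,  # total length of I operations
        "deletions": 0,       # number of D operations
        "deleted_bases": 0,   # total length of D operations
    }
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        n = int(length)
        if op == "M":
            # M hides mismatches, so substitutions would need the MD tag
            raise ValueError("CIGAR uses M; cannot count substitutions")
        if op == "X":
            counts["substitutions"] += n
        elif op == "I":
            counts["insertions"] += 1
            counts["inserted_bases"] += n
        elif op == "D":
            counts["deletions"] += 1
            counts["deleted_bases"] += n
    return counts
```

For a CIGAR such as `10=1X5=2I3=1D4=2X` this reports three substituted bases, one two-base insertion and one single-base deletion. Records that still use M are rejected rather than silently undercounted.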