Open bede opened 7 years ago
Is this something that should be reviewed in mdshw5/simplesam as well?
Hi @mdshw5 : )
Does Simplesam have operation-specific functionality? Or does it simply parse CIGAR ops as single char strings at face value? If the latter, it should be fine. BBMap is the only popular aligner implementing the 1.4 spec at this time AFAIK. So long as it handles `X` and `=` ops it'll be fine.
No, simplesam only uses the CIGAR operations to inspect the alignment of the sequence, but does not interpret the match/mismatch operators. It seems like the SAM 1.4 CIGAR operations don't make the MD tag completely redundant; they only replace it when you don't care about the exact sequence of the reference.
Anyway, I cut a new release (0.1.2) which should be on PyPI in a few minutes.
Sorry for not being clearer – yes it only means that the MD is redundant for some applications.
You got me excited, since it really bothers me that we have two strings (CIGAR and MD) that basically represent the same information in a slightly different way.
Ah yes. At least we can now count indels and substitutions without the MD this way. Thanks again for making Simplesam.
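For anyone reading later, here is a rough sketch of what counting indels and substitutions straight from the CIGAR could look like. It assumes the aligner emits the extended `=` and `X` ops rather than `M`, and `count_edits` is a hypothetical helper written for this illustration, not part of simplesam's API.

```python
import re
from collections import Counter

# One (length, op) pair per CIGAR element, e.g. "5=" -> ("5", "=").
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def count_edits(cigar):
    """Tally bases per op type (hypothetical helper, not simplesam's API).

    Assumes the CIGAR uses = and X; with plain M ops the match and
    substitution counts would both come out as zero.
    """
    totals = Counter()
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] += int(length)
    return {
        "matches": totals["="],
        "substitutions": totals["X"],
        "insertions": totals["I"],  # inserted bases, not insertion events
        "deletions": totals["D"],   # deleted bases, not deletion events
    }


print(count_edits("5=1X3=2I4=1D6="))
```

Note that this gives counts only. Recovering which reference base sits under each `X` still needs the MD tag or the reference itself.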
Review handling of `N`, `H`, `P`, `X` and `=` ops. `X` and `=` were introduced in 1.4 and remove the need to concurrently examine the `MD` tag: https://samtools.github.io/hts-specs/SAMv1.pdf
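As a starting point for that review, the sketch below shows one way to check how each op moves along the read and the reference. The consume sets follow the CIGAR table in the SAM spec (`H` and `P` consume neither, `N` consumes reference only, `X` and `=` consume both). The function name `cigar_spans` is made up for this example and does not reflect how any particular parser is written.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Per the SAM spec: which ops advance along the read and along the reference.
CONSUMES_QUERY = set("MIS=X")
CONSUMES_REFERENCE = set("MDN=X")


def cigar_spans(cigar):
    """Return (query_length, reference_length) implied by a CIGAR string.

    Hypothetical helper for illustration only.
    """
    query_len = 0
    ref_len = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in CONSUMES_QUERY:
            query_len += n
        if op in CONSUMES_REFERENCE:
            ref_len += n
    return query_len, ref_len


# H and P add nothing; N adds 100 to the reference span only.
print(cigar_spans("2H3S10=1X100N5=2P4="))
```

A parser that treats ops as opaque single characters would not need this, but anything computing alignment end positions or read lengths would.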