brittneybrinsfield / pysam

Automatically exported from code.google.com/p/pysam

problem with cigarstring attribute #114

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I just installed pysam 7.4 in order to get the useful
AlignedReadObj.cigarstring functionality.

Unfortunately, I'm getting strange CIGAR strings:

This is the cigar in the BAM record: 30M1D37M966N33M 

read_obj.cigarstring gives this:
M30D1M37N966M33

read_obj.cigar looks correct:
[(0, 30), (2, 1), (0, 37), (3, 966), (0, 33)]
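
Since read_obj.cigar is correct, one possible workaround is to build the SAM-style string from the tuples directly. The following is a minimal sketch, not pysam code: the helper name `cigar_tuples_to_string` is hypothetical, and the operation letters follow the numeric codes in the SAM specification (0=M, 1=I, 2=D, 3=N, ...).

```python
# Operation letters indexed by their numeric code in the SAM/BAM specification.
CIGAR_OPS = "MIDNSHP=X"


def cigar_tuples_to_string(cigar):
    """Build a SAM-style CIGAR string from (operation, length) tuples.

    Hypothetical helper for illustration; it is not part of pysam.
    """
    # The SAM format writes the length first, then the operation letter.
    return "".join(f"{length}{CIGAR_OPS[op]}" for op, length in cigar)


cigar = [(0, 30), (2, 1), (0, 37), (3, 966), (0, 33)]
print(cigar_tuples_to_string(cigar))  # 30M1D37M966N33M
```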

Original issue reported on code.google.com by kreitzma...@gmail.com on 22 Feb 2013 at 11:18

GoogleCodeExporter commented 9 years ago
I agree, the reversed operation/length is surprising and confusing.

I'm just guessing here: the "cigarstring" property is probably being built
from the "cigar" property, which has tuples of (operation, length).

I think the solution that creates the most consistency with the SAM spec is to
change the "cigar" property to return tuples of (length, operation).

(Of course, IMHO, the spec is what got it backwards, but this library
should probably aim for consistency with the spec.)

Original comment by bucha...@gmail.com on 27 Feb 2013 at 8:52

GoogleCodeExporter commented 9 years ago
Thanks for pointing this out!

When I wrote this, I had a different CIGAR convention in my head: the one
from exonerate, where each element is indeed (operation, length) and not
(length, operation).

For reasons of backwards compatibility, I do not want to change .cigar.
Instead, I have changed the newer .cigarstring so that the string
representation is consistent with the SAM format.
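
To check which convention a given install produces, a small parser can turn a SAM-format string back into (operation, length) tuples for comparison with .cigar. This is only a sketch under assumptions: `parse_cigar_string` is a hypothetical helper, not a pysam function, and it accepts only the length-then-operation order defined by the SAM specification.

```python
import re

# Operation letters indexed by their numeric code in the SAM/BAM specification.
CIGAR_OPS = "MIDNSHP=X"
# One CIGAR element in SAM order: a decimal length followed by an operation.
_CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar_string(cigarstring):
    """Convert a SAM-format CIGAR string into (operation, length) tuples.

    Hypothetical helper for illustration; it is not part of pysam.
    """
    tokens = _CIGAR_TOKEN.findall(cigarstring)
    # Reject input that is not made up entirely of <length><operation> pairs,
    # which includes the reversed operation-then-length form.
    if "".join(length + op for length, op in tokens) != cigarstring:
        raise ValueError(f"not a SAM-format CIGAR string: {cigarstring!r}")
    return [(CIGAR_OPS.index(op), int(length)) for length, op in tokens]


print(parse_cigar_string("30M1D37M966N33M"))
```

With a read in hand, comparing `parse_cigar_string(read_obj.cigarstring)` against `read_obj.cigar` should succeed on a fixed version, while the reversed string from the original report (M30D1M37N966M33) raises ValueError.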

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 26 Jun 2013 at 8:43