brittneybrinsfield / pysam

Automatically exported from code.google.com/p/pysam

type information of single-character string tags gets lost #40

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
When adding a new tag to an AlignedRead, type information of already existing 
tags is lost. In my case, if I have MD:Z:9 in a record and add a new tag, then 
the new tag is written correctly, but the existing tag gets changed to MD:A:9.

To add the tag, I used
read.tags += [ ('XX', value) ]

===============
The following SAM file and script show what the problem is.

Input SAM file:
@HD VN:0.1.2
@SQ SN:chr LN:900
r0 16 chr 29 0 2M8I1M1I6M * 0 0 AGGCTGGTGTTAGGGTTT * NM:i:10 MD:Z:9

This script copies the input file to out.sam and adds a custom tag:
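import sys
import pysam

# Open the input SAM file; template=infile lets the output reuse its header information.
infile = pysam.Samfile(sys.argv[1], "r")
outfile = pysam.Samfile("out.sam", "w", template=infile)
for read in infile:
    # Appending a tag reads all existing tags into Python and writes them back.
    read.tags += [('XX', 19)]
    outfile.write(read)
infile.close()
outfile.close()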

Output SAM file:
r0 16 chr 29 0 2M8I1M1I6M * 0 0 AGGCTGGTGTTAGGGTTT * NM:i:10 MD:Z:9

Original issue reported on code.google.com by marcel.m...@tu-dortmund.de on 30 Aug 2010 at 7:02

GoogleCodeExporter commented 9 years ago
Thanks for submitting this.

The conversion happens because the actual type information is lost in the
round trip samtools -> Python -> samtools. When the tags are written back,
the type is guessed from the Python value, so "9" is converted from Z (string)
to A (character) because it is a string of length 1.
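
To illustrate the mechanism, here is a minimal pure-Python sketch of such a round trip. It is not pysam's actual code: the helper names (parse_tag, infer_type, format_tag, round_trip) are hypothetical, and the inference rules are only an assumption modelled on the behaviour described above.

```python
# Hypothetical sketch of a lossy tag round trip; this is NOT pysam's implementation.

def parse_tag(field):
    """Split a SAM tag field such as 'MD:Z:9' into (name, value).

    The type code is used to convert the value and is then discarded,
    which is where the information gets lost.
    """
    name, type_code, raw = field.split(":", 2)
    value = int(raw) if type_code == "i" else raw  # only 'i' is converted here
    return name, value

def infer_type(value):
    """Guess a SAM type code from the Python value alone (assumed rules)."""
    if isinstance(value, int):
        return "i"
    if isinstance(value, float):
        return "f"
    if isinstance(value, str) and len(value) == 1:
        return "A"  # a one-character string is indistinguishable from a char
    return "Z"

def format_tag(name, value):
    """Serialise a (name, value) pair back into a SAM tag field."""
    return f"{name}:{infer_type(value)}:{value}"

def round_trip(fields, extra_tags):
    """Parse tag fields, append new (name, value) tags, and serialise again."""
    tags = [parse_tag(field) for field in fields] + list(extra_tags)
    return [format_tag(name, value) for name, value in tags]

result = round_trip(["NM:i:10", "MD:Z:9"], [("XX", 19)])
print(result)  # ['NM:i:10', 'MD:A:9', 'XX:i:19']
```

In this sketch a longer string such as "10" would still come back as Z. Only single-character strings are ambiguous, which matches the report.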

Thanks,
Andreas

Original comment by andreas....@gmail.com on 10 Sep 2010 at 1:13