brentp / hts-python

pythonic wrapper for libhts (moved to: https://github.com/quinlan-lab/hts-python)
MIT License

Issue parsing some tags #7

Closed JohnLonginotto closed 8 years ago

JohnLonginotto commented 8 years ago

Hey Brent :)

hts-python is having some difficulty parsing some tags in a specific BAM file:

>>>> import hts
>>>> in_file = hts.Bam('./actual.bam')
>>>> for read in in_file: print read.tags
[('MC', 'Z', '101M'), ('MD', 'Z', '16T2C80'), ('PG', 'Z', 'bwa-meth'), ('RG', 'Z', '44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed'), ('NM', 'C', 28), ('MQ', 'C', 60), ('UQ', 'S', 40), ('\x04A', 'S', 67), ('WX', 'S', 67)] 
[('MC', 'Z', '7M1D94M'), ('MD', 'Z', '9T2C87'), ('PG', 'Z', 'bwa-meth'), ('RG', 'Z', '44_Mm08_WEAd_Db2_WGBS_E_1_L001__trimmed'), ('NM', 'C', 27), ('MQ', 'C', 25), ('UQ', 'S', 213), ('\x03A', 'S', 67), ('WX', 'S', 67)] 
...

Note the '\x03A' and '\x04A'. A 10-read sample of the full BAM can be found here: http://ac.gt/actual.bam

Parsing these tags works in samtools and pysam, so it's a real, reproducible issue. If it's any consolation, both pysam and samtools fail to parse another BAM file I have, which contains only unmapped reads (no chromosome data, which causes the error), so perhaps I'm just having a bad day ;) If you need more data, or anything else on my end, not a problem! :v:
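
For reference, here is a minimal sketch of how the auxiliary tag block of a BAM record is laid out according to the SAM/BAM specification: a 2-byte tag name, a 1-byte type code, then a value whose width depends on the type. This is not hts-python's code, and `parse_aux` and `FIXED` are hypothetical names used only for illustration. It assumes the raw aux bytes have already been extracted from the record.

```python
import struct

# Fixed-width aux value types from the SAM/BAM spec: type code -> struct format.
FIXED = {
    "A": "<c", "c": "<b", "C": "<B", "s": "<h", "S": "<H",
    "i": "<i", "I": "<I", "f": "<f",
}

def parse_aux(buf):
    """Decode a raw BAM aux block (bytes) into (tag, type, value) tuples."""
    tags = []
    pos = 0
    while pos < len(buf):
        tag = buf[pos:pos + 2].decode("latin-1")  # 2-byte tag name
        typ = chr(buf[pos + 2])                   # 1-byte type code
        pos += 3
        if typ in FIXED:
            fmt = FIXED[typ]
            (value,) = struct.unpack_from(fmt, buf, pos)
            pos += struct.calcsize(fmt)
            if typ == "A":
                value = value.decode("latin-1")
        elif typ in ("Z", "H"):
            # NUL-terminated string (or hex string)
            end = buf.index(b"\x00", pos)
            value = buf[pos:end].decode("latin-1")
            pos = end + 1
        elif typ == "B":
            # Array: 1-byte subtype, 4-byte element count, then the elements
            sub = chr(buf[pos])
            (count,) = struct.unpack_from("<I", buf, pos + 1)
            fmt = "<%d%s" % (count, FIXED[sub][1])
            value = list(struct.unpack_from(fmt, buf, pos + 5))
            pos += 5 + struct.calcsize(fmt)
        else:
            raise ValueError("unknown aux type %r" % typ)
        tags.append((tag, typ, value))
    return tags

# Hand-built example block, not taken from actual.bam
example = b"NMC" + bytes([28]) + b"UQS" + struct.pack("<H", 40) + b"MDZ16T2C80\x00"
print(parse_aux(example))
```

Because each value's width is implied only by its type code, a decoder that advances by the wrong number of bytes for one value will read the next tag name from the middle of the data. That is one way a name like '\x04A' could show up, though this thread does not state the actual cause.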

brentp commented 8 years ago

Fixed. Thanks for reporting, and sorry for the delay!