MaciekAber / pysam

Automatically exported from code.google.com/p/pysam

Not all 1.4 fields are known to pysam #108

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Open a file whose header has a "PG" field in an "RG" record
2. Open a file whose header has a "PP" field in a "PG" record

What is the expected output? What do you see instead?
The file should open, but instead you get:
ValueError: unknown field code 'PG' in record 'RG'

What version of the product are you using? On what operating system?
Version 0.6; the problem is present in 0.7 as well. Python 2.6.

Please provide any additional information below.
Not all fields defined in version 1.4 of the SAM specification are known to 
pysam. The PG field within an RG record names the program used to create that 
read group. It is therefore different from the PG record, which describes the 
programs used to create the BAM file as a whole (and not only one of its read 
groups).
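
To make the distinction concrete, here is a minimal sketch in plain Python (it does not use pysam). The header lines, IDs and the helper name split_header_line are made up for illustration; only the record and field codes come from the specification.

```python
# Hypothetical SAM header; all IDs and values are invented for illustration.
SAMPLE_HEADER = "\n".join([
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:rg1\tSM:sample1\tPG:bwa",    # PG as a *field* of an RG record
    "@PG\tID:bwa\tPN:bwa\tVN:0.6.2",      # PG as a *record* of its own
    "@PG\tID:sort\tPN:samtools\tPP:bwa",  # PP points at a previous program
])


def split_header_line(line):
    """Return (record_type, {field_code: value}) for one '@XX' header line."""
    parts = line.rstrip("\n").split("\t")
    record = parts[0][1:]  # drop the leading '@'
    fields = dict(part.split(":", 1) for part in parts[1:])
    return record, fields


records = [split_header_line(line) for line in SAMPLE_HEADER.splitlines()]
read_groups = [fields for record, fields in records if record == "RG"]
programs = [fields for record, fields in records if record == "PG"]

for rg in read_groups:
    print("read group %s was produced by program %s" % (rg["ID"], rg["PG"]))
for pg in programs:
    print("program record %s, previous program: %s" % (pg["ID"], pg.get("PP")))
```

The same two-letter code "PG" shows up once as a record type and once as a field inside "RG", which is why the field table needs an entry for it under "RG".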

All fields are accepted when you replace the header description section in 
the pysam/csamtools.pyx source code with:
# valid types for sam headers
VALID_HEADER_TYPES = { "HD" : dict,
                       "SQ" : list,
                       "RG" : list,
                       "PG" : list,
                       "CO" : list }

# order of records within sam headers
VALID_HEADERS = ("HD", "SQ", "RG", "PG", "CO" )

# type conversions within sam header records
VALID_HEADER_FIELDS = { "HD" : { "VN" : str, "SO" : str, "GO" : str },
                        "SQ" : { "SN" : str, "LN" : int, "AS" : str, "M5" : str, "UR" : str, "SP" : str },
                        "RG" : { "ID" : str, "SM" : str, "LB" : str, "DS" : str, "PU" : str, "PI" : str, 
                                 "CN" : str, "DT" : str, "PL" : str, "FO" : str, "KS" : str, "PG" : str },
                        "PG" : { "PN" : str, "ID" : str, "VN" : str, "CL" : str, "PP" : str }, }

# output order of fields within records
VALID_HEADER_ORDER = { "HD" : ( "VN", "SO", "GO" ),
                       "SQ" : ( "SN", "LN", "AS", "M5" , "UR" , "SP" ),
                       "RG" : ( "ID", "SM", "LB", "DS" , "PU" , "PI" , "CN" , "DT", "PL", "FO", "KS", "PG" ),
                       "PG" : ( "PN", "ID", "VN", "CL", "PP" ), }

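For anyone wondering why a missing table entry produces the error above, here is a rough sketch of how a field table of this shape can drive header parsing. This is not the actual pysam code: parse_header_line, FIELDS_BEFORE and FIELDS_AFTER are hypothetical names, and the tables are cut down to a few entries.

```python
# Cut-down, hypothetical tables: one without the new fields, one with them.
FIELDS_BEFORE = {
    "SQ": {"SN": str, "LN": int},
    "RG": {"ID": str, "SM": str},
    "PG": {"PN": str, "ID": str, "VN": str, "CL": str},
}
FIELDS_AFTER = {
    "SQ": {"SN": str, "LN": int},
    "RG": {"ID": str, "SM": str, "PG": str},
    "PG": {"PN": str, "ID": str, "VN": str, "CL": str, "PP": str},
}


def parse_header_line(line, valid_fields):
    """Parse one '@XX' header line, converting values via valid_fields."""
    parts = line.rstrip("\n").split("\t")
    record = parts[0][1:]  # drop the leading '@'
    if record not in valid_fields:
        raise ValueError("unknown record type '%s'" % record)
    converters = valid_fields[record]
    parsed = {}
    for part in parts[1:]:
        code, value = part.split(":", 1)
        if code not in converters:
            # A field missing from the table is rejected outright.
            raise ValueError(
                "unknown field code '%s' in record '%s'" % (code, record))
        parsed[code] = converters[code](value)
    return record, parsed


rg_line = "@RG\tID:rg1\tSM:sample1\tPG:bwa"  # made-up read group line

try:
    parse_header_line(rg_line, FIELDS_BEFORE)
except ValueError as exc:
    print("before:", exc)

print("after:", parse_header_line(rg_line, FIELDS_AFTER))
```

With the smaller table the lookup fails on "PG" inside "RG"; with the extended table the same line parses, and typed fields such as "LN" are still converted by their table entry.
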
Original issue reported on code.google.com by bratdak...@gmail.com on 21 Dec 2012 at 9:52

GoogleCodeExporter commented 8 years ago
Thanks for telling us, fixed!

Original comment by andreas....@gmail.com on 14 Jan 2013 at 10:26