brittneybrinsfield / pysam

Automatically exported from code.google.com/p/pysam

Error reading Header for bam files with "PP" field code in record PG found in 1000 genome bam files #110

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1.  Reading the header with header=bamfile.header.copy() from a recent 1000 Genomes BAM file such as
http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/phase2b_alignment/data/NA07048/exome_alignment/NA07048.unmapped.ILLUMINA.bwa.CEU.exome.20120522_p2b.bam

The following error is generated:

File "csamtools.pyx", line 1123, in csamtools.Samfile.header.__get__ (pysam/csamtools.c:11562)
ValueError: unknown field code 'PP' in record 'PG'

2. Pysam does not recognize the new PP entry as valid in the PG record of the BAM file header:
@PG     ID:bam_calculate_bq     PN:samtools     PP:bam_recalibrate_quality_scores       VN:0.1.17 (r973:277)    CL:samtools calmd -Erb $bam_file $reference_fasta > $bq_bam_file
3. The PP tag identifies the previous program in the processing chain: per the SAM specification, it holds the ID of the preceding @PG record.
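
To make the structure of the record above concrete, here is a minimal sketch that splits the @PG line into its TAG:VALUE fields. It assumes the fields are tab-separated, as in a SAM text header; parse_header_line is a hypothetical helper written for this illustration and is not part of pysam.

```python
def parse_header_line(line):
    """Split one SAM header line into (record_type, {tag: value}).

    Hypothetical helper for illustration only; not a pysam function.
    """
    fields = line.rstrip("\n").split("\t")
    record_type = fields[0][1:]  # drop the leading "@"
    # Split on the first colon only: values such as VN or CL may contain colons.
    tags = dict(field.split(":", 1) for field in fields[1:])
    return record_type, tags


# The PG record from the report, with tabs assumed between fields.
pg_line = (
    "@PG\tID:bam_calculate_bq\tPN:samtools\t"
    "PP:bam_recalibrate_quality_scores\tVN:0.1.17 (r973:277)\t"
    "CL:samtools calmd -Erb $bam_file $reference_fasta > $bq_bam_file"
)

record_type, tags = parse_header_line(pg_line)
print(record_type, sorted(tags))
```

PP shows up as one more field code next to ID, PN, VN and CL, which is why a parser with a fixed list of allowed codes rejects the record.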

What is the expected output? What do you see instead?

  File "read_hla.py", line 91, in CN
    header=bamfile.header.copy()
  File "csamtools.pyx", line 1123, in csamtools.Samfile.header.__get__ (pysam/csamtools.c:11562)
ValueError: unknown field code 'PP' in record 'PG'

What version of the product are you using? On what operating system?

pysam 0.6 on Linux.

Please provide any additional information below.

I believe this section of csamtools.pyx needs to be updated to include the PP 
entry as a valid header field for PG:
csamtools.pyx

# type conversions within sam header records
VALID_HEADER_FIELDS = { "HD" : { "VN" : str, "SO" : str, "GO" : str },
                        "SQ" : { "SN" : str, "LN" : int, "AS" : str, "M5" : str, "UR" : str, "SP" : str },
                        "RG" : { "ID" : str, "SM" : str, "LB" : str, "DS" : str, "PU" : str, "PI" : str, 
                                 "CN" : str, "DT" : str, "PL" : str, "FO" : str, "KS" : str },
                        "PG" : { "PN" : str, "ID" : str, "VN" : str, "CL" : str }, }

# output order of fields within records
VALID_HEADER_ORDER = { "HD" : ( "VN", "SO", "GO" ),
                       "SQ" : ( "SN", "LN", "AS", "M5" , "UR" , "SP" ),
                       "RG" : ( "ID", "SM", "LB", "DS" , "PU" , "PI" , "CN" , "DT", "PL", "FO", "KS" ),
                       "PG" : ( "PN", "ID", "VN", "CL" ), }
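
A sketch of the kind of change being suggested, reduced to the PG record only. This is an assumption about what the fix could look like, not the actual pysam patch: VALID_PG_FIELDS, VALID_PG_ORDER, convert_pg_fields and format_pg_line are hypothetical names, and the position of PP in the output order is a guess.

```python
# Hypothetical, trimmed-down tables covering only the PG record.
# PP is added as a plain string; its place in the output order is an assumption.
VALID_PG_FIELDS = {"PN": str, "ID": str, "VN": str, "CL": str, "PP": str}
VALID_PG_ORDER = ("PN", "ID", "VN", "CL", "PP")


def convert_pg_fields(raw_fields):
    """Type-convert raw PG tag values, rejecting unknown field codes."""
    converted = {}
    for code, value in raw_fields.items():
        if code not in VALID_PG_FIELDS:
            # Same message shape as the error in the report.
            raise ValueError("unknown field code '%s' in record 'PG'" % code)
        converted[code] = VALID_PG_FIELDS[code](value)
    return converted


def format_pg_line(fields):
    """Write a PG record back out, following VALID_PG_ORDER."""
    parts = ["@PG"]
    parts.extend(
        "%s:%s" % (code, fields[code]) for code in VALID_PG_ORDER if code in fields
    )
    return "\t".join(parts)


pg = convert_pg_fields({
    "ID": "bam_calculate_bq",
    "PN": "samtools",
    "PP": "bam_recalibrate_quality_scores",
    "VN": "0.1.17 (r973:277)",
})
print(format_pg_line(pg))
```

With PP present in both tables, the record from the 1000 Genomes file converts without raising, and the field survives a round trip back to header text. Both tables need the new code: the first controls parsing, the second controls what is written out.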

Original issue reported on code.google.com by john.jos...@gmail.com on 6 Jan 2013 at 7:42

GoogleCodeExporter commented 9 years ago
Fixed, thanks for telling us.

Original comment by andreas....@gmail.com on 14 Jan 2013 at 10:26

GoogleCodeExporter commented 9 years ago
Hello, I see the same problem with pysam v0.7.5.
In which version did you make the fix, and was it included in 0.7.5?

Thanks

Original comment by natpo...@gmail.com on 7 Feb 2014 at 9:41

GoogleCodeExporter commented 9 years ago
Hi,

the fix should be part of 0.7.3 onwards. It is part of the current version 
0.7.7. 

Can you provide a link to the bam-file that causes the problem?
I can then test it.

Best wishes,
Andreas

Original comment by andreas....@gmail.com on 19 Feb 2014 at 9:49