chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

parsing of GFF3 attributes fails when tag starts with a space #88

Closed wholtz closed 10 years ago

wholtz commented 10 years ago

The GFF3 spec appears to allow attribute tags to start with a space. BCBio does not handle this well. Below is a test case where input1.gff3 contains no spaces in any tag, and input2.gff3 and input3.gff3 contain the tag " foo". All three inputs should have the same output, but they don't, as shown here:

    $ cat examineGFF3.py
    #!/usr/bin/env python

    import pprint
    from BCBio.GFF import GFFExaminer
    import sys

    examiner = GFFExaminer()
    pprint.pprint(examiner.parent_child_map(sys.stdin))
    $ cat input1.gff3
    ##gff-version 3
    contig1	.	gene	1544	2057	.	-	.	ID=contig1.1
    contig1	.	mRNA	1544	2057	.	-	.	ID=mRNA.contig1.1;Parent=contig1.1
    $ cat input2.gff3
    ##gff-version 3
    contig1	.	gene	1544	2057	.	-	.	ID=contig1.1
    contig1	.	mRNA	1544	2057	.	-	.	 foo=bar;ID=mRNA.contig1.1;Parent=contig1.1
    $ cat input3.gff3
    ##gff-version 3
    contig1	.	gene	1544	2057	.	-	.	ID=contig1.1
    contig1	.	mRNA	1544	2057	.	-	.	ID=mRNA.contig1.1;Parent=contig1.1; foo=bar
    $ ./examineGFF3.py < input1.gff3
    {('', 'gene'): [('', 'mRNA')]}
    $ ./examineGFF3.py < input2.gff3
    {}
    $ ./examineGFF3.py < input3.gff3
    Traceback (most recent call last):
      File "./examineGFF3.py", line 8, in <module>
        pprint.pprint(examiner.parent_child_map(sys.stdin))
      File "/Some/Path/env/lib/python2.7/site-packages/BCBio/GFF/GFFParser.py", line 744, in _file_or_handle_inside
        out = fn(*args, **kwargs)
      File "/Some/Path/env/lib/python2.7/site-packages/BCBio/GFF/GFFParser.py", line 829, in parent_child_map
        self._get_local_params())[0]
      File "/Some/Path/env/lib/python2.7/site-packages/BCBio/GFF/GFFParser.py", line 169, in _gff_line_map
        quals, is_gff2 = _split_keyvals(gff_parts[8])
      File "/Some/Path/env/lib/python2.7/site-packages/BCBio/GFF/GFFParser.py", line 93, in _split_keyvals
        assert len(item) == 1, item
    AssertionError: ['ID', 'mRNA.contig1.1;Parent', 'contig1.1']
    $
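For readers who want to see the expected behaviour in isolation, here is a minimal sketch of a whitespace-tolerant splitter for the attributes column. It is not BCBio's code and not the fix that was released; `split_attributes` is a hypothetical helper, and it assumes that whitespace around a tag can be ignored and that empty items (from a trailing `;`) can be skipped.

```python
from urllib.parse import unquote


def split_attributes(column9):
    """Split a GFF3 attributes column into {tag: [values]}.

    Hypothetical helper for illustration only; not part of BCBio.
    """
    attributes = {}
    for item in column9.split(";"):
        if not item.strip():
            continue  # tolerate a trailing or doubled ';'
        # Split on the first '=' only, so values may contain '='.
        tag, _, value = item.partition("=")
        # Assumption: whitespace around the tag is not significant.
        tag = tag.strip()
        # Multiple values are comma-separated and may be URL-escaped.
        values = [unquote(v) for v in value.split(",")]
        attributes.setdefault(tag, []).extend(values)
    return attributes


# The three attribute columns from the mRNA lines in the report.
plain = split_attributes("ID=mRNA.contig1.1;Parent=contig1.1")
leading = split_attributes(" foo=bar;ID=mRNA.contig1.1;Parent=contig1.1")
trailing = split_attributes("ID=mRNA.contig1.1;Parent=contig1.1; foo=bar")

for attrs in (plain, leading, trailing):
    print(attrs["ID"], attrs["Parent"])
```

With this approach the `ID` and `Parent` values come out the same for all three attribute columns, which is the property the parent/child map depends on. Whether a leading space should be stripped or kept as part of the tag is a reading of the spec, so treat the `strip()` call as a design choice rather than required behaviour.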

chapmanb commented 10 years ago

Thank you for this report, and sorry to have missed it initially. I added a test case for these inputs and realized that the fixes in the development tree already avoid this issue but had not yet been released to PyPI. I pushed a new version, 0.5, that appears to work correctly on your inputs. Thank you again for the bug report.