chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

IndexError with NCBI gff #128

Open MrTomRod opened 4 years ago

MrTomRod commented 4 years ago

Hi!

I annotated a bacterium (Acidipropionibacterium acidipropionici, strain FAM19036) with NCBI PGAP.

I wanted to create SeqRecord objects from the GFF file, but parsing failed:

```python
import pprint

from BCBio import GFF
from BCBio.GFF import GFFExaminer

gff_path = 'data/FAM19036/annot.gff'

# Summarize which seqids, sources and feature types the file contains
examiner = GFFExaminer()
with open(gff_path) as in_handle:
    pprint.pprint(examiner.available_limits(in_handle))

print("------------------------------------------------------------")

# Parse the same file into SeqRecord objects (this is the step that fails)
with open(gff_path) as in_handle:
    for rec in GFF.parse(in_handle):
        print(rec)
```
```
{'gff_id': {('CP040634.1',): 6772},
 'gff_source': {('.',): 3361,
                ('GeneMarkS-2+',): 360,
                ('Local',): 1,
                ('Protein Homology',): 2916,
                ('cmsearch',): 24,
                ('tRNAscan-SE',): 110},
 'gff_source_type': {('.', 'exon'): 8,
                     ('.', 'gene'): 3208,
                     ('.', 'pseudogene'): 137,
                     ('.', 'rRNA'): 8,
                     ('GeneMarkS-2+', 'CDS'): 360,
                     ('Local', 'region'): 1,
                     ('Protein Homology', 'CDS'): 2916,
                     ('cmsearch', 'RNase_P_RNA'): 1,
                     ('cmsearch', 'SRP_RNA'): 1,
                     ('cmsearch', 'exon'): 7,
                     ('cmsearch', 'rRNA'): 4,
                     ('cmsearch', 'riboswitch'): 10,
                     ('cmsearch', 'tmRNA'): 1,
                     ('tRNAscan-SE', 'exon'): 55,
                     ('tRNAscan-SE', 'tRNA'): 55},
 'gff_type': {('CDS',): 3276,
              ('RNase_P_RNA',): 1,
              ('SRP_RNA',): 1,
              ('exon',): 70,
              ('gene',): 3208,
              ('pseudogene',): 137,
              ('rRNA',): 12,
              ('region',): 1,
              ('riboswitch',): 10,
              ('tRNA',): 55,
              ('tmRNA',): 1}}
------------------------------------------------------------
```

```
Error
Traceback (most recent call last):
  File "/usr/lib64/python3.7/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/usr/lib64/python3.7/unittest/case.py", line 628, in run
    testMethod()
  File "/project/gene_loci_comparison/test_gene_loci_comparison.py", line 129, in test_recreate_gff_bug
    for rec in GFF.parse(in_handle):
  File "/project/venvs/gene_loci_comparison/lib64/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 746, in parse
    target_lines):
  File "/project/venvs/gene_loci_comparison/lib64/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 327, in parse_in_parts
    cur_dict = self._results_to_features(cur_dict, results)
  File "/project/venvs/gene_loci_comparison/lib64/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 369, in _results_to_features
    base = self._add_directives(base, results.get('directive', []))
  File "/project/venvs/gene_loci_comparison/lib64/python3.7/site-packages/BCBio/GFF/GFFParser.py", line 388, in _add_directives
    val = (val[0], int(val[1]) - 1, int(val[2]))
IndexError: tuple index out of range
```
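The last frame pins the failure down: when line 388 of `GFFParser.py` runs, `val` is a tuple with fewer than three items, so indexing it raises `IndexError: tuple index out of range`. The sketch below reproduces that in isolation and shows a length check that avoids it. It assumes, beyond what the traceback shows, that the directive text is split on whitespace and the fields after the directive name are packed into a tuple; `parse_region_directive` and `unguarded` are hypothetical helpers, not bcbio-gff functions, and the coordinates `1 100` are illustrative only.

```python
def parse_region_directive(directive):
    """Parse a directive such as '##sequence-region CP040634.1 1 100'.

    Hypothetical helper, not part of bcbio-gff: it applies the same
    conversion as GFFParser.py line 388 (seqid, 0-based start, end),
    but only after checking that exactly three fields are present.
    """
    parts = directive.lstrip("#").split()
    if not parts:
        return "", ()  # bare '##' or '###' line: nothing to convert
    key, fields = parts[0], tuple(parts[1:])
    if len(fields) == 3:
        # Same arithmetic as the failing line: 1-based start -> 0-based start.
        return key, (fields[0], int(fields[1]) - 1, int(fields[2]))
    # Unexpected field count: return the raw tuple instead of raising.
    return key, fields


def unguarded(fields):
    # Hypothetical: the expression from line 388 with no length check.
    return (fields[0], int(fields[1]) - 1, int(fields[2]))


if __name__ == "__main__":
    print(parse_region_directive("##sequence-region CP040634.1 1 100"))
    # ('sequence-region', ('CP040634.1', 0, 100))
    print(parse_region_directive("##sequence-region CP040634.1"))
    # ('sequence-region', ('CP040634.1',))
    try:
        unguarded(("CP040634.1", "1"))
    except IndexError as exc:
        print("IndexError:", exc)  # IndexError: tuple index out of range
```

The guarded version keeps malformed directives as raw tuples rather than failing, which is one possible design; rejecting them with a clearer error message would be an equally valid choice.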

To reproduce the bug, here is the relevant GFF file.
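Since the attached file is not reproduced here, one way to locate the line that trips the parser is to list every `##` directive in the GFF file together with its number of fields. This sketch uses only the standard library; `find_directives` and `suspicious` are hypothetical helpers, and the rule of flagging `##sequence-region` lines that lack exactly three fields is an assumption inferred from the `int(val[1]) - 1, int(val[2])` conversion in the traceback, which does not itself name the directive.

```python
from typing import Iterable, List, Tuple

Directive = Tuple[int, str, int]  # (line number, directive name, field count)


def find_directives(lines: Iterable[str]) -> List[Directive]:
    """Return (line_number, name, field_count) for each '##' line.

    field_count counts the whitespace-separated fields after the name,
    e.g. '##sequence-region CP040634.1 1 100' -> 3.
    """
    found = []
    for lineno, line in enumerate(lines, start=1):
        if not line.startswith("##"):
            continue  # feature lines and single-'#' comments
        parts = line.lstrip("#").split()
        if not parts:
            continue  # e.g. the '###' forward-reference marker
        found.append((lineno, parts[0], len(parts) - 1))
    return found


def suspicious(directives: List[Directive]) -> List[Directive]:
    # Assumption: a sequence-region directive needs seqid, start and end.
    return [d for d in directives if d[1] == "sequence-region" and d[2] != 3]
```

For example, `with open('data/FAM19036/annot.gff') as handle: print(suspicious(find_directives(handle)))` would print the line numbers of any region directives with an unexpected shape; an empty list would suggest the problem lies in a different directive.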

Thanks in advance.

Edit: bcbio-gff version 0.6.6