chapmanb / bcbb

Incubator for useful bioinformatics code, primarily in Python and R
http://bcbio.wordpress.com

Missing features in GFF3 data #108

Open thobalose opened 8 years ago

thobalose commented 8 years ago

Hello,

I have a gff3 file with the following limits:

gff3

I am using the following to limit to the features of interest:

limit_info = dict(
    gff_source_type=[
        ('ena', 'transcript'), ('ena', 'CDS'), ('ena', 'gene'), ('ena', 'exon'),
        ('ena', 'tRNA_gene'), ('ena', 'ncRNA_gene'), ('ena', 'rRNA_gene'),
        ('ena', 'pseudogene'),
    ],
    gff_source=['ena'],
)

However, I cannot retrieve the transcript and CDS features after parsing: checking for feature.type == 'transcript' or feature.type == 'CDS' in rec.features finds nothing, as if these features are not being captured. Interestingly, when I reduce the limit_info dict to only ('ena', 'transcript') or only ('ena', 'CDS'), I do retrieve this data.
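
A minimal sketch of the check described above, using a hypothetical `Feature` stand-in and toy data rather than the parser's real objects or the actual file. It assumes, without confirmation in this thread, that child features may be nested under their parent in a `sub_features` list when the parent type is also in `limit_info`; if so, a loop over `rec.features` alone would not see them. The helper name `count_feature_types` is illustrative only.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """Hypothetical stand-in for a parsed feature (not the real SeqFeature)."""
    type: str
    sub_features: List["Feature"] = field(default_factory=list)


def count_feature_types(features, recurse=False):
    """Count feature types, optionally descending into nested sub_features."""
    counts = Counter()
    for feature in features:
        counts[feature.type] += 1
        if recurse:
            # Assumption: children, if nested, live in a `sub_features` list.
            children = getattr(feature, "sub_features", [])
            counts.update(count_feature_types(children, recurse=True))
    return counts


# Toy data: a gene holding a transcript, which holds an exon and a CDS.
gene = Feature("gene", [Feature("transcript", [Feature("exon"), Feature("CDS")])])

top_level = count_feature_types([gene])             # what a flat loop sees
nested = count_feature_types([gene], recurse=True)  # what a recursive walk sees
```

With real data, the list passed in would be `rec.features` for each record yielded by `GFF.parse(handle, limit_info=limit_info)`. Comparing the flat and recursive counts would show whether the transcripts and CDS are missing or just nested.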

What might be the issue here?