Closed: hexylena closed this issue 9 years ago.
Eric;
Sorry about any confusion. When parsing to Biopython objects, the first column is stored as the id attribute of the SeqRecord, not in the record's SeqFeatures. The idea is that the first column describes the parent sequence, and everything for that sequence is ordered under it as features. So if you're iterating and need access to this, you can do:
from BCBio import GFF

for rec in GFF.parse(your_file):
    for feature in rec.features:
        # rec.id is column 1 (the parent sequence); feature.id is the feature's ID
        print(rec.id, feature.id)
Hope this gives you what you need for this.
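To make the column 1 → parent sequence mapping concrete without needing BCBio installed, here is a minimal standard-library sketch. It is not how BCBio.GFF is implemented. It assumes tab-separated GFF3 feature lines, does not decode percent-encoded attribute values, and the function name group_by_seqid and the sample sequence names are hypothetical. It groups each feature under its column 1 value, which mirrors how the parser nests features under one SeqRecord per sequence.

```python
from collections import defaultdict


def group_by_seqid(gff3_text):
    """Hypothetical helper: group GFF3 features by column 1 (seqid).

    Returns a dict mapping each seqid to a list of (type, ID) tuples,
    in the order the features appear in the input.
    """
    grouped = defaultdict(list)
    for line in gff3_text.splitlines():
        # Skip blank lines, directives (##...) and comments (#...).
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a GFF3 feature line (e.g. embedded FASTA)
        seqid, feature_type, attributes = cols[0], cols[2], cols[8]
        # Column 9 is a ;-separated list of key=value pairs (values not unescaped).
        attrs = dict(
            pair.split("=", 1) for pair in attributes.split(";") if "=" in pair
        )
        grouped[seqid].append((feature_type, attrs.get("ID")))
    return dict(grouped)


# Made-up example input, not real InterProScan output.
example = "\n".join([
    "##gff-version 3",
    "seqA\tsource\tprotein_match\t1\t50\t.\t+\t.\tID=match1",
    "seqA\tsource\tmatch_part\t1\t20\t.\t+\t.\tID=part1",
    "seqB\tsource\tprotein_match\t5\t40\t.\t+\t.\tID=match2",
])
print(group_by_seqid(example))
# {'seqA': [('protein_match', 'match1'), ('match_part', 'part1')], 'seqB': [('protein_match', 'match2')]}
```

Because grouping is keyed on column 1, each protein_match ends up under its sequence even when no Parent tag links the features together, which is the same information rec.id exposes in the loop above.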
Also, if you're exploring GFF parsing in Python further, it would be worth looking at Ryan's gffutils (https://github.com/daler/gffutils), which builds on this library and adds a lot of additional functionality.
Oh, heavens, you're right, that is the record id. I'd completely forgotten that the record had an id attribute. Thanks so much!
I'm trying to handle some data from InterProScan, and I'm having a wee bit of trouble due to what seems to be missing support for GFF3 column 1 data. Here's a snippet of the GFF3 I'm receiving:
The id attribute of the produced SeqFeature contains the ID, which is useful for the match_part sections but not for the protein_match sections. I really need access to the first column to reliably detect the parent sequence, since the output, as it stands, doesn't use the Parent tag to establish the hierarchy of matches. Would there be any interest in a PR to add support for this?