daler / gffutils

GFF and GTF file manipulation and interconversion
http://daler.github.io/gffutils
MIT License

db.parents(id) stops at level 2? #204

Open dariober opened 1 year ago


I'm applying db.parents(id) to features that have up to three levels of ancestors, e.g. the nesting is id -> protein_match -> mRNA -> gene. The third level, gene, does not appear to be returned. Am I missing something? Here's an example:

import gffutils

# Columns are separated by single spaces here for readability; they are converted to tabs below.
txt = """\
chr1 AUGUSTUS gene 68330 73621 1 - . ID=g1903;
chr1 AUGUSTUS mRNA 68330 73621 1 - . ID=g1903.t1;Parent=g1903;
chr1 Pfam protein_match 73372 73618 1 - . ID=g1903.t1.d1;Parent=g1903.t1;
chr1 Pfam protein_hmm_match 73372 73618 1 - . ID=g1903.t1.d1.1;Parent=g1903.t1.d1;
"""

# GFF3 is tab-delimited, so swap spaces for tabs before building an in-memory database.
db = gffutils.create_db(txt.replace(' ', '\t'), ':memory:', from_string=True)

Show the features:

for x in db.all_features():
    print(x)

chr1 AUGUSTUS gene              68330 73621 1 - . ID=g1903;
chr1 AUGUSTUS mRNA              68330 73621 1 - . ID=g1903.t1;Parent=g1903;
chr1 Pfam     protein_match     73372 73618 1 - . ID=g1903.t1.d1;Parent=g1903.t1;
chr1 Pfam     protein_hmm_match 73372 73618 1 - . ID=g1903.t1.d1.1;Parent=g1903.t1.d1;

Now, db.parents('g1903.t1.d1.1') returns the protein_match and the mRNA as parents, but not the gene:

pp = db.parents('g1903.t1.d1.1')

for p in pp:
    print(p)

chr1    AUGUSTUS    mRNA    68330   73621   1   -   .   ID=g1903.t1;Parent=g1903;
chr1    Pfam    protein_match   73372   73618   1   -   .   ID=g1903.t1.d1;Parent=g1903.t1;

Shouldn't gene also be retrieved as a parent? Thanks!

This is with gffutils 0.11.1.
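
For reference, here is a minimal sketch of the result I would expect, written as a level-by-level walk over direct parents. It does not use gffutils at all: the helper name all_ancestors and the parent_of dict are made up for illustration, and the dict simply copies the Parent attributes from the example above. If the database only stores relations up to two levels (my assumption, not confirmed in this thread), a walk like this might serve as a workaround by asking only for direct parents at each step.

```python
from typing import Callable, Dict, Iterable, List


def all_ancestors(
    start_id: str, direct_parents: Callable[[str], Iterable[str]]
) -> List[str]:
    """Return every ancestor of start_id, nearest level first.

    direct_parents is any callable that maps a feature ID to the IDs of
    its immediate (level-1) parents.
    """
    seen = set()
    ordered = []
    frontier = [start_id]
    while frontier:
        next_frontier = []
        for feature_id in frontier:
            for parent_id in direct_parents(feature_id):
                if parent_id not in seen:  # guard against repeats and cycles
                    seen.add(parent_id)
                    ordered.append(parent_id)
                    next_frontier.append(parent_id)
        frontier = next_frontier
    return ordered


# Hypothetical stand-in for the database: Parent attributes from the example GFF.
parent_of: Dict[str, List[str]] = {
    "g1903": [],
    "g1903.t1": ["g1903"],
    "g1903.t1.d1": ["g1903.t1"],
    "g1903.t1.d1.1": ["g1903.t1.d1"],
}

ancestors = all_ancestors("g1903.t1.d1.1", lambda fid: parent_of.get(fid, []))
print(ancestors)  # ['g1903.t1.d1', 'g1903.t1', 'g1903']

# Assumption, not run against 0.11.1: with a real database the callable could be
#   lambda fid: [p.id for p in db.parents(fid, level=1)]
```

The walk goes one level at a time, so it does not depend on how many levels of relations are precomputed, at the cost of one lookup per ancestor.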