Planteome / planteome-annotation-data

This is a place to discuss issues around the Planteome annotation data and store useful scripts etc.

Obsolete GO terms in annotation files #2

Open cooperl09 opened 7 years ago

cooperl09 commented 7 years ago

During our recent load, we noticed that there are annotations to 36 obsolete GO terms. Many of these do not have clear replacement terms.
These do not break the loading process now, but they should be resolved to maintain the quality of the annotation data. The 36 terms: GO:0001619 GO:0003702 GO:0003711 GO:0003715 GO:0004086 GO:0004221 GO:0004428 GO:0004437 GO:0004986 GO:0005624 GO:0005792 GO:0006184 GO:0006200 GO:0007090 GO:0007108 GO:0008159 GO:0008471 GO:0008943 GO:0009296 GO:0010131 GO:0010843 GO:0016251 GO:0016563 GO:0016564 GO:0016566 GO:0016585 GO:0016986 GO:0017163 GO:0019861 GO:0030528 GO:0033903 GO:0035300 GO:0048196 GO:0050983 GO:0061597 GO:0070188

cooperl09 commented 7 years ago

This affects the go_ortholog, go_iprscan files and also these three:

elserj commented 7 years ago

Note that I found several more terms that are obsolete, but the Perl OBO parser does not pick them up as such. Checking the go.obo file seems to show that these terms are listed only as "alt_id: " for the term that replaced them. There may be more of these than what I found.

I came across them while tracking down missing/incorrect aspect columns in the GAF files.

I think I may be able to figure out a way to find these terms and fix them, but it might be too late for this release.

cmungall commented 7 years ago

are you using go-perl?

elserj commented 7 years ago

I believe so. Here is the code that gets the obsoletes:

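use GO::Parser;

# init GO parser ($obo_file is the path to go.obo, set elsewhere in the script)
my $parser = GO::Parser->new({handler=>'obj'});
$parser->parse($obo_file);
my $ont = $parser->handler->graph;
my $obo_terms = $ont->get_all_nodes;

# collect every term flagged as obsolete: accession => name
my %obs_terms_hash;
foreach my $term (@$obo_terms) {
    if ($term->is_obsolete){
        my $id = $term->acc;
        my $name = $term->name;
        $obs_terms_hash{$id} = $name;
    }
}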

An example ID that I found is GO:0000119. The only place this occurs in the obo file is as an alt_id to GO:0016592. It doesn't have its own [Term] stanza with "is_obsolete: true", which would be my guess as to why the parser doesn't pick it up.

Unless I'm doing something incorrect...
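
One possible way to catch these cases, sketched without go-perl: read the OBO text directly and record every "alt_id:" line against the "id:" of the stanza it sits in. The helper name scan_obo and the tiny inline sample are made up for illustration; only the two IDs and their relationship come from the observations above, and a real run would read go.obo instead of the in-memory string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of go-perl): scan OBO text and return
#   - a hash of alt_id => primary id (merged terms)
#   - a hash of ids whose own [Term] stanza says "is_obsolete: true"
sub scan_obo {
    my ($fh) = @_;
    my (%alt_to_primary, %obsolete);
    my ($in_term, $id) = (0, undef);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^\[(\w+)\]\s*$/) {      # stanza header, e.g. [Term]
            $in_term = ($1 eq 'Term');
            $id = undef;
            next;
        }
        next unless $in_term;                 # skip file header and non-Term stanzas
        if ($line =~ /^id:\s*(\S+)/) {
            $id = $1;
        }
        elsif (defined $id && $line =~ /^alt_id:\s*(\S+)/) {
            $alt_to_primary{$1} = $id;
        }
        elsif (defined $id && $line =~ /^is_obsolete:\s*true\b/) {
            $obsolete{$id} = 1;
        }
    }
    return (\%alt_to_primary, \%obsolete);
}

# Illustrative excerpt only; stands in for the real go.obo file.
my $sample = <<'OBO';
format-version: 1.2

[Term]
id: GO:0016592
alt_id: GO:0000119

[Term]
id: GO:0003702
is_obsolete: true

[Typedef]
id: part_of
OBO

open my $fh, '<', \$sample or die "cannot open in-memory OBO: $!";
my ($alt, $obs) = scan_obo($fh);
close $fh;

# Classify a few accessions the way an annotation check might.
for my $acc (qw(GO:0000119 GO:0003702 GO:0016592)) {
    my $status = exists $alt->{$acc} ? "merged into $alt->{$acc}"
               : $obs->{$acc}        ? 'obsolete'
               :                       'current';
    print "$acc\t$status\n";
}
```

With a map like this, a GO ID in an annotation file that appears only as an alt_id could be rewritten to its primary ID, while IDs flagged obsolete would still need manual review since many have no clear replacement.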

cmungall commented 7 years ago

haven't touched go-perl for a long time, sorry

we have some hacky perl that does this update but we should really switch to having this use the api