geneontology / go-site

A collection of metadata, tools, and files associated with the Gene Ontology public web presence.
http://geneontology.org
BSD 3-Clause "New" or "Revised" License

Issues with GPADs generated from Noctua models #613

Closed tonysawfordebi closed 8 months ago

tonysawfordebi commented 6 years ago

I've just grabbed the latest batch of Noctua-generated GPADs from https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/ and run them through our syntax checker.

The resulting log file is attached to this ticket.

syntax_check.log

tonysawfordebi commented 6 years ago

I haven't done the comparison of the annotation sets that we load from the GAF and GPAD files, but I believe I've unearthed the reason why we're rejecting more annotations from the GPAD than from the GAF...

I spotted a small error in my script that analyses the errors found by the check & creates the "Analysis" section at the end of the log file, namely that it wasn't handling the "unsupported qualifier" error reports correctly. When I fixed that, it immediately became apparent that there is a problem with the way the qualifier column is being set in the GPAD.

I've attached an updated log file to this comment, but this is the relevant section:

Number of annotations with an unknown or unsupported qualifier: 2560
  acts_upstream_of_or_within|NOT: 191
  enables|NOT: 120
  enables|colocalizes_with: 1
  enables|contributes_to: 859
  part_of|NOT: 102
  part_of|colocalizes_with: 1287

syntax_check-GPAD.log
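
For readers who want to reproduce a per-qualifier breakdown like the one above, here is a minimal sketch. It is not the actual checker script: the function name `tally_unsupported` and the sample values are hypothetical, and the set of supported qualifiers is passed in by the caller because the thread does not list it in full.

```python
from collections import Counter
from typing import Iterable, Set


def tally_unsupported(qualifiers: Iterable[str], supported: Set[str]) -> Counter:
    """Count each qualifier string that is not in the supported set."""
    return Counter(q for q in qualifiers if q not in supported)


# Made-up sample values for illustration only; not taken from the log file.
sample = ["enables", "enables|NOT", "part_of|colocalizes_with", "enables|NOT", "NOT|enables"]
# Assumed (partial) set of accepted qualifier strings.
supported = {"enables", "part_of", "NOT|enables", "colocalizes_with"}

counts = tally_unsupported(sample, supported)
for qualifier, n in sorted(counts.items()):
    print(f"{qualifier}: {n}")
```

Comparing whole strings against the set means a reversed pair such as `enables|NOT` is reported as unsupported even when `NOT|enables` is accepted, which is the pattern the counts above show.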

ukemi commented 6 years ago

Thanks Tony! So it seems that we are identifying the piped qualifier-values. I'm not sure why we would ever have enables|colocalizes_with, but the others seem ok. I think the NOT annotations should be allowed. I am less certain about the others. In the case of enables|contributes_to, perhaps contributes_to should win. In the case of part_of|colocalizes_with maybe it should be allowed. I think colocalizes_with has been used very inconsistently. @vanaukenk

tonysawfordebi commented 6 years ago

At a guess, what I think is happening is that when you're writing out the GPAD you're always outputting the default relation for the GO aspect, regardless of whether there was already a qualifier (contributes_to, colocalizes_with, or NOT), and this is leading to the double qualifiers.

If a qualifier is NOTted, "NOT|" should precede it.

In pseudocode, what we do when outputting GPAD is:

IF qualifier IS unset THEN
    gpad_qualifier := default_relation_for_aspect
ELSIF qualifier = 'NOT' THEN
    gpad_qualifier := 'NOT|' + default_relation_for_aspect
ELSE
    gpad_qualifier := qualifier
ENDIF
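
The pseudocode above could be sketched in Python roughly as follows. This is an illustration, not the exporter's real code: the function name `gpad_qualifier`, the single-letter aspect codes, and the `DEFAULT_RELATION` mapping are assumptions, since the thread does not say which default relation is used for each aspect.

```python
from typing import Optional

# Assumed default relation per GO aspect (F = molecular function,
# P = biological process, C = cellular component). Hypothetical values;
# substitute whatever defaults the exporter actually uses.
DEFAULT_RELATION = {"F": "enables", "P": "involved_in", "C": "part_of"}


def gpad_qualifier(qualifier: Optional[str], aspect: str) -> str:
    """Return the value for the GPAD qualifier column."""
    default = DEFAULT_RELATION[aspect]
    if not qualifier:
        # Unset (None or empty string): fall back to the aspect's default relation.
        return default
    if qualifier == "NOT":
        # A bare NOT negates the default relation; NOT comes first.
        return "NOT|" + default
    # Anything else already carries its own relation; pass it through unchanged.
    return qualifier


print(gpad_qualifier(None, "F"))
print(gpad_qualifier("NOT", "C"))
print(gpad_qualifier("contributes_to", "F"))
```

Because the default relation is only added when no relation is present, an existing qualifier such as `contributes_to` or `colocalizes_with` is never doubled up with it.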

ukemi commented 6 years ago

Yup. I can have that fixed pretty easily, my mistake interpreting the file specs. What about the contributes_to and colocalizes_with? Are they just not allowed at all?

tonysawfordebi commented 6 years ago

No, they're fine. They should just get passed through untouched (that's the "ELSE" bit of the pseudocode above).

tonysawfordebi commented 6 years ago

(As should "NOT|contributes_to" and "NOT|colocalizes_with")

pgaudet commented 6 years ago

We should update the specs. I can edit the documentation on the GO website as described by @tonysawfordebi above.

ok @suzialeksander ?

suzialeksander commented 6 years ago

@pgaudet I've saved most of the pages I think we're keeping on GH. Please edit the GH page, or if this documentation isn't in that repo, go ahead and edit as usual, then let me know what the URL was. I can move/save it after you're done.

Thanks!

suzialeksander commented 8 months ago

@balhoff or @kltm Since the new GPAD spec is reasonably done and is almost-live, this ticket might be closable immediately (?)

kltm commented 8 months ago

This is quite stale and I think covered elsewhere now.