geneontology / gocamgen

Base repo for constructing GO-CAM model RDF

Simple mapping of extension relations to ontologies (RO/BFO) #51

Open dustine32 opened 5 years ago

dustine32 commented 5 years ago

In this spreadsheet we have all relations used in annotation extensions, at least in WB and MGI files. For many of these, I can easily map to an RO or BFO term:

regulates_transport_of -> RO:0002011

And the rest are available in the GOREL ontology (produced by the GO pipeline?), with some having xrefs to RO. Unfortunately I only see two of our unmapped relations having RO xrefs (has_agent and regulates_level_of). We'll need to figure out how to translate the other GORELs into GO-CAM if the GOREL relations aren't available for viewing in Noctua.
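A rough sketch of the lookup order described above: try the curated RO/BFO table first, then fall back to GOREL. Only the `regulates_transport_of` entry comes from this thread. The table names, the `resolve_relation` helper, and the choice of `has_output_o_axis_of` as a GOREL-only example are assumptions for illustration, not gocamgen's actual code.

```python
from typing import Dict, Optional, Set, Tuple

# Curated extension-relation -> RO/BFO table (hypothetical name).
# Only this one entry is taken from the thread.
RO_BFO_MAP: Dict[str, str] = {
    "regulates_transport_of": "RO:0002011",
}

# Relations assumed to exist only in GOREL (illustrative membership).
GOREL_ONLY: Set[str] = {"has_output_o_axis_of"}


def resolve_relation(
    name: str,
    ro_bfo_map: Dict[str, str] = RO_BFO_MAP,
    gorel_only: Set[str] = GOREL_ONLY,
) -> Optional[Tuple[str, str]]:
    """Return (source, identifier) for an extension relation, or None.

    RO/BFO mappings win; GOREL is the fallback. For GOREL we return the
    relation name itself because no GOREL IDs are given in the thread.
    """
    if name in ro_bfo_map:
        return ("RO/BFO", ro_bfo_map[name])
    if name in gorel_only:
        return ("GOREL", name)
    return None  # unmapped: needs manual curation
```

Relations that come back as `None` are the ones that would need a decision on how (or whether) to express them in GO-CAM.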

dustine32 commented 5 years ago

@ukemi @vanaukenk I pinged @balhoff about this on gitter:

"Not all the gorel maps to RO. Really they shouldn’t overlap."

As you both know, it's late on the east coast, so he said he'd explain a bit more later. But I'm thinking we may need some manual curation to map RO terms to these remaining relations?

ukemi commented 5 years ago

We should invite @balhoff to our call next week, but I think some of the gorel relations will not map to RO, even by hand.

vanaukenk commented 5 years ago

I agree with @ukemi. Some of the gorel relations may never make it up to RO, so we'll need to look at those individually to decide what's the best thing to do.

dustine32 commented 5 years ago

@vanaukenk @ukemi I have the latest set of WB/MGI models on noctua-dev now. This set was generated with the "flood gates open," meaning: we now allow all annotation extension relations that can be mapped to RO/BFO/GOREL terms to be expressed in the model.

An odd hiccup is that the owl:ObjectProperty declarations on GOREL relations (necessary for them to display in Noctua) seem to be randomly omitted from some models during gocamgen generation, resulting in loose term individuals floating around. I can't yet reproduce this bug when generating the model individually but I'll try with another full run. Maybe it can be isolated to generating too many at once? Some examples:

I'll also fill in the "Ex gene" col in the spreadsheet for those I can find.
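One way to spot the affected models after a batch run is to compare the GOREL predicates a model uses against the ones it declares. This is a minimal sketch that treats a model as a list of plain `(subject, predicate, object)` string triples. The `undeclared_predicates` helper, the CURIE-style strings, and the `GOREL:rel_a`/`GOREL:rel_b` identifiers are placeholders, not real terms or gocamgen functions.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
OWL_OBJECT_PROPERTY = "owl:ObjectProperty"


def undeclared_predicates(triples: Iterable[Triple], prefix: str = "GOREL:") -> List[str]:
    """List predicates with the given prefix that lack an owl:ObjectProperty declaration."""
    triples = list(triples)
    # Anything typed as owl:ObjectProperty counts as declared.
    declared = {s for s, p, o in triples if p == RDF_TYPE and o == OWL_OBJECT_PROPERTY}
    # Anything with the prefix appearing in predicate position counts as used.
    used = {p for _, p, _ in triples if p.startswith(prefix)}
    return sorted(used - declared)


# Placeholder model: rel_a is declared, rel_b is used without a declaration.
example_model = [
    ("GOREL:rel_a", RDF_TYPE, OWL_OBJECT_PROPERTY),
    ("ind1", "GOREL:rel_a", "ind2"),
    ("ind1", "GOREL:rel_b", "ind3"),
]
missing = undeclared_predicates(example_model)
```

Running this over every generated model would at least show whether the omissions cluster in particular models or relations.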

ukemi commented 5 years ago

There are lots of anatomical structures eerily floating around in the beta catenin model. I like it that you went bold and chose that gene.

ukemi commented 5 years ago

@dustine32 the annotation extensions are still being attached to the function instead of the process. See the head paraxial mesoderm in the MGI:102806 model. It is attached to molecular_function instead of 'anterior posterior pattern specification'.

dustine32 commented 5 years ago

@ukemi OK, I'm not yet sure how to code for this case. Here's the GPAD source line:

MGI MGI:102806 acts_upstream_of_or_within GO:0009952 MGI:MGI:3689276|PMID:16991118 ECO:0000316 MGI:MGI:87912 20061212 MGI has_output_o_axis_of(EMAPA:16171)|has_output_o_axis_of(EMAPA:16183)
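For reference, the last column of that line can be split into (relation, filler) pairs roughly as follows. This sketch assumes the usual annotation-extension convention that `|` separates independent expressions and `,` joins units within one expression. `parse_extensions` is a hypothetical helper, not the parser gocamgen actually uses.

```python
import re
from typing import List, Tuple

# One extension unit looks like relation(FILLER_ID).
_UNIT_RE = re.compile(r"^\s*([^\s()]+)\(([^()]+)\)\s*$")


def parse_extensions(field: str) -> List[List[Tuple[str, str]]]:
    """Parse an annotation-extension field into groups of (relation, filler) pairs."""
    groups: List[List[Tuple[str, str]]] = []
    for expression in field.split("|"):
        if not expression.strip():
            continue  # tolerate empty fields and stray separators
        group = []
        for unit in expression.split(","):
            match = _UNIT_RE.match(unit)
            if match is None:
                raise ValueError(f"Unparseable extension unit: {unit!r}")
            group.append((match.group(1), match.group(2)))
        groups.append(group)
    return groups


parsed = parse_extensions(
    "has_output_o_axis_of(EMAPA:16171)|has_output_o_axis_of(EMAPA:16183)"
)
```

For the line above this yields two separate single-unit expressions, one per EMAPA term.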

In the code, for acts_upstream_of qualifier relations on the primary annotation, the root MF is chosen as the "anchor": the node in the assertion graph off of which all extensions hang. I could instead make the primary term ('anterior posterior pattern specification' in this case) the anchor, so that the -[has_output_o_axis_of]->(head paraxial mesenchyme) extension edge hangs off that.

But will this anchor assignee ever vary by extension relation for acts_upstream_of... primary annotations? Is this "triple subject must be BP" constraint coded in the relations (or GOREL) ontology for these extension relations? That would make it a bit easier to code. Will we need to get this granular/relation-specific for other annotation qualifiers like enables or involved_in?

ukemi commented 5 years ago

Hi @dustine32, for any line in the GPAD, the GO identifier in column 4 is the identifier that is connected to the annotation extensions. So for any identifier that is a biological process (maybe look at the namespace), the annotation extension relations and terms should be attached to the process, even if we create a generic molecular function node between the process and the gene product.

The only cases where I can imagine this varying are the weird has_annotation_target relations and the chain relations, which would chain to the corresponding terms.

ukemi commented 5 years ago

PS. There are also constraints that limit the domain of certain annotation extension relations to certain ontology terms and their children (column 4). I noted this on the spreadsheet (extensions list with provided by- MGI tab), but I thought we had decided not to enforce that in the initial import. Eventually we will want to correct those mistakes, so if it is trivial, that would be awesome.

dustine32 commented 5 years ago

@ukemi Awesome, thanks! I can change the logic to always hang extensions off the annotated primary term (in col 4) by default.
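A minimal sketch of that default, using plain tuples as edges: the extensions always get the column 4 term as their subject, and the generic molecular_function node (GO:0003674) is only inserted between the gene product and the term when the qualifier is not `enables`. The `build_edges` function and the edge labels (`enabled_by`, and reusing the qualifier as the MF-to-term edge) are simplifications for illustration, not the relations gocamgen actually emits.

```python
from typing import List, Sequence, Tuple

Edge = Tuple[str, str, str]
ROOT_MF = "GO:0003674"  # molecular_function


def build_edges(
    gene: str,
    qualifier: str,
    primary_term: str,
    extensions: Sequence[Tuple[str, str]],
) -> List[Edge]:
    """Build assertion edges with extensions anchored on the primary (col 4) term."""
    edges: List[Edge] = []
    if qualifier == "enables":
        # The primary term is itself the function enabled by the gene product.
        edges.append((primary_term, "enabled_by", gene))
    else:
        # Generic MF node sits between the gene product and the primary term.
        edges.append((ROOT_MF, "enabled_by", gene))
        edges.append((ROOT_MF, qualifier, primary_term))
    # Extensions hang off the primary term regardless of qualifier.
    for relation, filler in extensions:
        edges.append((primary_term, relation, filler))
    return edges


edges = build_edges(
    "MGI:MGI:102806",
    "acts_upstream_of_or_within",
    "GO:0009952",
    [("has_output_o_axis_of", "EMAPA:16171")],
)
```

The has_annotation_target and chain relations mentioned above would need their own handling on top of this default.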