geneontology / minerva

BSD 3-Clause "New" or "Revised" License

GAF/GPAD example model 5745387b00001770 #59

Closed by ukemi 7 years ago

ukemi commented 7 years ago

GAF model 5745387b00001770
MGI MGI:3588192 Zcchc16 Mmus GO:0042415 PMID:26402067 IMP MGI:MGI:5789261 P gene taxon:10090 20160822 GO_Noctua
MGI MGI:3588192 Zcchc16 Mmus GO:0050890 PMID:26402067 IMP MGI:MGI:5789261 P gene taxon:10090 20160822 GO_Noctua

GPAD model 5745387b00001770
MGI MGI:3588192 involved_in GO:0042415 PMID:26402067 ECO:0000315 MGI:MGI:5789261 20160822 GO_Noctua contributor=GOC:hjd
MGI MGI:3588192 involved_in GO:0050890 PMID:26402067 ECO:0000315 MGI:MGI:5789261 20160822 GO_Noctua contributor=GOC:hjd
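
For readers who want to work with these records programmatically, here is a minimal sketch of splitting one GPAD record into named fields. It assumes the tab-separated, 12-column GPAD 1.1 layout. The sample line is a hand-built, tab-separated version of the first record above, with the interacting-taxon and annotation-extension columns assumed empty. `GPAD_COLUMNS` and `parse_gpad_line` are illustrative names, not part of Minerva.

```python
# Column names follow the GPAD 1.1 layout (an assumption; check the spec for your version).
GPAD_COLUMNS = [
    "db", "db_object_id", "qualifier", "go_id", "references",
    "evidence_code", "with_from", "interacting_taxon", "date",
    "assigned_by", "annotation_extension", "annotation_properties",
]


def parse_gpad_line(line):
    """Split one tab-separated GPAD record into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    # Pad short rows so that trailing empty columns are still present.
    fields += [""] * (len(GPAD_COLUMNS) - len(fields))
    return dict(zip(GPAD_COLUMNS, fields))


# Hand-built sample: empty strings stand for the columns assumed empty.
sample = "\t".join([
    "MGI", "MGI:3588192", "involved_in", "GO:0042415", "PMID:26402067",
    "ECO:0000315", "MGI:MGI:5789261", "", "20160822", "GO_Noctua",
    "", "contributor=GOC:hjd",
])

record = parse_gpad_line(sample)
print(record["qualifier"], record["go_id"], record["annotation_properties"])
```

Keying the fields by name makes it easier to compare the qualifier column across exports, which is the column the rest of this thread is about.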

ukemi commented 7 years ago

This is an example of a simple causal chain.

cmungall commented 7 years ago

I would disagree with the 'involved in' call here. Currently in RO we have 'involved in' being inferred if there is a chain of G enables <anon MF> part-of P.

We do have an RO relation 'acts upstream of' (http://purl.obolibrary.org/obo/RO_0002263, in the next RO release), which has the chain enables o 'causally upstream of'; this is what would be inferred. E.g.

MGI:3588192 acts_upstream_of GO:0042415
MGI:3588192 acts_upstream_of GO:0050890

Note: clicking on the RO URI will not work, as Ontobee is stale: https://github.com/OntoZoo/ontobee/issues/88

OLS is incomplete unfortunately: http://www.ebi.ac.uk/ols/ontologies/ro/properties?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FRO_0002263
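
The chain-based inference described in this comment can be sketched in a few lines. This only illustrates property-chain composition; it is not how RO or an OWL reasoner implements it. `PROPERTY_CHAINS`, `infer_gene_to_process`, the blank-node label `_:mf1`, and the two direct 'causally upstream of' edges from the anonymous function are assumptions made for the example.

```python
# (first relation, second relation) -> relation inferred between the end points.
PROPERTY_CHAINS = {
    ("enables", "part_of"): "involved_in",
    ("enables", "causally_upstream_of"): "acts_upstream_of",
}


def infer_gene_to_process(gene_to_function, function_to_process):
    """Compose gene->function and function->process edges via the chains above."""
    inferred = set()
    for gene, rel1, function in gene_to_function:
        for source, rel2, process in function_to_process:
            if function == source and (rel1, rel2) in PROPERTY_CHAINS:
                inferred.add((gene, PROPERTY_CHAINS[(rel1, rel2)], process))
    return inferred


# Hypothetical, simplified edges: the gene enables an anonymous molecular
# function, which is causally upstream of both processes.
gene_to_function = [("MGI:3588192", "enables", "_:mf1")]
function_to_process = [
    ("_:mf1", "causally_upstream_of", "GO:0042415"),
    ("_:mf1", "causally_upstream_of", "GO:0050890"),
]

inferred = infer_gene_to_process(gene_to_function, function_to_process)
for triple in sorted(inferred):
    print(*triple)
```

With a 'part_of' edge in place of 'causally_upstream_of', the same lookup would yield 'involved_in', which is the narrow reading argued for here.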

ukemi commented 7 years ago

Involved in is the current relation used for all process annotations. I disagree too, but there is misalignment with the GPAD specs and what I think we would like to see. If we can expand the GPAD qualifiers to really reflect the relationship between the annotated object and the term, that would make me happy.

From http://wiki.geneontology.org/index.php/Final_GPAD_and_GPI_file_format:

  1. The explicit relations will be:

'part_of' for Cellular Component

'involved_in' for Biological Process

'enables' for Molecular Function
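
As a rough illustration of the mapping quoted above, the sketch below picks a GPAD relation from a GAF aspect code (C, P, or F). The `explicit_relation` override is a hypothetical parameter standing in for the expanded qualifiers wished for in this comment; `DEFAULT_RELATION` and `gpad_relation` are illustrative names only.

```python
# Relations quoted from the GPAD/GPI wiki page, keyed by GAF aspect code.
DEFAULT_RELATION = {
    "C": "part_of",      # Cellular Component
    "P": "involved_in",  # Biological Process
    "F": "enables",      # Molecular Function
}


def gpad_relation(aspect, explicit_relation=None):
    """Return the explicit relation if one is supplied, else the aspect's relation."""
    if explicit_relation:
        return explicit_relation
    return DEFAULT_RELATION[aspect]


print(gpad_relation("P"))                      # relation from the quoted mapping
print(gpad_relation("P", "acts_upstream_of"))  # hypothetical expanded qualifier
```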

cmungall commented 7 years ago

"Involved in is the current relation used for all process annotations"

it's the default relation, but not the only relation.

the docs on the main website have never been satisfactory. The original markdown/html is primary: http://www.geneontology.org/specifications/gpad/gpad-1.html (should really move this to github)

Anyway, my preference would be to define involved-in narrowly, and to use a more specific relation in this case. But if we want to broaden the definition of involved in, we need to add the property chains to RO

cmungall commented 7 years ago

Moved GPAD/GPI 1.2 docs here: https://github.com/geneontology/go-annotation/blob/master/specs/gpad-gpi-1_2.md

ukemi commented 7 years ago

I agree that the definition of involved in should remain narrow. Using the larger set of relations gives us a huge step towards the expanded qualifiers that Kimberly and I would like to see. That works well for the new Noctua models. What about the legacy annotations for those of us who are submitting GPAD to the GOC? Should involved in really be the default?

balhoff commented 7 years ago

I am getting 3 annotations via RDFox property reasoning:

 "pr_type" , "rel_label" , "target_type" , "target_type_label" , "extensions" , "evidence_type" , "with" , "contributor" , "date" , "source" ,
 "http://www.informatics.jax.org/accession/MGI:MGI:3588192" , "enables" , "http://purl.obolibrary.org/obo/GO_0003674" , "molecular_function" , "causally upstream of or within(norepinephrine metabolic process),causally upstream of or within(cognition)" , "http://purl.obolibrary.org/obo/ECO_0000315" , "MGI:MGI:5789261" , "GOC:hjd" , "2016-08-22" , "PMID:26402067" ,
 "http://www.informatics.jax.org/accession/MGI:MGI:3588192" , "acts upstream of or within" , "http://purl.obolibrary.org/obo/GO_0042415" , "norepinephrine metabolic process" , "causally downstream of or within(molecular_function),causally upstream of or within(cognition)" , "http://purl.obolibrary.org/obo/ECO_0000315" , "MGI:MGI:5789261" , "GOC:hjd" , "2016-08-22" , "PMID:26402067" ,
 "http://www.informatics.jax.org/accession/MGI:MGI:3588192" , "acts upstream of or within" , "http://purl.obolibrary.org/obo/GO_0050890" , "cognition" , "causally downstream of or within(molecular_function),causally downstream of or within(norepinephrine metabolic process)" , "http://purl.obolibrary.org/obo/ECO_0000315" , "MGI:MGI:5789261" , "GOC:hjd" , "2016-08-22" , "PMID:26402067" ,

I am finding that these queries generate more annotation extensions than you may be looking for. But perhaps that's inevitable translating back from LEGO.

ukemi commented 7 years ago

The primary annotations look good, yay! You are correct in that there are more annotation extensions than we would normally make, but that might not be a bad thing. Let's look at more examples. The extensions all look ok, so maybe they are just more complete than the ones a curator would have made making single annotations. The 'molecular function' one isn't particularly informative. This might just be a natural fallout of the LEGO mindset.

balhoff commented 7 years ago

Output from test job: https://build.berkeleybop.org/job/export-lego-to-gpad-sparql/lastSuccessfulBuild/artifact/legacy/gpad/5745387b00001770.gpad/*view*/

Missing annotations are probably a result of not including the full RO. Will see about fixing that.

balhoff commented 7 years ago

@ukemi I think we can close this ticket, if you review the current output at the link in the previous comment. Annotation extensions are no longer being output for this one due to more restrictive conditions we've implemented ('causally upstream of or within' could be added to the extensions whitelist).

ukemi commented 7 years ago

@balhoff This is a simple model, but this is what we would want from it. The only very minor issue is whether we want to create annotations to the root node, in this case GO:0003674. In the past those were reserved for special cases to keep track of when we had looked for information about a gene product exhaustively but couldn't find any. We could exclude them at the GPAD/GAF-generation step, filter them out at a post-production step by the GOC, or have consumers filter them out as they see fit. Heiko argued that if they have contextual information, they are useful. I tend to agree with him, but it is another viewpoint shift. I am closing this for now since everything about the annotations from this model is essentially correct.
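
The filtering options mentioned above could look roughly like the sketch below, whichever step ends up doing the filtering. It drops annotations to the three GO root terms unless they carry an annotation extension, following Heiko's point that contextual information makes them useful. The dict-based record shape, `drop_uninformative_roots`, and the `keep_with_context` flag are assumptions for illustration, not existing Minerva behaviour.

```python
# Root terms of the three GO aspects.
ROOT_TERMS = {
    "GO:0003674",  # molecular_function
    "GO:0008150",  # biological_process
    "GO:0005575",  # cellular_component
}


def drop_uninformative_roots(annotations, keep_with_context=True):
    """Filter out root-term annotations, optionally keeping those with extensions."""
    kept = []
    for ann in annotations:
        is_root = ann["go_id"] in ROOT_TERMS
        has_context = bool(ann.get("annotation_extension"))
        if is_root and not (keep_with_context and has_context):
            continue
        kept.append(ann)
    return kept


# Hypothetical records; the extension string is a placeholder, not real output.
annotations = [
    {"go_id": "GO:0003674", "annotation_extension": ""},
    {"go_id": "GO:0003674", "annotation_extension": "some_relation(GO:0042415)"},
    {"go_id": "GO:0042415", "annotation_extension": ""},
]

lenient = drop_uninformative_roots(annotations)
strict = drop_uninformative_roots(annotations, keep_with_context=False)
print(len(lenient), len(strict))
```

Making the behaviour a flag keeps both viewpoints available: a producer can export everything with context, and a consumer can still apply the stricter filter.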