geneontology / obographs

Basic and Advanced OBO Graphs: specification and reference implementation

References associated with taxon constraints are missing from go-plus.json #22

Open tonysawfordebi opened 7 years ago

tonysawfordebi commented 7 years ago

References that are associated with taxon constraints, and which are visible in go-plus.obo, are not finding their way into go-plus.json

A couple of examples from go-plus.obo:

[Term]
id: GO:0045271
name: respiratory chain complex I
...
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0000525", source="PMID:21597881"} ! Schizosaccharomyces pombe
relationship: never_in_taxon NCBITaxon:4932 {id="GOTAX:0000524", source="PMID:21597881"} ! Saccharomyces cerevisiae

[Term]
id: GO:0019819
name: P1 peroxisome
...
relationship: only_in_taxon NCBITaxon:4952 {id="GOTAX:0000526", source="PMID:10629216", source="PMID:14504266"} ! Yarrowia lipolytica

cmungall commented 7 years ago

On 8 Mar 2017, at 7:19, Tony Sawford wrote:

References that are associated with taxon constraints, and which are visible in go-plus.obo, are not finding their way into go-plus.json

A couple of examples from go-plus.obo:

[Term]
id: GO:0045271
name: respiratory chain complex I
...
relationship: never_in_taxon NCBITaxon:4896 {id="GOTAX:0000525", source="PMID:21597881"} ! Schizosaccharomyces pombe
relationship: never_in_taxon NCBITaxon:4932 {id="GOTAX:0000524", source="PMID:21597881"} ! Saccharomyces cerevisiae

   "id" : "http://purl.obolibrary.org/obo/GO_0045271",
  ...
     "basicPropertyValues" : [ {
       "pred" : "http://purl.obolibrary.org/obo/RO_0002161",
       "val" : "http://purl.obolibrary.org/obo/NCBITaxon_4932"
     }, {

[Term]
id: GO:0019819
name: P1 peroxisome
...
relationship: only_in_taxon NCBITaxon:4952 {id="GOTAX:0000526", source="PMID:10629216", source="PMID:14504266"} ! Yarrowia lipolytica

   "sub" : "http://purl.obolibrary.org/obo/GO_0019819",
   "pred" : "http://purl.obolibrary.org/obo/RO_0002160",
   "obj" : "http://purl.obolibrary.org/obo/NCBITaxon_4952"

Now you may rightly wonder: why is never_in_taxon inlined under a node in 'meta', and why is only_in_taxon treated as a logic edge inlined under "edges"?

It's because this is a direct translation of the OWL, where only_in_taxon is modeled using a SubClassOf-SomeValuesFrom, and never_in_taxon is directly represented in the OWL as an annotation.
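To make the split concrete, here is a minimal Python sketch of how a consumer might collect both kinds of taxon constraint from an obographs JSON document. It assumes the usual top-level "graphs" list with "nodes" and "edges", and that never_in_taxon values sit under each node's "meta" / "basicPropertyValues" as in the fragment above. The function name `taxon_constraints` and the `SAMPLE` document are made up for illustration; only the predicate IRIs and term IRIs are taken from this thread.

```python
import json

# Predicate IRIs copied from the JSON fragments quoted in this thread.
NEVER_IN_TAXON = "http://purl.obolibrary.org/obo/RO_0002161"
ONLY_IN_TAXON = "http://purl.obolibrary.org/obo/RO_0002160"


def taxon_constraints(doc):
    """Return sorted (subject, predicate, taxon) triples from an obographs dict."""
    found = set()
    for graph in doc.get("graphs", []):
        # never_in_taxon: an annotation, stored on the node under meta.
        for node in graph.get("nodes", []):
            meta = node.get("meta") or {}
            for pv in meta.get("basicPropertyValues", []):
                if pv.get("pred") == NEVER_IN_TAXON:
                    found.add((node["id"], NEVER_IN_TAXON, pv["val"]))
        # only_in_taxon: a logical axiom, stored as an edge.
        for edge in graph.get("edges", []):
            if edge.get("pred") == ONLY_IN_TAXON:
                found.add((edge["sub"], ONLY_IN_TAXON, edge["obj"]))
    return sorted(found)


# Hand-built sample (hypothetical, not an excerpt of go-plus.json).
SAMPLE = json.loads("""
{
  "graphs": [ {
    "nodes": [ {
      "id": "http://purl.obolibrary.org/obo/GO_0045271",
      "meta": {
        "basicPropertyValues": [ {
          "pred": "http://purl.obolibrary.org/obo/RO_0002161",
          "val": "http://purl.obolibrary.org/obo/NCBITaxon_4932"
        } ]
      }
    } ],
    "edges": [ {
      "sub": "http://purl.obolibrary.org/obo/GO_0019819",
      "pred": "http://purl.obolibrary.org/obo/RO_0002160",
      "obj": "http://purl.obolibrary.org/obo/NCBITaxon_4952"
    } ]
  } ]
}
""")

for triple in taxon_constraints(SAMPLE):
    print(triple)
```

Note that neither lookup yields the GOTAX id or the PMID sources, which is the gap reported at the top of this issue.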

OK, but answer my question Chris, why is it that way?

Well, OWL is ultimately a language for describing relations between sets and set-theoretic constructs. There are various ways to encode taxon constraints set-theoretically, but these involve nesting and constructs such as ComplementOf, which are outside obo format and related graph-oriented representations. So we opted to use a 'shortcut' relation to encode the constraint in the source, and to expand to OWL when doing reasoning.
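As a rough illustration of that expansion, the sketch below renders a never_in_taxon shortcut as a Manchester-style axiom string, assuming the expanded form is "X SubClassOf R only (not Y)". The function name `expand_never_in_taxon` and the default relation label `in_taxon` are assumptions made for this example, not the actual macro used in the GO build.

```python
def expand_never_in_taxon(term, taxon, relation="in_taxon"):
    """Render the shortcut 'term never_in_taxon taxon' as an axiom string.

    Assumes the expansion has the shape 'X SubClassOf R only (not Y)';
    the relation label is a placeholder chosen for this example.
    """
    return f"{term} SubClassOf {relation} only (not {taxon})"


print(expand_never_in_taxon("GO:0045271", "NCBITaxon:4896"))
```

The negation nested inside the restriction is the part that has no counterpart as a simple edge, which is why the source keeps the shortcut form.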

This might be a good time to revisit this, as we no longer use OE to edit the source for GO, and we have the option of having a special-purpose construct in obographs for convenient access to taxon constraints (or more generally, any X SubClassOf R only not Y axiom). I think this construct should live separately from the "edges" block.


tonysawfordebi commented 7 years ago

Actually, I did understand why o_i_t and n_i_t were represented differently, but that wasn't my question. Indeed, I didn't actually ask a question, but made an observation ;)

The lack of the references associated with the taxon constraints is now the only difference between the output that my new obograph parser generates and that from my current owltools-based lash-up, so if there was a way to get those into the JSON, then everything (as far as my use case is concerned) would be perfect, and I could switch over to consuming go-plus.json, which would be A Good Thing.

cmungall commented 7 years ago

oh, I should really read comments before replying! Yes you quite clearly said references. But hopefully my pedagogic excursion will be of use to anyone else puzzled by the TC representation.

so the simplest thing would be to include annotations on annotation assertion axioms (which should be done in any case), giving you a direct analog to the OWL.

I still think we should eventually move away from encoding the negated ones as annotations, but I will give plenty of warning before switching.