geneontology / minerva

BSD 3-Clause "New" or "Revised" License

remove neo.owl from go-lego.owl used by minerva #260

Closed goodb closed 4 years ago

goodb commented 4 years ago

Per October 2019 discussions:

- geneontology/neo#47
- https://docs.google.com/document/d/1rOXCoJ-ZKGCGQ_0LpJOlsVVfVyKgUez52Kdq_VnUxEk/edit?ts=5d7ff0d8#
- https://docs.google.com/document/d/1h_vnzkP94YC5l3ZmxKyZVOuTzIHIsHRsOLtBSGyB25k/edit?pli=1

In summary, we want to remove the dependency on loading classes for all gene products from all species into neo just so that minerva can access them. This does not scale well. To accomplish this, the current plan is to move the class information (including the label and the main upper-level type, e.g. protein, for gene product instances) out of neo and into GOLR alone. Minerva will then access this information dynamically as it builds and reasons over models.

goodb commented 4 years ago

noting that the GOLR instance to be used for development here is http://noctua-golr.berkeleybop.org/

Example gene product ids and parent list for a human protein and yeast gene:

UniProtKB:P32241-1

- http://purl.obolibrary.org/obo/CHEBI_51143 nitrogen molecular entity
- http://purl.obolibrary.org/obo/CHEBI_33839 macromolecule
- http://purl.obolibrary.org/obo/CHEBI_33256 primary amide
- http://purl.obolibrary.org/obo/CHEBI_33675 p-block molecular entity
- http://purl.obolibrary.org/obo/CHEBI_36963 organooxygen compound
- http://purl.obolibrary.org/obo/CHEBI_33579 main group molecular entity
- http://purl.obolibrary.org/obo/CHEBI_32988 amide
- http://purl.obolibrary.org/obo/CHEBI_25806 oxygen molecular entity
- http://purl.obolibrary.org/obo/CHEBI_33285 heteroorganic entity
- http://purl.obolibrary.org/obo/CHEBI_33582 carbon group molecular entity
- http://purl.obolibrary.org/obo/CHEBI_36357 polyatomic entity
- http://purl.obolibrary.org/obo/CHEBI_37622 carboxamide
- http://purl.obolibrary.org/obo/CHEBI_23367 molecular entity
- http://purl.obolibrary.org/obo/BFO_0000030 object
- http://purl.obolibrary.org/obo/CHEBI_50047 organic amino compound
- http://purl.obolibrary.org/obo/CHEBI_24431 chemical entity
- http://purl.obolibrary.org/obo/CHEBI_50860 organic molecular entity
- http://purl.obolibrary.org/obo/CHEBI_33302 pnictogen molecular entity
- http://purl.obolibrary.org/obo/CHEBI_33304 chalcogen molecular entity
- http://purl.obolibrary.org/obo/CHEBI_35352 organonitrogen compound
- http://purl.obolibrary.org/obo/CHEBI_36962 organochalcogen compound
- http://purl.obolibrary.org/obo/CHEBI_33694 biomacromolecule
- http://purl.obolibrary.org/obo/CHEBI_33695 information biomacromolecule
- http://purl.obolibrary.org/obo/CHEBI_36080 protein
- http://purl.obolibrary.org/obo/CHEBI_16670 peptide
- http://purl.obolibrary.org/obo/PR_000000001 protein

SGD:S000005952

- http://purl.obolibrary.org/obo/BFO_0000030 object
- http://purl.obolibrary.org/obo/CHEBI_33839 macromolecule
- http://purl.obolibrary.org/obo/CHEBI_33582 carbon group molecular entity
- http://purl.obolibrary.org/obo/CHEBI_36357 polyatomic entity
- http://purl.obolibrary.org/obo/CHEBI_24431 chemical entity
- http://purl.obolibrary.org/obo/CHEBI_33694 biomacromolecule
- http://purl.obolibrary.org/obo/CHEBI_50860 organic molecular entity
- http://purl.obolibrary.org/obo/CHEBI_33695 information biomacromolecule
- http://purl.obolibrary.org/obo/CHEBI_33675 p-block molecular entity
- http://purl.obolibrary.org/obo/CHEBI_23367 molecular entity
- http://purl.obolibrary.org/obo/CHEBI_33579 main group molecular entity

@kltm @cmungall @balhoff my thinking here is to give each instance only its most specific parent as its type. For example, instances of UniProtKB:P32241-1 get rdf:type CHEBI_36080 (protein), and instances of SGD:S000005952 get rdf:type CHEBI_33695 (information biomacromolecule).

Can you think of any other types that we will need to look for and cover here?

I see some complexes in the load, e.g. https://www.ebi.ac.uk/complexportal/complex/CPX-900, but these are typed just like genes. They would default to being typed as information biomacromolecules (CHEBI_33695).
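The typing rule proposed above can be sketched roughly as follows. This is only an illustration, not minerva's actual code: the helper name `most_specific_type` and the idea of a fixed preference list are assumptions, and the only IRIs used are the two named in this comment.

```python
# Hypothetical helper: pick one upper-level type for a gene product from
# the parent IRIs returned for it (as in the example lists above).
OBO = "http://purl.obolibrary.org/obo/"
PROTEIN = OBO + "CHEBI_36080"
INFO_BIOMACROMOLECULE = OBO + "CHEBI_33695"

# Ordered from most to least specific. Assumption: only these two
# upper-level types need to be distinguished for now.
PREFERRED_TYPES = [PROTEIN, INFO_BIOMACROMOLECULE]


def most_specific_type(parent_iris):
    """Return the first preferred type found among the parents, else None."""
    parents = set(parent_iris)
    for candidate in PREFERRED_TYPES:
        if candidate in parents:
            return candidate
    return None
```

With the lists above, UniProtKB:P32241-1 has both types among its parents and would resolve to protein, while SGD:S000005952 (and complexes typed like genes) would resolve to information biomacromolecule.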

goodb commented 4 years ago

@balhoff and @kltm could you take a look at this when you have time? See the recent commit. I think it might be done. I tested it with a local noctua that had go_lego loaded without the neo import and it worked as I wanted: the reasoner worked, and the gene product parent types were added and saved for genes/proteins.

Certainly some optimizations could be done, though it seems fast enough now. For example, there are two requests to GOLR per incoming instance where one would do. What else should I check on?
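As a rough idea of what a single combined request could look like, here is a sketch that builds one Solr-style select URL asking for the label and the parent closure together. The `select` path and the field names (`id`, `annotation_class_label`, `isa_closure`) are assumptions for illustration and should be checked against the GOLR schema; only the base URL comes from this thread.

```python
from urllib.parse import urlencode

GOLR_BASE = "http://noctua-golr.berkeleybop.org/"  # development GOLR named above


def build_golr_query(entity_id, base=GOLR_BASE):
    """Build one select URL that returns label and parent closure together.

    Assumption: the handler path and field names below are illustrative,
    not confirmed against the real GOLR configuration.
    """
    params = {
        "q": "*:*",
        "fq": 'id:"{}"'.format(entity_id),  # restrict to the one entity
        "fl": "id,annotation_class_label,isa_closure",  # fields to return
        "rows": 1,
        "wt": "json",
    }
    return base.rstrip("/") + "/select?" + urlencode(params)
```

Requesting both the label and the closure through one field list is what would replace the two per-instance lookups.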

Here is an example OWL file generated with this on: example_go_cam_lego_lite.txt

And a screenshot: Screen Shot 2019-11-14 at 9 24 23 AM

kltm commented 4 years ago

Discussed with @goodb after software call, with adjustments made to initial working list. There is probably more for conversation there, especially the final item concerning SynGO and how we want to treat exotic vs endemic models (e.g. categories as required add-in or not).

goodb commented 4 years ago

Changing the first requirement here to a dynamic approach: the upper-level rdf:type is never cached in the models that are displayed and saved, and is only added as needed prior to reasoning.

Was:

- [ ] When a gene product instance is created in Noctua, add the high-level rdf:type (protein or information biomacromolecule) from GOLR.

Is now:

- [ ] When minerva loads a GO-CAM RDF model, dynamically retrieve and add in upper-level type information for all the gene products in the model such that these are accessible to the Arachne reasoner and the shex validator. When reasoning and validation complete, remove these from the model.

Important that the main client interface does not see any changes as a result of this.
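The add-then-remove pattern described above could look something like this minimal sketch. It is not minerva's implementation: the model is stood in for by a plain set of triples, `type_lookup` stands in for the GOLR lookup, and the name `temporary_types` is hypothetical.

```python
from contextlib import contextmanager

RDF_TYPE = "rdf:type"


@contextmanager
def temporary_types(model, type_lookup):
    """Add upper-level rdf:type triples to `model` for the duration of a block.

    `model` is a set of (subject, predicate, object) tuples and `type_lookup`
    maps a subject to its upper-level type. Both are stand-ins (assumptions)
    for the real model and the GOLR lookup.
    """
    added = set()
    for subject, upper_type in type_lookup.items():
        triple = (subject, RDF_TYPE, upper_type)
        if triple not in model:  # only track triples we actually add
            model.add(triple)
            added.add(triple)
    try:
        yield model  # reasoning and validation run here
    finally:
        # Remove only what was added, even if reasoning raised an error.
        model.difference_update(added)
```

Because only the triples that were added get removed, a type that was already present in the saved model is left untouched, and the client sees the same model before and after.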

goodb commented 4 years ago

@kltm when you are able, I would like to test https://github.com/geneontology/minerva/pull/265 on dev. I believe it resolves our Thanksgiving issue and also reduces the number of other tasks on this issue list. If my local testing is reflective of the dev and master server states, I think we are ready to tick off the rest of the checkboxes here.

kltm commented 4 years ago

Slowed down with the Alliance meeting. Now on dev.

vanaukenk commented 4 years ago

I tested the UI on noctua-dev this morning with entries for several different species and entity types. All looks fine in the display on the form and graph editors.

Note: we will still need to discuss validation errors based on entity type values, however. Various aspects of that discussion are already in multiple tickets on geneontology/go-shapes.

kltm commented 4 years ago

Noting that the version @vanaukenk tested on dev was: https://github.com/geneontology/minerva/commit/95bc102f8ee885715f794116148e24f5cef40546

The current production version is: https://github.com/geneontology/minerva/commit/5ae8bf16e752327eda1d8671997cad4f34cb838b

goodb commented 4 years ago

@kltm although the mechanism evolved a bit, I think the task list on top remains accurate.

kltm commented 4 years ago

@goodb Okay, but my list there seems a little garbled to me at this point, especially as I intended it to be ordered and I think we've already accomplished some of these out of order. It would be nice to go over this with you tomorrow, either on the call or later on.

goodb commented 4 years ago

@kltm trying to summarize "the plan" below. I think it takes three key files to make a system that will work for all the models in dev (and thus also master), including reactome, and will not pollute the global type-ahead system with reactome entities.

Pipeline produces:

Services: GOLR - purpose: type ahead search over ontologies (and genes)

goodb commented 4 years ago

@kltm if there were a way to filter GOLR requests to eliminate the reactome entities that would simplify things a bit (just add reacto to the import list that builds go-lego and no more need for -with-reacto everywhere). We would still need one OWL ontology that does not contain neo for minerva.

kltm commented 4 years ago

@goodb Does this sound right to you?

Current ontologies being made:

Current ontology journals (for minerva):

Ontologies we need:

Ontology journals we need (for minerva):

goodb commented 4 years ago

Assuming there isn't a way to deal with excluding reacto entities at the golr level, yes, this is it.

kltm commented 4 years ago

@goodb Okay, what I'm trying to balance here is the simplest and least obscure product set with adding a set of required synchronized changes of client code (filtering reactome).

Currently, we filter with 'regulates_closure', 'CHEBI:23367'. In addition, we could add a namespace filter. While we could set up the experiment, with what you know, would these be enough to keep reacto items out?
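To make the question concrete, combining the existing closure filter with a namespace filter might look like the sketch below, applied to GOLR result documents as plain dicts. The `REACTO:` id prefix and the function name `keep_for_typeahead` are assumptions for illustration; only the 'regulates_closure' / 'CHEBI:23367' pair comes from the current filter.

```python
def keep_for_typeahead(doc, required_closure="CHEBI:23367",
                       excluded_prefixes=("REACTO:",)):
    """Return True if a result document should appear in type-ahead.

    `doc` is a dict shaped like a GOLR result document. The excluded
    prefix is a hypothetical stand-in for a namespace filter.
    """
    # Existing filter: the closure must contain the required ancestor.
    if required_closure not in doc.get("regulates_closure", []):
        return False
    # Additional namespace filter: drop ids from excluded namespaces.
    return not doc.get("id", "").startswith(tuple(excluded_prefixes))
```

The closure check alone cannot exclude an entity that sits under CHEBI:23367, so any exclusion in this sketch would come entirely from the namespace check.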

Assuming we went the GOlr route, that would only save us the replacement of the journal above, correct? We'd still need some form of the other data products, right? It seems like as far as pipeline complexity goes, we save little. However, it does seem to make the products a little less stilted...

goodb commented 4 years ago

I'm not really clear on what the regulates closure entails. CHEBI:23367 would include all of reacto as it stands. We could do a lot to tune it up to match a filter if we want, as it isn't used for anything aside from these models.

Given an uncertain future for reacto, I'd lean towards building the reacto-specific products rather than tuning up golr to avoid it for now.

kltm commented 4 years ago

From the conversation on today's software call, let's go ahead and take the "data" approach, producing the specific data products needed (rather than trying to fudge the clients). Producing the products should probably occur on geneontology/pipeline, with the issue-35-neo-test branch. https://github.com/geneontology/pipeline/issues/35

goodb commented 4 years ago

Related: PR to build reacto in the GO makefile, https://github.com/geneontology/go-ontology/pull/19288

goodb commented 4 years ago

Closing. The only remaining thing to do is to get it running on master. I believe that is its own issue.