geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Release go-plus and similar ontologies as standalone (no owl:imports) #16876

Closed: balhoff closed this issue 5 years ago.

balhoff commented 5 years ago

I think we should change the release process for go-plus.owl to merge the import modules, so that the downloaded file is completely standalone. Currently, each ontology release has a version IRI, and we are putting effort into making previous version IRIs resolve to those past versions; however, when you load one of these older files (e.g. in Protege), you end up loading the current versions of all the import modules. Someone loading a previous version would clearly prefer to get the exact ontology content from that version.
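One way to check whether a downloaded file is standalone in this sense is to look for `owl:imports` statements in its ontology header. A minimal sketch using only Python's standard library, assuming the file is serialized as RDF/XML; the helper name `find_owl_imports` and the sample import IRI are illustrative, not part of any GO tooling.

```python
import xml.etree.ElementTree as ET

# Namespace IRIs used by RDF/XML serializations of OWL ontologies.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL_NS = "http://www.w3.org/2002/07/owl#"


def find_owl_imports(rdf_xml: str) -> list[str]:
    """Return the IRIs of all owl:imports statements in an RDF/XML document.

    An empty list means the file declares no imports, so loading it will not
    pull in any other (possibly newer) module.
    """
    root = ET.fromstring(rdf_xml)
    imports = []
    # owl:imports normally sits inside the owl:Ontology header element.
    for elem in root.iter(f"{{{OWL_NS}}}imports"):
        iri = elem.get(f"{{{RDF_NS}}}resource")
        if iri is not None:
            imports.append(iri)
    return imports


# Hypothetical header of an ontology that still references an import module.
WITH_IMPORTS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/go-plus.owl">
    <owl:imports rdf:resource="http://example.org/imports/chebi_import.owl"/>
  </owl:Ontology>
</rdf:RDF>"""

# The same header after the imports have been merged in.
STANDALONE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Ontology rdf:about="http://purl.obolibrary.org/obo/go/extensions/go-plus.owl"/>
</rdf:RDF>"""

print(find_owl_imports(WITH_IMPORTS))  # ['http://example.org/imports/chebi_import.owl']
print(find_owl_imports(STANDALONE))    # []
```

For a file on disk, `ET.parse(path).getroot()` can stand in for `ET.fromstring`.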

I think users are best served by having a prepackaged complete file, so that they don't need to use software that resolves an import chain. If they want to use just GO content and not axioms coming from imports, we are already publishing the standalone go-base.owl for this purpose.

Some further rationale for standalone OWL file and "base" files is in this Google doc: https://docs.google.com/document/d/1eCo5C3aZ9kjhBu98-24c2FHZIHEeFV7I2TJ4Vd6qw6k/edit#heading=h.u7or56kflnm2

In discussion of this with @kltm, @goodb, and @dougli1sqrd, it sounds like making all released files "standalone" will solve some difficult issues in the GO pipeline as well.

Are there any users that would notice such a change and be inconvenienced in any way?

balhoff commented 5 years ago

@alexsign I got a suggestion to check with you about this change. Do you make any use of http://purl.obolibrary.org/obo/go/extensions/go-plus.owl? If so would it make any difference if this and its imported files were merged into a single file?

cmungall commented 5 years ago

I believe he uses the JSON. Note that in the JSON all ontologies are combined into one file, but each is a separate graph object. There may be assumptions about GO belonging to its own graph.

It should be possible to have a different policy for the JSON, although it's cleaner if the same rules apply to everything.

(Sent by email in reply to Jim Balhoff's comment of Sun, Feb 3, 2019 at 12:16 PM.)

alexsign commented 5 years ago

@balhoff @cmungall I believe we use both. I'll investigate further and let you know.

balhoff commented 5 years ago

Thanks!

alexsign commented 5 years ago

@balhoff sorry for the delayed reply; the EBI data center had a major incident last weekend, so we are just getting our internal services back. From what I can see right now, @cmungall is right: since late 2017 we have been using http://purl.obolibrary.org/obo/go/snapshot/extensions/go-plus.json instead of the OWL file.

Here is the full list of files we are currently using:

http://purl.obolibrary.org/obo/go/snapshot/extensions/go-plus.json
http://purl.obolibrary.org/obo/go/snapshot/extensions/gorel.obo
http://purl.obolibrary.org/obo/go/snapshot/extensions/go-upper.obo
http://purl.obolibrary.org/obo/go/snapshot/imports/go-taxon-groupings.obo

https://s3.amazonaws.com/go-public/metadata/db-xrefs.json
https://s3.amazonaws.com/go-public/metadata/eco-usage-constraints.json

Please let me know if any of them are subject to major changes.

balhoff commented 5 years ago

@alexsign thanks, no problem! As @cmungall pointed out, currently in the JSON file there are multiple graphs, each representing an ontology, such as go-plus and all the ontologies it imports. The change to that file would be that all the axioms would be in one graph, for example since go-plus imports some terms from CHEBI, there would be some relationships between CHEBI terms merged into the GO graph. These are currently in a separate graph in the same JSON file.
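To make the difference concrete for consumers of go-plus.json, the sketch below collapses a multi-graph document into a single graph. It assumes the OBO Graphs JSON layout (a top-level `graphs` array whose entries have `id`, `nodes`, and `edges`, with edges written as `sub`/`pred`/`obj`). `merge_graphs` is a hypothetical helper, the sample data is illustrative, and this is not necessarily how the GO release pipeline performs the merge.

```python
def merge_graphs(doc: dict, merged_id: str) -> dict:
    """Collapse every graph in an OBO Graphs JSON document into one graph.

    Nodes are combined by 'id' and edges de-duplicated by their (sub, pred, obj)
    triple; other graph sections (logical definitions, metadata) are dropped here.
    """
    nodes: dict[str, dict] = {}
    edges: dict[tuple[str, str, str], dict] = {}
    for graph in doc.get("graphs", []):
        for node in graph.get("nodes", []):
            # Keep the first value seen for each key, but fill in keys
            # (e.g. 'lbl') that an earlier declaration of the same id lacked.
            merged_node = nodes.setdefault(node["id"], {})
            for key, value in node.items():
                merged_node.setdefault(key, value)
        for edge in graph.get("edges", []):
            edges.setdefault((edge["sub"], edge["pred"], edge["obj"]), edge)
    return {"graphs": [{"id": merged_id,
                        "nodes": list(nodes.values()),
                        "edges": list(edges.values())}]}


GO = "http://purl.obolibrary.org/obo/GO_"
CHEBI_BILE_SALT = "http://purl.obolibrary.org/obo/CHEBI_22868"

# Illustrative multi-graph document: GO content plus one import module.
SAMPLE = {"graphs": [
    {"id": "http://purl.obolibrary.org/obo/go/extensions/go-plus.owl",
     "nodes": [{"id": GO + "0008150", "lbl": "biological_process", "type": "CLASS"},
               {"id": GO + "0009987", "lbl": "cellular process", "type": "CLASS"},
               {"id": CHEBI_BILE_SALT, "type": "CLASS"}],
     "edges": [{"sub": GO + "0009987", "pred": "is_a", "obj": GO + "0008150"}]},
    {"id": "http://example.org/imports/chebi_import.owl",
     "nodes": [{"id": CHEBI_BILE_SALT, "lbl": "bile salt", "type": "CLASS"}],
     "edges": []},
]}

merged = merge_graphs(SAMPLE, "http://purl.obolibrary.org/obo/go/extensions/go-plus.owl")
print(len(merged["graphs"]), len(merged["graphs"][0]["nodes"]))  # 1 3
```

Code that currently treats the first (or the GO-named) graph as "GO only" would, after such a merge, also see the CHEBI node and any edges between imported terms.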

We could keep the JSON file the way it is, since it already nicely packages everything into one file. But I'm curious whether it would make any difference to you if everything were in one graph.

By the way, all the information from http://purl.obolibrary.org/obo/go/snapshot/imports/go-taxon-groupings.obo should be included inside the go-plus.json as the graph with id http://purl.obolibrary.org/obo/go/imports/go-taxon-groupings.owl.
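In the current multi-graph layout, content like this can be pulled out by graph id; if everything moves into one graph, a consumer would need another way to select the same terms. A minimal sketch of both approaches, assuming the OBO Graphs JSON layout. The helper names and the sample data are hypothetical, and filtering by IRI prefix is only one possible replacement for lookup by graph id (it cannot tell which module a term came from).

```python
from typing import Optional

TAXON_GROUPINGS = "http://purl.obolibrary.org/obo/go/imports/go-taxon-groupings.owl"


def graph_by_id(doc: dict, graph_id: str) -> Optional[dict]:
    """Return the graph whose 'id' equals graph_id, or None if there is none."""
    for graph in doc.get("graphs", []):
        if graph.get("id") == graph_id:
            return graph
    return None


def nodes_with_prefix(doc: dict, prefix: str) -> list[dict]:
    """Collect nodes whose id starts with prefix, across however many graphs exist.

    Unlike graph_by_id, this keeps working if all content is merged into one graph.
    """
    return [node
            for graph in doc.get("graphs", [])
            for node in graph.get("nodes", [])
            if node["id"].startswith(prefix)]


# Illustrative document with a separate taxon-groupings graph.
SAMPLE = {"graphs": [
    {"id": "http://purl.obolibrary.org/obo/go/extensions/go-plus.owl",
     "nodes": [{"id": "http://purl.obolibrary.org/obo/GO_0008150", "type": "CLASS"}]},
    {"id": TAXON_GROUPINGS,
     "nodes": [{"id": "http://purl.obolibrary.org/obo/NCBITaxon_2759", "type": "CLASS"}]},
]}

taxa = graph_by_id(SAMPLE, TAXON_GROUPINGS)
print(len(taxa["nodes"]) if taxa else "no such graph")
print([n["id"] for n in nodes_with_prefix(SAMPLE, "http://purl.obolibrary.org/obo/NCBITaxon_")])
```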

alexsign commented 5 years ago

@balhoff I looked through the procedure that extracts data from the JSON file. It might need some changes, but I should be able to adjust it. If the decision is made to combine all ontologies into one graph in the JSON file, can we get it for testing before it replaces the one we are using now?

balhoff commented 5 years ago

Thanks @alexsign. We will let you know ahead of time if we decide to do that.

balhoff commented 5 years ago

I'm thinking this is more urgent: now that the ontology PURL points to the official release and there is a separate PURL for snapshots, anyone who loads a go-plus snapshot into Protege ends up loading the imports from the release rather than from the snapshot. We could fix this with a fairly complicated system that sets different ontology IRIs for import modules depending on whether the build is a snapshot or a release, but it would be MUCH easier to just merge the imports as described in this ticket.

goodb commented 5 years ago

This caught me yesterday. Even apart from loading snapshot vs. official releases from the PURLs, I opened an ontology in a directory with an edited catalogue.xml file and was mistakenly looking at the ontology with a local, out-of-date version of an import. I think merging imports for releases is a really good idea. (It would be nice to also have a way for advanced users to get the non-merged files for development purposes, but there is no reason that needs to be the default way of getting the ontology.)
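For readers unfamiliar with the mechanism: Protege resolves owl:imports through an XML catalog in the ontology's directory (the file Protege writes is typically named `catalog-v001.xml`), so a stale local file listed there silently replaces the published import. A stdlib-only sketch that lists the redirects a catalog defines, assuming OASIS XML Catalogs `<uri name=... uri=.../>` entries; `catalog_redirects` and the sample entry are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace of OASIS XML Catalogs, as used in Protege's catalog files.
CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"


def catalog_redirects(catalog_xml: str) -> dict[str, str]:
    """Map each IRI named in an XML catalog to the location it is redirected to."""
    root = ET.fromstring(catalog_xml)
    redirects = {}
    for entry in root.iter(f"{{{CATALOG_NS}}}uri"):
        name, target = entry.get("name"), entry.get("uri")
        if name and target:
            redirects[name] = target
    return redirects


# Hypothetical catalog pointing an import IRI at a local (possibly outdated) copy.
SAMPLE = """<catalog prefer="public" xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <uri name="http://example.org/imports/chebi_import.owl" uri="imports/chebi_import.owl"/>
</catalog>"""

for iri, local in catalog_redirects(SAMPLE).items():
    print(f"{iri} -> {local}")
```

With a merged, standalone release there are no imports left to redirect, so a catalog like this has no effect on what gets loaded.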

pgaudet commented 5 years ago

Sounds like a good idea. Is this something we should announce to go-friends or in the 'announcements' repo? I am not sure of the best way to reach interested people.

balhoff commented 5 years ago

@pgaudet I think we should announce it. Should that be before it happens in snapshot, or instead wait until a snapshot is available so that folks can immediately take a look at the snapshot?

pgaudet commented 5 years ago

Don't we want to give people a bit of time to adjust their parsers and loading scripts? I propose announcing it in advance and giving a date (at least an approximate one) for when it will happen.

balhoff commented 5 years ago

@pgaudet @cmungall how does this sound?

We plan to make a change to versions of the ontology, such as "go-plus.owl", that import external files. In an upcoming release, the external imports will be merged into the ontology, rather than referenced via 'owl:imports'. These external imports include content extracted from other ontologies, such as Uberon and ChEBI, which is needed for full classification of the GO. By merging the external content and GO content into a single file, we can ensure that the version of the external content used with a given release is exactly the version tested with that release.

This change will not affect the primary GO ontology files, such as go.obo, go.owl, and go-basic.obo, which are already standalone files.

For flexible OWL integration of GO axioms with different versions of external ontologies, we also provide 'go-base.owl', which references external terms but does not import any content from external ontologies.

This change to standalone, merged releases for 'go-plus' (and undocumented internal GO files 'go-gaf' and 'go-lego') will first take place as a snapshot release, no earlier than April 22, 2019.

cmungall commented 5 years ago

Looks good

(Sent by email in reply to Jim Balhoff's comment of Wed, Apr 10, 2019 at 8:28 PM.)

balhoff commented 5 years ago

@alexsign FYI this (imports merged into go-plus) has been implemented and will appear in the next successful snapshot release.

alexsign commented 5 years ago

@balhoff Thanks for letting me know

goodb commented 4 years ago

@balhoff in trying to figure out what is going on with https://github.com/geneontology/pathways2GO/issues/88 I've been looking at go-plus and am a little confused. There are a lot of classes in there from other ontologies (CL, CHEBI, BFO, CARO, ENVO, MOD, NBO, OBI, NCBITaxon, PATO, PR, SO) that are not logically defined and do not have any label or text definition. E.g. CARO_0001001 neuron projection bundle, CHEBI:22868 bile salt, etc.

As this issue discusses, GO-Plus does not import any of these ontologies, so the product ends up being incomplete, I think? What am I missing?

Is there documentation anywhere on what go-plus is used for downstream within the GO infrastructure? It's an important part of go-lego, of course; where else is it used?
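To gauge how many classes are affected, one could scan the released JSON for classes with neither a label nor a text definition, as described above. A sketch assuming the OBO Graphs JSON layout, in which a node's label is `lbl` and its text definition is `meta.definition.val`; `bare_classes` and the sample data are hypothetical. It checks only labels and text definitions, not logical definitions.

```python
OBO = "http://purl.obolibrary.org/obo/"


def bare_classes(doc: dict) -> list[str]:
    """Return ids of CLASS nodes that have neither a label nor a text definition."""
    bare = []
    for graph in doc.get("graphs", []):
        for node in graph.get("nodes", []):
            if node.get("type") != "CLASS":
                continue
            label = node.get("lbl")
            definition = node.get("meta", {}).get("definition", {}).get("val")
            if not label and not definition:
                bare.append(node["id"])
    return bare


# Illustrative merged graph: one described GO class, two bare external classes.
SAMPLE = {"graphs": [{"id": OBO + "go/extensions/go-plus.owl", "nodes": [
    {"id": OBO + "GO_0008150", "lbl": "biological_process", "type": "CLASS",
     "meta": {"definition": {"val": "placeholder definition"}}},
    {"id": OBO + "CARO_0001001", "type": "CLASS"},
    {"id": OBO + "CHEBI_22868", "type": "CLASS"},
]}]}

print(bare_classes(SAMPLE))
```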

deustp01 commented 4 years ago

In case it's relevant, "bile salt" is a family of chemicals whose members get a lot of Reactome annotations, but we always refer to them by their individual identifiers and this grouping term CHEBI:22868 is not even an instance in our central database. And to the extent that we talk about development of the nervous system we do it without ever referring to CARO_0001001 neuron projection bundle or any other CARO term.

In fact, CARO, BFO, ENVO, NBO, OBI, PATO, and PR are not reference ontologies that we ever refer to for any purpose. Terms from MOD, NCBITaxon, and SO are used in Reactome, as are terms from all of the ontologies listed here.

goodb commented 4 years ago

@deustp01 many of these seem to appear in logical definitions included in go-plus; for example, go-plus includes a logical definition for the Uberon term 'bile' that includes 'subclass of (has part some 'bile salt')'.
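Assuming the OBO Graphs conversion renders a subclass axiom with an existential restriction as an edge (here from the Uberon class to the ChEBI class, with 'has part', BFO_0000051, as predicate), finding every axiom in go-plus that pulls in a given external class can be sketched as a scan over edges. The helper name and the sample graph are illustrative; the Uberon IRI for 'bile' is an assumption used only for the example.

```python
OBO = "http://purl.obolibrary.org/obo/"
HAS_PART = OBO + "BFO_0000051"
BILE = OBO + "UBERON_0001970"  # assumed Uberon IRI for 'bile' (illustrative)
BILE_SALT = OBO + "CHEBI_22868"


def referencing_edges(doc: dict, target_id: str) -> list[tuple[str, str]]:
    """Return (subject, predicate) pairs for every edge whose object is target_id."""
    return [(edge["sub"], edge["pred"])
            for graph in doc.get("graphs", [])
            for edge in graph.get("edges", [])
            if edge.get("obj") == target_id]


# Illustrative graph holding the 'bile' SubClassOf (has part some 'bile salt') axiom.
SAMPLE = {"graphs": [{"id": OBO + "go/extensions/go-plus.owl", "edges": [
    {"sub": BILE, "pred": HAS_PART, "obj": BILE_SALT},
    {"sub": OBO + "GO_0009987", "pred": "is_a", "obj": OBO + "GO_0008150"},
]}]}

print(referencing_edges(SAMPLE, BILE_SALT))
```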