geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

Create ChEBI import with Rhea reaction chemicals #15926

Closed ukemi closed 5 years ago

ukemi commented 6 years ago

In order to get full reasoning based on the Rhea reaction participants, we will need to import the ChEBI identifiers for chemicals that are currently missing from the ChEBI import. A way to do this that is completely in alignment with our current pipeline is to add the ChEBI identifiers to go-ontology/src/ontology/imports/chebi_terms.txt and then run the makefile.
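Here is a minimal Python sketch of the first half of that step: adding the missing identifiers to the term list before the makefile is run. It assumes chebi_terms.txt holds one identifier per line, written either as a CURIE such as CHEBI:29888 or as an OBO PURL such as http://purl.obolibrary.org/obo/CHEBI_29888. The actual file format and the helper names (to_chebi_iri, merge_term_list, update_term_file) are assumptions for illustration and are not part of the GO pipeline.

```python
import re
from pathlib import Path

OBO_PREFIX = "http://purl.obolibrary.org/obo/"
# Accepts CHEBI:12345, CHEBI_12345, or the full OBO PURL form.
CHEBI_PATTERN = re.compile(r"^(?:http://purl\.obolibrary\.org/obo/)?CHEBI[:_](\d+)$")


def to_chebi_iri(term: str) -> str:
    """Normalize one ChEBI identifier to its OBO PURL form."""
    match = CHEBI_PATTERN.match(term.strip())
    if match is None:
        raise ValueError(f"not a ChEBI identifier: {term!r}")
    return f"{OBO_PREFIX}CHEBI_{match.group(1)}"


def merge_term_list(existing: list[str], requested: list[str]) -> list[str]:
    """Union of existing and requested terms, deduplicated and sorted numerically.

    Blank lines and lines starting with '#' are skipped (an assumed convention).
    """
    terms = {
        to_chebi_iri(line)
        for line in existing
        if line.strip() and not line.lstrip().startswith("#")
    }
    terms.update(to_chebi_iri(term) for term in requested)
    # Sort by the numeric part of CHEBI_<n> so the resulting diff stays readable.
    return sorted(terms, key=lambda iri: int(iri.rsplit("_", 1)[1]))


def update_term_file(path: Path, requested: list[str]) -> None:
    """Rewrite a (hypothetically formatted) term-list file with the merged terms."""
    existing = path.read_text().splitlines() if path.exists() else []
    path.write_text("\n".join(merge_term_list(existing, requested)) + "\n")
```

For example, update_term_file(Path("src/ontology/imports/chebi_terms.txt"), ["CHEBI:29888", "CHEBI:33019"]) would add both IDs. Rewriting the file this way drops comments and normalizes every line to the PURL form, so a real run should follow whatever conventions the file actually uses.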

ukemi commented 6 years ago

If anyone is looking at this, it's worth noting that the expanded CHEBI import did not include the general class axioms for the new additional CHEBI terms. Hence, e.g., diphosphoric acid (CHEBI_29888) is not equated with diphosphate(3-) (CHEBI_33019) in the merged ontology.

ukemi (Member) commented:

Isn't that part of the makefile? We can talk about it on Monday, but it makes sense to me to go ahead and run the ChEBI import with the additional terms we will need for the Rhea defs and, if it all looks OK, merge that into master. I don't think it will hurt anything to have the additional ChEBI classes. It's one more step we can take to make concrete progress along the way.

goodb (Member) commented:

Yes, I assume it is part of the makefile and ought to work fine when built that way. I haven't set up the complete build locally yet (it's on my to-do list), so I merged the ROBOT-generated ChEBI extract into Protege manually, which skipped the generation of those axioms. It probably won't make much difference to our discussion; we may just miss a few more inferences.

ukemi commented 6 years ago

The way we usually add terms to the imports is by adding them here: https://github.com/geneontology/go-ontology/blob/master/src/ontology/imports/chebi_terms.txt and then having someone run the makefile, which uses ROBOT. I'm not sure if there is an easier way, but if we simply add all the URIs necessary for the Rhea logical defs to this file, it should work. Chris (or you, or @balhoff) might know an easier way.

cmungall commented 6 years ago

That should work

(Email reply to David Hill's comment of Fri, Jun 29, 2018, 13:48.)

goodb commented 6 years ago

(edit) Moving this comment over from other thread.

Added the missing CHEBI terms (used in RHEA) to the GOPlus import file on branch issue-14984-chebi-import. See https://github.com/geneontology/go-ontology/blob/issue-14984-chebi-import/src/ontology/imports/chebi_terms.txt

cmungall commented 6 years ago

Let's check that the regenerated chebi_import.owl isn't overly large before committing it to the GitHub history in the go org. I don't expect it will be, but we should check.

(Email reply to goodb's comment of 2 Jul 2018, 9:13.)

ukemi commented 6 years ago

It looks like there are about 8000 new terms in @goodb's branch of chebi_terms.txt. @cmungall, any idea whether that will be too big? If not, can we create a pull request for those additions so that @dougli1sqrd, @vanaukenk and I can work on making the new ChEBI import file? If we do it successfully on a local branch, we can fire up Protege with the new ChEBI imports to make sure we don't bog things down. Sound like a plan?

goodb commented 6 years ago

I have a question about protocol here. It seems a little odd that we have two ways of getting CHEBI terms into the ontology: via this file and the makefile, or by putting them directly into the main ontology file. As a general rule, wouldn't it be better to pick one way or the other, to improve understanding and ease maintenance over the years?

(Email reply to David Hill's comment of Tue, Jul 3, 2018, 4:38 AM.)

ukemi commented 6 years ago

The txt file is just a list of terms that we want imported into the ChEBI import file we use. All of the terms from external ontologies live in separate import files that are loaded into Protege, and each file contains only the subset of terms that we cross-reference in GO. We use the structure of these 'mini' external ontologies for reasoning, but we assert the relationships between GO terms and the external terms while we are editing, and we can only use an external term if it is already in the import file. This lets us work against a consistent version of each external ontology, so we don't have to worry about the external ontology changing in a way that would break something.

Ontology editors would love it if we could use any term from any external ontology whenever we need it. TermGenie used to allow that, but our current workflow doesn't: when we need an external term that isn't in an import yet, we have to run the makefile to import it. @cmungall, @dougli1sqrd or @balhoff can comment more thoroughly on why we need the extra step of generating the import files manually, but it does allow us to run checks on the files before we officially start to use them. I suspect that, with your large change to the txt file, the diff we get after running the makefile will be very large, and we will simply trust it if it passes all the automated checks.

cmungall commented 6 years ago

What is the size of the OWL file?

Currently:

$ du -sh imports/chebi_import.owl
8.2M    imports/chebi_import.owl
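The size check can be scripted so that it runs before anything is committed. Below is a minimal Python sketch that assumes the old and new builds of chebi_import.owl are both available on disk. The function names and the 50 MiB default budget are hypothetical; substitute whatever limit the team agrees on.

```python
from pathlib import Path

MIB = 1024 * 1024


def size_report(old_bytes: int, new_bytes: int, limit_bytes: int) -> dict:
    """Summarize how much a regenerated import grew and whether it fits a budget."""
    return {
        "old_mib": round(old_bytes / MIB, 1),
        "new_mib": round(new_bytes / MIB, 1),
        "growth": round(new_bytes / old_bytes, 2),
        "within_limit": new_bytes <= limit_bytes,
    }


def compare_import_files(old: Path, new: Path, limit_mib: float = 50.0) -> dict:
    """Compare two builds of an import file on disk.

    The 50 MiB default is a hypothetical team budget, not a GO or GitHub rule.
    """
    return size_report(old.stat().st_size, new.stat().st_size, int(limit_mib * MIB))
```

With the figures reported in this thread (8.2M now, about 33M after the expansion), the growth factor comes out at roughly 4. For reference, GitHub's documentation states that it blocks files larger than 100 MiB, so any team budget would need to sit below that.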

(Email reply to David Hill's comment of 3 Jul 2018, 4:38.)

amorgat commented 6 years ago

I hope this is the right place to comment, but I want to make sure you're aware that there are three types of reaction participants in Rhea:

1) A small molecule is linked to one ChEBI compound. Its accession is the ChEBI accession. Example: CHEBI:57844 in Rhea reaction RHEA:10048.

2) A Rhea polymer is linked to one ChEBI compound (the underlying polymer) and has an accession of the form POLYMER:xxx (with xxx being a numeric identifier). A Rhea polymer can exist with several different polymerization indexes, i.e., several Rhea polymers may be linked to the same underlying ChEBI polymer. Example: POLYMER:9584 [(1->4)-alpha-D-glucosyl]n, POLYMER:9587 [(1->4)-alpha-D-glucosyl]n+1 and POLYMER:9586 [(1->4)-alpha-D-glucosyl]n-1. This differs from a ChEBI polymer, which always has polymerization index n. Example of a Rhea reaction involving Rhea polymers: RHEA:24572.

3) A Rhea generic is a macromolecule (protein, nucleic acid, ...) that is not represented as such in ChEBI; it has an accession of the form GENERIC:xxx (with xxx a numeric identifier). Such macromolecules are modeled with the residues and/or functional groups that are directly involved in the chemical transformation, and these residues and/or functional groups are linked to ChEBI compounds. A Rhea generic may be linked to several ChEBI entities, and a ChEBI entity may be linked to several Rhea generics. Example: GENERIC:9685 "holo-[ACP]" is modeled by CHEBI:64479 "O-(pantetheine-4ʼ-phosphoryl)-L-serine residue". CHEBI:64479 is also the residue for several other Rhea generics:
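When collecting ChEBI identifiers for the import, one way to keep these three participant types apart is to dispatch on the accession prefix. Below is a minimal Python sketch that assumes accessions arrive as strings such as CHEBI:57844, POLYMER:9584 or GENERIC:9685. The Rhea-to-ChEBI link table is a hypothetical input supplied by the caller and does not represent a real Rhea API.

```python
from enum import Enum


class ParticipantKind(Enum):
    SMALL_MOLECULE = "CHEBI"  # accession is itself the ChEBI accession
    POLYMER = "POLYMER"       # Rhea polymer linked to one underlying ChEBI polymer
    GENERIC = "GENERIC"       # macromolecule modeled by ChEBI residues/functional groups


def classify_participant(accession: str) -> ParticipantKind:
    """Infer the participant type from the accession prefix."""
    prefix, sep, local_id = accession.partition(":")
    if not sep or not local_id.isdigit():
        raise ValueError(f"malformed accession: {accession!r}")
    try:
        return ParticipantKind(prefix)
    except ValueError:
        raise ValueError(f"unknown participant prefix: {prefix!r}") from None


def chebi_entities(accession: str, links: dict[str, list[str]]) -> list[str]:
    """ChEBI IDs that a participant resolves to.

    Small molecules map to themselves; polymers and generics are looked up in
    links, a hypothetical Rhea-to-ChEBI table (a generic may map to several IDs).
    """
    if classify_participant(accession) is ParticipantKind.SMALL_MOLECULE:
        return [accession]
    return links.get(accession, [])
```

Using the example above, chebi_entities("GENERIC:9685", {"GENERIC:9685": ["CHEBI:64479"]}) yields ["CHEBI:64479"]. Only CHEBI-prefixed participants can go into the term list directly. Polymers and generics can enter only through their linked ChEBI entities, which is why ChEBI-only logical definitions may be less specific for them.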

goodb commented 6 years ago

Thanks for the comment, @amorgat. It's definitely relevant and has already come up on this other thread: https://github.com/geneontology/go-ontology/issues/15930#issuecomment-399322630 . We need to decide at what level to model the reaction participants for these logical definitions. If we stick to using only CHEBI terms, are the definitions that involve generics and polymers (as in the example on that thread) logically incorrect, or just less specific than they could be? For example, is that equivalency inference incorrect, as I think @hdrabkin's comment says? If it is incorrect, then we can either eliminate the equations that involve generics and polymers from the import, or we will need to expand the substance definitions beyond those represented in CHEBI. Thoughts?

ukemi commented 6 years ago

Progress on this thread, specifically with respect to the ChEBI imports: I used @goodb's branch to run the import makefile in Docker. The new import file in my branch is 33M; I have not pushed it to origin. @cmungall commented that this is a bit large. However, I have sent the file to Ben, and on Monday @hdrabkin, @deustp01 and I will use my local branch to sanity-check that it built properly. From there we need to come up with a solution for handling a file this large.

ukemi commented 6 years ago

Spot checking the new chebi import in my local branch:

ADP metabolic process is classified by the reasoner as a purine ribonucleoside monophosphate metabolic process. This looks like an incorrect classification originating in ChEBI. The same classification is also present in the live ontology.

There is the same type of issue with CDP: it is listed as a ribonucleoside. This is also in the live ontology (GO:0046704).

Otherwise, the classifications in the live ontology match those produced with the new import.

@goodb When checking for new chemicals in the .txt file on your request branch, we found chemicals such as pyranoside (CHEBI:75504) and Ala-Gly zwitterion (CHEBI:73786) that don't seem to be used in any of the Rhea reactions in newMFsfromRhea.ttl. When we look up the RHEA reactions that use these chemicals, we do not find those RHEA identifiers as xrefs in GO. Is it possible that we have requested more chemicals than we need?
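The check described here is a set difference. First collect the participants of every Rhea reaction that some GO term cross-references, then subtract them from the requested list. Below is a minimal Python sketch. The two input dictionaries are hypothetical stand-ins for data parsed from GO xrefs and from Rhea, and the function names are illustrative.

```python
def reactions_xrefd_from_go(go_to_rhea: dict[str, set[str]]) -> set[str]:
    """All Rhea reactions that at least one GO term cross-references."""
    return set().union(*go_to_rhea.values())


def unused_requests(
    requested: set[str],
    go_to_rhea: dict[str, set[str]],
    rhea_participants: dict[str, set[str]],
) -> set[str]:
    """Requested ChEBI IDs that appear in no GO-xref'd Rhea reaction."""
    used: set[str] = set()
    for reaction in reactions_xrefd_from_go(go_to_rhea):
        used |= rhea_participants.get(reaction, set())
    return requested - used
```

Given the GO:0047917 to RHEA:15052 and GO:0047344 to RHEA:10711 cross-references, with CHEBI:58127 as a participant in both reactions, a request list containing CHEBI:58127, CHEBI:75504 and CHEBI:73786 would flag the last two. Because the full request deliberately covers every ChEBI ID used anywhere in Rhea, a non-empty result reflects that scope choice rather than an error.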

In some cases things look OK: CHEBI_58127 is in the new file and cross-references GO:0047917 (RHEA_15052) and GO:0047344 (RHEA_10711).

goodb commented 6 years ago

@ukemi, the new CHEBI term request covers all of the CHEBI IDs used anywhere in RHEA. We could reduce that for the moment, but the thought (originally from @cmungall, I believe) was to bring them all in at once rather than piecemeal.

The newMFsfromRhea ontology (and its several more advanced descendants) only contains terms with existing manually specified xrefs from GO to RHEA. This set can be grown as we've discussed, for example, by using EC numbers shared by GO terms and RHEA reactions to find additional matches.
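Finding additional matches through shared EC numbers is essentially a join on the EC number, followed by filtering out pairs that are already cross-referenced. Below is a minimal Python sketch that assumes GO-to-EC, Rhea-to-EC and existing GO-to-Rhea mappings have already been loaded into dictionaries. The function and parameter names are hypothetical. Because a single EC number can cover several distinct reactions, the output should be treated as candidates for curator review and not as ready-made xrefs.

```python
from collections import defaultdict


def ec_candidate_matches(
    go_ec: dict[str, set[str]],
    rhea_ec: dict[str, set[str]],
    existing_xrefs: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Propose GO -> Rhea pairs that share an EC number and are not yet xref'd."""
    # Invert the Rhea table: EC number -> reactions annotated with it.
    reactions_by_ec: dict[str, set[str]] = defaultdict(set)
    for reaction, ecs in rhea_ec.items():
        for ec in ecs:
            reactions_by_ec[ec].add(reaction)

    proposals: dict[str, set[str]] = defaultdict(set)
    for go_id, ecs in go_ec.items():
        already = existing_xrefs.get(go_id, set())
        for ec in ecs:
            for reaction in reactions_by_ec.get(ec, set()):
                if reaction not in already:
                    proposals[go_id].add(reaction)
    return dict(proposals)
```

GO terms without any shared EC number are simply absent from the result, so the candidate set can grow incrementally as more EC annotations become available.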

My take is to go ahead with the full import. It's not such an outlandish size that the software team cannot find a solution for dealing with it in GitHub, and it will smooth development over the coming months by reducing the need to process new imports. But if the consensus disagrees, let me know and I can make a term set that contains only the ChEBI IDs from currently xref'd reactions.

hdrabkin commented 6 years ago

I created a ChEBI ticket, https://github.com/ebi-chebi/ChEBI/issues/3479, which is copied in https://github.com/geneontology/go-ontology/issues/16036

goodb commented 6 years ago

@cmungall, it looks like this issue got stuck over the summer. If I recall correctly, the concern was the size of the import, but we tested it in GitHub and it seemed to be fine. What do you think? I would like to resolve the issue one way or another, either by doing the full import as proposed or by cutting it down to the currently active subset that affects the proposed logical defs.

ukemi commented 6 years ago

Since the above ticket ebi-chebi/ChEBI#3479 would resolve a large number of logical issues, it would be nice if this fix were part of the new import build.

goodb commented 6 years ago

@ukemi I assume that is still on ChEBI's plate, right? (Specifically on @G-Owen's plate, I guess, based on the issue assignment.)

goodb commented 5 years ago

@ukemi ChEBI has closed that ticket. Is it time to go ahead and merge the expanded set of ChEBI terms into master so we have what we need to finish the work leading up to Geneva?

ukemi commented 5 years ago

I think so, as long as the file size isn't an issue. I'd like to get confirmation of that from @cmungall or @balhoff. Then we should add the required chemicals to https://github.com/geneontology/go-ontology/blob/master/src/ontology/imports/chebi_terms.txt. I would like @hdrabkin to build the import so he gets experience with it. If we can update the txt file by Monday's meeting, he can run the build using Docker on the call. I suspect the diff is going to be too big to do anything useful with. See one, do one, teach one.

goodb commented 5 years ago

The file was updated a while ago; I just made an official PR for that update: https://github.com/geneontology/go-ontology/pull/16601

ukemi commented 5 years ago

I believe this was completed this week.