geneontology / neo

noctua entity ontology

Many ChEBI chemicals not available for metabolic models #96

Open marcfeuermann opened 2 years ago

marcfeuermann commented 2 years ago

Hello, While creating models for secondary metabolite biosynthesis, I realized that some ChEBI chemicals are not available in the tool. As an example, within the P. expansum patulin biosynthesis pathway, I cannot get:

CHEBI:5325 gentisyl alcohol
CHEBI:145109 (+)-isoepoxydon
CHEBI:145112 (E)-ascladiol
CHEBI:145110 phyllostine
CHEBI:145111 isopatulin

This is probably not limited to this pathway, and many ChEBI chemicals are probably not yet available for creating models. Thanks a lot. Best regards, Marc.

cmungall commented 2 years ago

This is correct: currently, only the subset of ChEBI used for logical definitions in the ontology is included in the ontology of referenced entities (go-lego).

If we want to extend this then the steps for someone (not you!) are:

We should go ahead and do this if this is useful to. Some things we want to keep an eye on:

My comments shouldn't be considered blockers, but we shouldn't forget about this. We may want to consider a general project in GO for our various ChEBI issues.

deustp01 commented 2 years ago

If we want to extend this

Unless the larger import of ChEBI into go-lego has some major technical cost (memory, performance, ...), my guess from the Reactome experience is that loading everything beats trying to anticipate user needs. We never succeeded at the anticipation part.

A local fix on our side: our data model allows reference instances to be added to our central database one at a time. We have a series of wizards that let a curator supply a valid ID (ChEBI, UniProt, etc.); the wizard fetches the relevant information and adds it to the local project, and when the project is saved the new reference instance is added to the central repository and becomes accessible to all users. No clue here whether the OWL structure and the rest of Noctua are compatible with this.

pgaudet commented 2 years ago

Would it make sense to load the ChEBI pH 7.3 set?

https://ftp.expasy.org/databases/rhea/tsv/chebi%5FpH7%5F3%5Fmapping.tsv

That's 90k terms though, it seems like a lot?

cmungall commented 2 years ago

I think supplementing imports/chebi_import with the pH 7.3 terms plus their is-a ancestors is a good strategy.

deustp01 commented 2 years ago

think supplementing imports/chebi_import with 7.3terms

An issue possibly to discuss with ChEBI is that the form of an ionizable molecule that is prevalent at pH 7.3 is not always tagged as such in ChEBI. Can these tags somehow magically be applied to all appropriate ChEBI instances?

pgaudet commented 2 years ago

@cmungall can this be done in the near future? Or not? Is this a lot of work? This is a blocker for Marc's biosynthesis models.

Alternatively, could we manually load all the Rhea pH 7.3 terms into the imports/chebi file?

Thanks, Pascale

cmungall commented 2 years ago

An issue possibly to discuss with ChEBI is that the form of an ionizable molecule that is prevalent at pH 7.3 is not always tagged as such in ChEBI. Can these tags somehow magically be applied to all appropriate ChEBI instances?

It is Rhea that makes this file, and some of these are computed. E.g. all the entries Marc provides are marked computational.

@balhoff how much work would this be?

We want to take all entries in chebi_pH7_3_mapping.tsv plus their is-a ancestors only and supplement the go-lego file with this.
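To make the "entries plus their is-a ancestors" idea concrete, here is a minimal sketch of the closure step. It assumes the is-a edges have already been extracted into a child-to-parents mapping; the function name, the mapping, and the `EX:` identifiers are all made up for illustration and are not the actual go-lego build code.

```python
from collections import deque


def with_isa_ancestors(seeds, parents):
    """Return the seed terms plus every term reachable through is-a parents.

    `seeds` is an iterable of term IDs; `parents` maps a term ID to its
    direct is-a parents. Both are hypothetical inputs for this sketch.
    """
    keep = set()
    queue = deque(seeds)
    while queue:
        term = queue.popleft()
        if term in keep:
            continue  # already visited via another path
        keep.add(term)
        queue.extend(parents.get(term, ()))
    return keep


# Toy is-a edges (child -> direct parents); identifiers are invented.
toy_parents = {
    "EX:leaf_a": ["EX:mid"],
    "EX:leaf_b": ["EX:mid", "EX:other"],
    "EX:mid": ["EX:root"],
    "EX:other": ["EX:root"],
    "EX:unrelated": ["EX:root"],
}

subset = with_isa_ancestors(["EX:leaf_a"], toy_parents)
```

Only terms on an is-a path upward from a seed are kept, so siblings and other relation types are left out, which is roughly what "is-a ancestors only" would mean here.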

balhoff commented 2 years ago

This should be in the next go-lego snapshot.

balhoff commented 2 years ago

Actually that will be held up by ~https://github.com/geneontology/go-ontology/pull/23567~ https://github.com/geneontology/go-ontology/issues/23568.

pgaudet commented 2 years ago

Is the blocking issue resolved? (I thought it was #23568)

Thanks, Pascale

balhoff commented 2 years ago

Thanks, I fixed the link to the issue. I just merged my PR, and the additional CHEBI terms should be in the next go-lego snapshot. Thanks for your help!

cmungall commented 2 years ago

It's not a technical burden to load it all in advance; I am thinking more of the maintenance burden, e.g. if we later decide we need to normalize to protonation states. It would be good if we had a half page of curator guidance on picking ChEBI terms.

deustp01 commented 2 years ago

curator guidance on picking chebi terms

Not sure you want this. Coming up with filters to get rid of ones irrelevant to biology seems hard and risky. Why not dump the whole collection of ChEBI chemicals into Noctua (with arrangements for periodic updates to capture changes in ChEBI)? Curators, just as they do now (and perhaps with some explicit guidance pointing to Rhea as a resource for guidance on these issues), will need to figure out which ChEBI term represents the correct charge state and stereochemistry for their particular organism and environment.

cmungall commented 2 years ago

I think we do want to eliminate choices over protonation state and just select the pH 7.3 form.

deustp01 commented 2 years ago

As far as I can tell, pH 7.3 charge states are sometimes, but not always, noted in the ChEBI entries, so this will require some clean-up, either at ChEBI (always eager to get someone else to do the work) or, magically, at import-to-Noctua time.

Maybe, for now, import everything because that already supports better, easier curation, and work on figuring out how to prune the list.

cmungall commented 2 years ago

Conservative approach: exclude a term if it is a transitive proper conjugate base/acid of a term annotated as the pH 7.3 form.
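One possible reading of this rule, as a rough sketch: starting from each term annotated as a pH 7.3 form, follow conjugate acid/base links any number of steps and exclude whatever is reached, except the annotated forms themselves. The function name, the pair list, and the `EX:` identifiers are hypothetical, and treating the conjugate links as symmetric is an assumption of this sketch.

```python
def conjugate_exclusions(ph73_forms, conjugate_pairs):
    """Return terms to exclude under the 'conservative' rule.

    `ph73_forms`: term IDs annotated as the pH 7.3 form (assumed input).
    `conjugate_pairs`: (acid, base) pairs, treated as undirected links.
    """
    # Build a symmetric adjacency map from the pairs.
    links = {}
    for acid, base in conjugate_pairs:
        links.setdefault(acid, set()).add(base)
        links.setdefault(base, set()).add(acid)

    ph73 = set(ph73_forms)
    seen = set(ph73)
    stack = list(ph73)
    while stack:
        term = stack.pop()
        for other in links.get(term, ()):
            if other not in seen:
                seen.add(other)
                stack.append(other)
    # "Proper": the annotated pH 7.3 forms themselves are never excluded.
    return seen - ph73


# Toy chain: acid <-> mono-anion <-> di-anion; the mono-anion is the pH 7.3 form.
toy_pairs = [("EX:acid", "EX:mono_anion"), ("EX:mono_anion", "EX:di_anion")]
excluded = conjugate_exclusions(["EX:mono_anion"], toy_pairs)
```

Terms with no conjugate link to an annotated pH 7.3 form are untouched, which is what makes the rule conservative.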

deustp01 commented 2 years ago

conservative approach

This should not exclude anything we want, so it's a good start.