Open marcfeuermann opened 2 years ago
This is correct: currently only the subset of ChEBI used for logical definitions in the ontology is included in the ontology of referenced entities (go-lego).
If we want to extend this then the steps for someone (not you!) are:
We should go ahead and do this if this is useful to. Some things we want to keep an eye on:
My comments shouldn't be considered blockers, but we shouldn't forget about this. We may want to consider a general project in GO for our various ChEBI issues.
On Fri, May 6, 2022 at 2:32 AM Marc feuermann @.***> wrote:
Hello, While creating models for secondary metabolite biosynthesis, I realized that some ChEBI chemicals are not available in the tool. For example, within the P. expansum patulin biosynthesis pathway, I cannot get:
CHEBI:5325 gentisyl alcohol
CHEBI:145109 (+)-isoepoxydon
CHEBI:145112 (E)-ascladiol
CHEBI:145110 phyllostine
CHEBI:145111 isopatulin
This is probably not limited to this pathway, and many ChEBI chemicals are probably not yet available for creating models. Thanks a lot. Best regards, Marc.
If we want to extend this
Unless the larger import of ChEBI into go-lego has some major technical cost (memory, performance, ...), my guess from the Reactome experience is that loading everything beats trying to anticipate user needs. We never succeeded at the anticipation part.
A local fix on our side: our data model allows reference instances to be added to our central database one at a time. We have a series of wizards that let a curator supply a valid ID (ChEBI, UniProt, etc.); the wizard fetches the relevant information and adds it to the local project, and when the project is saved the new reference instance is added to the central repository and becomes accessible to all users. No clue here whether the OWL structure and the rest of Noctua are compatible with this.
Would it make sense to load the ChEBI pH 7.3 set?
https://ftp.expasy.org/databases/rhea/tsv/chebi%5FpH7%5F3%5Fmapping.tsv
That's 90k terms though, it seems like a lot?
I think supplementing imports/chebi_import with the pH 7.3 terms plus their is-a ancestors is a good strategy.
I think supplementing imports/chebi_import with the pH 7.3 terms
An issue possibly to discuss with ChEBI is that the form of an ionizable molecule that is prevalent at pH 7.3 is not always tagged as such in ChEBI. Can these tags somehow magically be applied to all appropriate ChEBI instances?
@cmungall can this be done in the near future? Or not? Is this a lot of work? This is a blocker for Marc's biosynthesis models.
Alternatively, could we manually add all the Rhea pH 7.3 terms to the imports/chebi file?
Thanks, Pascale
An issue possibly to discuss with ChEBI is that the form of an ionizable molecule that is prevalent at pH 7.3 is not always tagged as such in ChEBI. Can these tags somehow magically be applied to all appropriate ChEBI instances?
It is Rhea that makes this file, and some of these are computed. E.g. all the entries Marc provided are marked computational.
@balhoff how much work would this be?
We want to take all entries in chebi_pH7_3_mapping.tsv plus their is-a ancestors only and supplement the go-lego file with this.
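As a rough illustration of "entries plus their is-a ancestors only", here is a minimal Python sketch. It is not the go-lego build code. The column name `CHEBI_PH7_3`, the bare-number ID format, and the `is_a_parents` dictionary are assumptions made for the example, and the toy IDs use a made-up `EX:` prefix rather than real ChEBI terms.

```python
import csv
import io


def read_column_ids(tsv_text, column="CHEBI_PH7_3", prefix="CHEBI:"):
    """Collect the IDs found in one column of a tab-separated file with a header row.

    The column name and the bare-number ID format are assumptions about the
    mapping file, not something confirmed in this thread.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {prefix + row[column] for row in reader if row.get(column)}


def with_is_a_ancestors(seeds, is_a_parents):
    """Return the seed terms plus every term reachable by following is-a edges upward."""
    keep = set()
    stack = list(seeds)
    while stack:
        term = stack.pop()
        if term in keep:
            continue  # already visited; the is-a graph can share ancestors
        keep.add(term)
        stack.extend(is_a_parents.get(term, ()))
    return keep


# Toy data only: hypothetical IDs and a hypothetical is-a graph.
toy_tsv = "CHEBI\tCHEBI_PH7_3\tORIGIN\n1\t2\tcomputation\n3\t3\tcomputation\n"
toy_parents = {
    "EX:2": ["EX:10"],
    "EX:3": ["EX:10", "EX:11"],
    "EX:10": ["EX:20"],
    "EX:99": ["EX:20"],  # not a seed and not an ancestor, so it is left out
}

seeds = read_column_ids(toy_tsv, prefix="EX:")
import_set = with_is_a_ancestors(seeds, toy_parents)
print(sorted(import_set))
```

Only is-a edges are followed, so siblings and descendants of the seed terms are not pulled in; that keeps the supplement much smaller than a full ChEBI import.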
This should be in the next go-lego snapshot.
Actually that will be held up by ~https://github.com/geneontology/go-ontology/pull/23567~ https://github.com/geneontology/go-ontology/issues/23568.
Is the blocking issue resolved? (I thought it was #23568)
Thanks, Pascale
Thanks, I fixed the link to the issue. I just merged my PR, and the additional CHEBI terms should be in the next go-lego snapshot. Thanks for your help!
It's not a technical burden to load it all in advance; I am thinking more of the maintenance burden, e.g. if we later decide we need to normalize to protonation states. It would be good if we had a half page of curator guidance on picking chebi terms.
curator guidance on picking chebi terms
Not sure you want this. Coming up with filters to get rid of ones irrelevant to biology seems hard and risky. Why not dump the whole collection of ChEBI chemicals into Noctua (with arrangements for periodic updates to capture changes in ChEBI)? Curators, just as they do now (and perhaps with some explicit guidance pointing to Rhea as a resource on these issues), will need to figure out which ChEBI term represents the correct charge state and stereochemistry for their particular organism and environment.
I think we do want to eliminate choices over protonation state and just select the pH 7.3 form.
As far as I can tell, pH 7.3 charge states are sometimes, but not always, noted in the ChEBI entries so this will require some clean-up, either at ChEBI (always eager to get someone else to do the work) or magically, at import-to-Noctua time.
Maybe, for now, import everything because that already supports better, easier curation, and work on figuring out how to prune the list.
Conservative approach: exclude a term if it is a transitive proper conjugate base/acid of an annotated pH 7.3 form.
conservative approach
This should not exclude anything we want, so it's a good start.
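One possible reading of the conservative rule, sketched in Python: starting from each annotated pH 7.3 form, follow conjugate acid/base links any number of steps and exclude whatever is reached, other than the starting term itself. The function name, the pair-list input, and the choice never to exclude a term that is itself an annotated pH 7.3 form are assumptions made for this sketch; the `EX:` IDs are made up.

```python
def conjugate_exclusions(ph73_terms, conjugate_links):
    """Return terms reachable from an annotated pH 7.3 form via one or more
    conjugate acid/base links, excluding the annotated forms themselves.

    conjugate_links is an iterable of (term_a, term_b) pairs; direction is
    ignored because conjugate acid and conjugate base are inverse relations.
    """
    neighbours = {}
    for a, b in conjugate_links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    ph73 = set(ph73_terms)
    excluded = set()
    for start in ph73:
        seen = {start}
        stack = [start]
        while stack:
            term = stack.pop()
            for nxt in neighbours.get(term, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        excluded |= seen - {start}  # "proper": the start term is not its own conjugate
    return excluded - ph73  # assumption: annotated pH 7.3 forms are always kept


# Toy chain of protonation states: acid <-> anion1 <-> anion2, plus an unrelated pair.
toy_links = [
    ("EX:acid", "EX:anion1"),
    ("EX:anion1", "EX:anion2"),
    ("EX:other_acid", "EX:other_base"),
]
to_exclude = conjugate_exclusions({"EX:anion1"}, toy_links)
print(sorted(to_exclude))
```

Terms with no annotated pH 7.3 form anywhere in their conjugate chain, like the unrelated pair in the toy data, are left untouched, which is what makes the rule conservative.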