Closed sylvainpoux closed 3 years ago
Looks like this might require some new ChEBI terms, so handing off to @hdrabkin.
Hi David,
Rhea reactions are public and all ChEBI accessions are described in the reactions:
https://www.rhea-db.org/reaction?id=59884 https://www.rhea-db.org/reaction?id=59888 https://www.rhea-db.org/reaction?id=59896 https://www.rhea-db.org/reaction?id=59892
Thanks
Sylvain
Thanks Sylvain. I'll try to get these in today.
I don't see the final L-beta-ethynylserine (compound 7, figure 2, PMID:30867596), but will add the term so you can annotate.
Adding terms in #17788.
@hdrabkin can you investigate adding L-beta-ethynylserine to ChEBI? You might get to draw a structure. Once that is in, can you add it and L-propargylglycine (CHEBI:43797) to the import files and create logical defs for L-beta-ethynylserine biosynthetic process and L-propargylglycine biosynthetic process after we are able to rebuild the ChEBI import files?
Also note that I did not add the antibiotic parent you requested. We will only add these if the role is universal/rigid. If you think this to be the case and these compounds only ever have the role of an antibiotic, I will add them. Pinging @krchristie
CHEBI:144729, L-beta-ethynylserine, has been submitted to CHEBI
Looking at the two papers that @sylvainpoux cited, the 1986 paper discovering beta-ethynylserine calls it an antibiotic:
from Sanada M, et al. 1986. beta-Ethynylserine, an antimetabolite of L-threonine, from Streptomyces cattleya. J Antibiot (Tokyo). 39(2):304-5. PMID:3082841.
This antibiotic was also an antimetabolite of L-threonine and was identified as beta-ethynylserine from its physicochemical properties.
However, the 2019 paper describing discovery of the pathway refers to antibiotics only in the context of selecting for E. coli that have the appropriate plasmids in the process of putting this biosynthetic pathway into E. coli.
Marchand JA, et al. 2019. Discovery of a pathway for terminal-alkyne amino acid biosynthesis. Nature. 567(7748):420-424. PMID:30867596.
To make sure I don't forget about this, I'm going to add this to the ChEBI roles project so that I remember to take a look at this once we've come up with a plan for how we want to handle these types of roles that may be context/organism specific, but not universal.
@hdrabkin - Once you're done with your part of this ticket, please leave it open and reassign it to me.
It's a chicken and egg situation. @hdrabkin can't finish his part until we can update the ChEBI import, which currently has lots of inferences we don't want based on roles. I think the general roles need to be addressed first.
@krchristie will do. @ukemi , did you just add the process term only?
No. I added them all.
I've added myself to the assignment list then, and left @hdrabkin. He can unassign himself if/when he feels that his part is complete.
CHEBI:144729 is now public. Not clear if I can do a CHEBI import
@hdrabkin you can just add it to https://github.com/geneontology/go-ontology/blob/master/src/ontology/imports/chebi_terms.txt and the import will be regenerated soon.
@ukemi told me we use a special pre-roles copy to merge new stuff into, so I'll add to the txt file
I don't see equivalence axioms on any of the pre-existing related MF terms so I didn't try to add equivalence axioms to any of the new MF terms.
Adding equivalence axioms to the BP terms utilizing the newly created ChEBI terms for
Term: L-beta-ethynylserine biosynthetic process child of: GO:0043453 alkyne biosynthetic process
Term: L-propargylglycine biosynthetic process child of: GO:0043453 alkyne biosynthetic process
For now, I left the alkyne biosynthetic process parentage directly asserted because the reasoner does not generate this.
Looking at ChEBI, they have not asserted that either L-beta-ethynylserine or L-propargylglycine is an alkyne. In ChEBI, these are both under the parent terminal acetylenic compound, while the ChEBI term alkyne is under the term acetylenes, which is a sibling term of terminal acetylenic compound. Here's the structure in ChEBI, since it's easier to see it than to decipher it from verbiage.
-- acetylenic compound
--- acetylenes
---- acyclic acetylene
----- alkyne
--- terminal acetylenic compound
---- L-beta-ethynylserine
---- L-propargylglycine
Anyway, we might want to reconsider whether the new biosynthetic terms for these two compounds belong under the GO term alkyne biosynthetic process if ChEBI doesn't place them under alkyne. Looking at ChEBI, it might be better to create a more general term for acetylenic compound biosynthesis, which would be a parent term above alkyne metabolic process.
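A minimal sketch of the check described above, assuming the excerpt can be treated as a single-parent tree (real ChEBI terms can have several parents) and following the prose in placing terminal acetylenic compound directly under acetylenic compound. The term names are copied from the excerpt; the `PARENT` table and the `ancestors` helper are illustrative names only and are not part of any GO or ChEBI tooling.

```python
# Child -> parent links transcribed from the ChEBI excerpt in this comment.
# Assumption: one parent per term, which is a simplification of real ChEBI.
PARENT = {
    "acetylenes": "acetylenic compound",
    "acyclic acetylene": "acetylenes",
    "alkyne": "acyclic acetylene",
    "terminal acetylenic compound": "acetylenic compound",
    "L-beta-ethynylserine": "terminal acetylenic compound",
    "L-propargylglycine": "terminal acetylenic compound",
}


def ancestors(term):
    """Return every term above `term`, nearest parent first."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


for compound in ("L-beta-ethynylserine", "L-propargylglycine"):
    chain = ancestors(compound)
    # A reasoner can only place the compound under "alkyne" if it is an ancestor.
    print(compound, "->", chain, "| under alkyne:", "alkyne" in chain)
```

Under these assumptions both compounds reach acetylenic compound but never pass through alkyne, which is why the reasoner cannot infer the alkyne biosynthetic process parentage.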
Thoughts? @hdrabkin @sylvainpoux
Are all of these RHEAs public?
Never mind just checked; yes they are so can use immediately.
While the source paper does talk about production of a terminal alkyne:
Marchand JA, Neugebauer ME, Ing MC, Lin CI, Pelton JG, Chang MCY. Discovery of a pathway for terminal-alkyne amino acid biosynthesis. Nature. 2019;567(7748):420-424. doi:10.1038/s41586-019-1020-y PMID: 30867596
Here we report the discovery and characterization of a unique pathway to produce a terminal alkyne-containing amino acid in the bacterium Streptomyces cattleya.
ChEBI does not use the phrase terminal alkyne. In ChEBI, both L-beta-ethynylserine and L-propargylglycine classify under terminal acetylenic compound. If I create a term in GO for cellular terminal acetylenic compound biosynthetic process and remove the direct SubClass assertions, I get a classification where neither L-beta-ethynylserine nor L-propargylglycine classifies under alkyne biosynthetic process.
It seems best to go with ChEBI's classification, so I'm going to proceed with this.
I also seem to recall that we should be using the species prevalent at pH 7.3, which would be the zwitterions for both L-beta-ethynylserine and L-propargylglycine. Let me know if I've misunderstood this, or if it doesn't apply to a pathway only known to occur in bacteria as opposed to vertebrates.
@cmungall @pgaudet - let me know if you think this is OK or not.
I was going to just do this, but the build has failed for a non-content reason, so I'm going to tag this to talk about it on Monday's call if I haven't figured out how to restart the checks by then.
7/27/2020 - Discussion at Ontology Editors call
However, in the call, you guys reinforced my initial thought that alkyne is "anything with a triple bond", and I checked some more stuff. I have some questions about what ChEBI is doing, so I am submitting a ticket to them.
Then, in rechecking my branch, it isn't working like I thought it was, and I also checked the classification of L-beta-ethynylserine versus L-beta-ethynylserine zwitterion on ChEBI's website. Basically, I think we may want to discuss the implications that switching to the 7.3 terms may have for automatic classification of compounds, since it doesn't look to me like the zwitterions of these two compounds classify chemically where the non-zwitterion terms do. @balhoff @cmungall
Here are pics of the graphs from ChEBI. The compound of interest is the very bottom one in each graph. Automatic classification using the zwitterion terms looks like it will classify terms to organic compound, but not to any kind of acetylenic compound (what ChEBI is using for anything containing a carbon-carbon triple bond).
Parentage of L-beta-ethynylserine
Parentage of L-beta-ethynylserine zwitterion
Hi Karen
The problem of inconsistent hierarchies in ChEBI between charge states is a common one.
But could you incorporate the ChEBI charge state mapping that Rhea provides into either i) the GO or ii) the reasoning process to deal with this?
This file - chebi_pH7_3_mapping.tsv - at https://www.rhea-db.org/download maps "other" charge states to that used in Rhea.
So for
CHEBI:144729 - L-β-ethynylserine zwitterion (used in Rhea)
CHEBI:144833 - L-β-ethynylserine (the one with a better developed hierarchy)
the file gives something like this:
CHEBI CHEBI_PH7_3 ORIGIN
144833 144729 computation
i.e. a computed relation between the two.
For reasoning you could infer relations using the hierarchy of both, but this could have some issues (like where one of the parents is also a specific charge form too). Perhaps another alternative could be to add all charge states from ChEBI as xrefs in the GO? So you would add ChEBI:144833 and ChEBI:144729 as xrefs for this pathway. Again for this one could use the chebi_pH7_3_mapping.tsv to help; if one GO xref has been curated for a GO BP, then additional GO xrefs could be added automatically from such a file.
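The xref expansion suggested above could look roughly like the sketch below. It assumes the mapping file is tab-separated with the header row shown in this comment (CHEBI, CHEBI_PH7_3, ORIGIN); only the 144833 row comes from this thread, and `load_groups` and `expand_xrefs` are hypothetical helper names, not existing GO or Rhea tooling.

```python
import csv
import io

# Sample data in the layout of chebi_pH7_3_mapping.tsv. Only this one row is
# taken from the thread; a real run would open the downloaded file instead.
SAMPLE_TSV = "CHEBI\tCHEBI_PH7_3\tORIGIN\n144833\t144729\tcomputation\n"


def load_groups(handle):
    """Group ChEBI IDs by pH 7.3 form: {ph73_id: {every ID that maps to it}}."""
    groups = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        members = groups.setdefault(row["CHEBI_PH7_3"], set())
        members.update({row["CHEBI"], row["CHEBI_PH7_3"]})
    return groups


def expand_xrefs(curated, groups):
    """Add every charge state that shares a pH 7.3 form with a curated xref."""
    member_to_ph73 = {m: ph73 for ph73, members in groups.items() for m in members}
    expanded = set(curated)
    for xref in curated:
        chebi_id = xref.split(":", 1)[1]
        ph73 = member_to_ph73.get(chebi_id)
        if ph73 is not None:
            expanded.update(f"CHEBI:{m}" for m in groups[ph73])
    return sorted(expanded)


groups = load_groups(io.StringIO(SAMPLE_TSV))
print(expand_xrefs(["CHEBI:144833"], groups))
```

With the sample row, curating either form as an xref yields both CHEBI:144729 and CHEBI:144833. This sketch does not address the caveat raised above about parents that are themselves specific charge forms.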
Pinging @amorgat
All the best, Alan
Hi Karen, all,
I'm sharing with you our working document on how to index ChEBI "is a" relationships: https://docs.google.com/presentation/d/1xLa1Z3EVFGuncWN4BUllymY6rVdpfYJbtEumyNABhkU/edit#slide=id.p
Do not hesitate to contact us if something is not clear or if you need additional information. All the best, Anne
Thanks @alanbridge and @amorgat - that's really helpful!!
For now, I'm going to commit a version with cellular terminal acetylenic compound biosynthesis as a parent for these two biosynthetic terms. I will also open a ticket about alkyne biosynthetic process, since GO is using a general definition of alkyne but referencing the ChEBI term for alkyne, which is restricted to simple unsubstituted straight chain alkynes.
I also checked the classification of L-beta-ethynylserine versus L-beta-ethynylserine zwitterion on ChEBI's website. Basically, I think we may want to discuss the implications that switching to the 7.3 terms may have for automatic classification of compounds within GO, since it doesn't look to me like the zwitterions of these two compounds classify chemically where the non-zwitterion terms do.
Two things about these ChEBI entries confuse me. First, ChEBI relates (incoming / outgoing) these two compounds by saying that each is the tautomer of the other, but does not have an incoming / outgoing conjugate acid - conjugate base relationship. I would have expected the opposite - conjugate acid / base but not tautomer. What am I missing in the chemistry here? And related to this, Karen's question: why should the zwitterion form lack the non-proteinogenic amino acid parent that the non-zwitterion form has? (Looking a bit more, I see the same difference for L-leucine: it has all sorts of chemical parents that its zwitterion lacks.) @hdrabkin ? Second, I've noticed that ChEBI sometimes does not identify the form of an ionizable compound that is predominant at pH 7.3, so an issue may be whether GO ontology developers and curators may sometimes need to figure this out for an individual compound and, if so, whether that figuring can be confirmed by Rhea and passed on to ChEBI so they can fill in the missing bit in their annotation for the chemical.
Hi Peter, let me try to answer to your questions.
Concerning the case of L-β-ethynylserine
CHEBI:144729 L-β-ethynylserine zwitterion InChIKey: RBWXRFBKVDBXEG-DMTCNVIQSA-N
CHEBI:144833 L-β-ethynylserine InChIKey: RBWXRFBKVDBXEG-DMTCNVIQSA-N
You will notice that both ChEBI entries have the same InChIKey. The two structures are tautomers, as they differ only by the position of one proton (the proton of the carboxylic acid is on the amino group in the zwitterionic form). CHEBI:144729, as a zwitterion, is both a conjugate acid and a conjugate base of CHEBI:144833, so it makes no sense to use the "is conjugate acid/base of" relationships; that's why ChEBI curators use the "is tautomer of" relationship instead.
As discussed during the last ChEBI workshop @EBI in May 2019, Adnan (the ChEBI curator) informed us that their guideline is to annotate most of the relationships on the fully hydrogenated form only. ChEBI entries describing the different protonation states of a molecule are linked by "is conjugate base of", "is conjugate acid of" and "is tautomer of" relationships. The reality is that some relationships may be missing, and retrieving all species may be tricky when they have several ionizable groups (a recursive procedure is needed). That's why, instead of these 3 relationships, the SIB developers use an additional relationship "has_major_microspecies_at_pH_7_3", which is available in the chebi.owl file (published by Rhea) but not used/displayed on the ChEBI public web site. This relationship is computed using the Marvin plugin provided by Chemaxon (https://docs.chemaxon.com/display/docs/Major+Microspecies+Plugin).
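To illustrate why a recursive (or iterative) procedure is needed when only the three ChEBI relationships are available, here is a small sketch that follows them transitively. The first triple is the tautomer link discussed in this thread; the `HYPO:*` entries are invented placeholders for a compound with two ionizable groups, and `protonation_states` is a hypothetical helper name.

```python
from collections import deque

# (subject, relation, object) triples. Only the first is from this thread;
# the HYPO:* identifiers are made up to show a chain longer than one step.
TRIPLES = [
    ("CHEBI:144729", "is tautomer of", "CHEBI:144833"),
    ("HYPO:acid", "is conjugate acid of", "HYPO:monoanion"),
    ("HYPO:dianion", "is conjugate base of", "HYPO:monoanion"),
]
LINKS = {"is conjugate acid of", "is conjugate base of", "is tautomer of"}


def protonation_states(start, triples):
    """Collect every entry reachable from `start` through the three link types."""
    neighbours = {}
    for subject, relation, obj in triples:
        if relation in LINKS:
            # Treat each link as undirected: either end leads to the other form.
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in neighbours.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(protonation_states("HYPO:acid", TRIPLES)))
print(sorted(protonation_states("CHEBI:144833", TRIPLES)))
```

The placeholder acid reaches its dianion only by going through the monoanion, so a single-step lookup would miss it. If any link is missing in ChEBI the traversal stops short, which is the gap the precomputed pH 7.3 relationship is meant to cover.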
For your second point: "Second, I've noticed that ChEBI sometimes does not identify the form of an ionizable compound that is predominant at pH 7.3, so an issue may be whether GO ontology developers and curators may sometimes need to figure this out for an individual compound and if so, can that figuring be confirmed by Rhea and passed on to ChEBI so they can fill in the missing bit in their annotation for the chemical" It's true that sometimes the major species at pH 7.3 is missing. ChEBI is aware of the issue, and one of the outcomes of the ChEBI workshop was Project 1: Unify alternative forms of chemical entities with collaborative effort. At the moment, we (SIB curators) submit the missing entries when needed for Rhea or UniProt. I'm not sure I understand in which cases you would need a ChEBI compound that is not used in Rhea? Is it in the definition of a GO:BP term (biosynthesis of XX / degradation of YY) where the reactions are not yet clearly defined?
@balhoff - I was thinking I would split off the 7.3 vs fully protonated parentage issue into a separate ticket, since it goes beyond the scope of the original ticket, which only asked for creation of a couple of terms. Is there an existing ticket where it would be good to add these 7.3 related issues, or should I just create a new ticket?
I'm not sure I understand in which cases you would need a ChEBI compound that is not used in Rhea? Is it in the definition of GO:BP term (biosynthesis of XX / degradation of YY) where the reactions are not yet clearly defined?
I wasn't clear - I don't see any cases here where this is needed. For the rest, I was missing some history as well as a good understanding of the acid-base / tautomer relationship. Apologies for going over old ground.
the SIB developers use an additional relationship "has_major_microspecies_at_pH_7_3" which is available in the chebi.owl file (published by Rhea) but not used/displayed in the ChEBI public web site.
@amorgat are those relations in that chebi file identical to the information provided in chebi_pH7_3_mapping.tsv from the Rhea download page?
@balhoff, chebi_pH7_3_mapping.tsv tells you whether a ChEBI entity is the major species at pH 7.3 (CHEBI == CHEBI_PH7_3) or, if it is not, what the corresponding CHEBI_PH7_3 is:
CHEBI CHEBI_PH7_3 ORIGIN
3 3 computation
...
15361 15361 computation
32816 15361 computation
...
In the chebi.owl, the relationships "has_major_microspecies_at_pH_7_3" are only provided between a ChEBI entry and its major species at pH 7.3.
In the previous example, only CHEBI_32816 has this relationship, whereas CHEBI_3 and CHEBI_15361 have no "has_major_microspecies_at_pH_7_3" relationships.
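As a rough illustration of how the TSV rows relate to the OWL relationship described above, the sketch below reads the three example rows and keeps only those where the entry differs from its pH 7.3 form. The row values are copied from the example; `is_major_species` and `major_microspecies_links` are hypothetical helper names, and this is an interpretation of the description here, not the code Rhea uses.

```python
# (CHEBI, CHEBI_PH7_3) pairs copied from the example rows above.
ROWS = [("3", "3"), ("15361", "15361"), ("32816", "15361")]


def is_major_species(chebi_id, rows):
    """True when the entry maps to itself, i.e. it is the pH 7.3 form."""
    return any(c == chebi_id and c == ph73 for c, ph73 in rows)


def major_microspecies_links(rows):
    """Pairs that would carry has_major_microspecies_at_pH_7_3, as described above."""
    return [(f"CHEBI_{c}", f"CHEBI_{ph73}") for c, ph73 in rows if c != ph73]


print(is_major_species("15361", ROWS))
print(is_major_species("32816", ROWS))
print(major_microspecies_links(ROWS))
```

Only the 32816 row yields a link (to 15361), matching the statement that CHEBI_3 and CHEBI_15361 carry no such relationship.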
To be discussed with Jerven Bolleman if you would need it.
@amorgat thank you, I think I understand. From my perspective I think I have everything I need in the TSV file, i.e., there isn't additional information in the OWL file.
@krchristie in the branch for #20093 (bio-chebi based on pH 7.3 mappings), these are inferable:
Hi GO, we would need a bunch of new terms for a new pathway
Thanks
Sylvain
Term: L-beta-ethynylserine biosynthetic process
Definition: The chemical reactions and pathways resulting in the formation of L-beta-ethynylserine, an antibiotic produced by Streptomyces bacteria.
Category: Process
child of: GO:0017000 antibiotic metabolic process GO:0008652 cellular amino acid biosynthetic process GO:0043453 alkyne biosynthetic process
Xref PMID:30867596 PMID:3082841
Term: L-propargylglycine biosynthetic process
Definition: The chemical reactions and pathways resulting in the formation of L-propargylglycine (Pra), an antibiotic produced by Streptomyces bacteria.
Category: Process
child of: GO:0017000 antibiotic metabolic process GO:0008652 cellular amino acid biosynthetic process GO:0043453 alkyne biosynthetic process
Xref PMID:30867596
Term: L-propargylglycine synthase activity
Definition: Catalysis of the reaction: L-2-amino-4-chloropent-4-enoate = chloride + H(+) + L-propargylglycine.
Category: Function
child of: carbon-halide lyase activity ; GO:0016848
Xref PMID:30867596 Rhea:59892
Term: L-propargylglycine--L-glutamate ligase activity
Definition: Catalysis of the reaction: ATP + L-glutamate + L-propargylglycine = ADP + H(+) + L-gamma-glutamyl-L-propargylglycine + phosphate.
Category: Function
child of: acid-amino acid ligase activity ; GO:0016881
Xref PMID:30867596 Rhea:59896
Term: 4-chloro-allylglycine synthase activity
Definition: Catalysis of the reaction: 4-chloro-L-lysine + AH2 + O2 = A + formaldehyde + H2O + L-2-amino-4-chloropent-4-enoate + NH4(+).
Category: Function
child of: oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen ; GO:0016705
Xref PMID:30867596 Rhea:59888
Term: L-lysine 4-chlorinase activity
Definition: Catalysis of the reaction: 2-oxoglutarate + chloride + H(+) + L-lysine + O2 = 4-chloro-L-lysine + CO2 + H2O + succinate.
Category: Function
child of: oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, with 2-oxoglutarate as one donor, and the other dehydrogenated ; GO:0050498
Xref PMID:30867596 Rhea:59884
Term: L-gamma-glutamyl-L-propargylglycine hydroxylase activity
Definition: Catalysis of the reaction: 2-oxoglutarate + L-gamma-glutamyl-L-propargylglycine + O2 = CO2 + L-gamma-glutamyl-(3R)-L-beta-ethynylserine + succinate.
Category: Function
child of: 2-oxoglutarate-dependent dioxygenase activity ; GO:0016706
Xref PMID:30867596 Rhea:59900