geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

L-beta-ethynylserine pathway #17763

Closed sylvainpoux closed 3 years ago

sylvainpoux commented 5 years ago

Hi GO, we would need a bunch of new terms for a new pathway

Thanks

Sylvain

Term: L-beta-ethynylserine biosynthetic process

Definition: The chemical reactions and pathways resulting in the formation of L-beta-ethynylserine, an antibiotic produced by Streptomyces bacteria.

Category: Process

child of: GO:0017000 antibiotic metabolic process GO:0008652 cellular amino acid biosynthetic process GO:0043453 alkyne biosynthetic process

Xref PMID:30867596 PMID:3082841

Term: L-propargylglycine biosynthetic process

Definition: The chemical reactions and pathways resulting in the formation of L-propargylglycine (Pra), an antibiotic produced by Streptomyces bacteria.

Category: Process

child of: GO:0017000 antibiotic metabolic process GO:0008652 cellular amino acid biosynthetic process GO:0043453 alkyne biosynthetic process

Xref PMID:30867596

Term: L-propargylglycine synthase activity

Definition: Catalysis of the reaction: L-2-amino-4-chloropent-4-enoate = chloride + H(+) + L-propargylglycine.

Category: Function

child of: carbon-halide lyase activity ; GO:0016848

Xref PMID:30867596 Rhea:59892

Term: L-propargylglycine--L-glutamate ligase activity

Definition: Catalysis of the reaction: ATP + L-glutamate + L-propargylglycine = ADP + H(+) + L-gamma-glutamyl-L-propargylglycine + phosphate.

Category: Function

child of: acid-amino acid ligase activity ; GO:0016881

Xref PMID:30867596 Rhea:59896

Term: 4-chloro-allylglycine synthase activity

Definition: Catalysis of the reaction: 4-chloro-L-lysine + AH2 + O2 = A + formaldehyde + H2O + L-2-amino-4-chloropent-4-enoate + NH4(+).

Category: Function

child of: oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen ; GO:0016705

Xref PMID:30867596 Rhea:59888

Term: L-lysine 4-chlorinase activity

Definition: Catalysis of the reaction: 2-oxoglutarate + chloride + H(+) + L-lysine + O2 = 4-chloro-L-lysine + CO2 + H2O + succinate.

Category: Function

child of: oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, with 2-oxoglutarate as one donor, and the other dehydrogenated ; GO:0050498

Xref PMID:30867596 Rhea:59884

Term: L-gamma-glutamyl-L-propargylglycine hydroxylase activity

Definition: Catalysis of the reaction: 2-oxoglutarate + L-gamma-glutamyl-L-propargylglycine + O2 = CO2 + L-gamma-glutamyl-(3R)-L-beta-ethynylserine + succinate.

Category: Function

child of: 2-oxoglutarate-dependent dioxygenase activity ; GO:0016706

Xref PMID:30867596 Rhea:59900

ukemi commented 5 years ago

Looks like this might require some new ChEBI terms, so handing off to @hdrabkin.

sylvainpoux commented 5 years ago

Hi David,

Rhea reactions are public and all ChEBI accessions are described in the reactions:

https://www.rhea-db.org/reaction?id=59884 https://www.rhea-db.org/reaction?id=59888 https://www.rhea-db.org/reaction?id=59896 https://www.rhea-db.org/reaction?id=59892

Thanks

Sylvain

ukemi commented 5 years ago

Thanks Sylvain. I'll try to get these in today.

ukemi commented 5 years ago

I don't see the final L-beta-ethynylserine (compound 7, figure 2, PMID:30867596), but will add the term so you can annotate.

ukemi commented 5 years ago

Adding terms in #17788.

ukemi commented 5 years ago

@hdrabkin can you investigate adding L-beta-ethynylserine to ChEBI? You might get to draw a structure. Once that is in, can you add it and L-propargylglycine (CHEBI:43797) to the import files and create logical defs for L-beta-ethynylserine biosynthetic process and L-propargylglycine biosynthetic process after we are able to rebuild the ChEBI import files?

ukemi commented 5 years ago

Also note that I did not add the antibiotic parent you requested. We will only add these if the role is universal/rigid. If you think this is the case and these compounds only ever have the role of an antibiotic, I will add them. Pinging @krchristie

hdrabkin commented 5 years ago

CHEBI:144729, L-beta-ethynylserine, has been submitted to CHEBI

krchristie commented 5 years ago

Looking at the two papers that @sylvainpoux cited, the 1986 paper discovering beta-ethynylserine calls it an antibiotic:

from Sanada M, et al. 1986. beta-Ethynylserine, an antimetabolite of L-threonine, from Streptomyces cattleya. J Antibiot (Tokyo). 39(2):304-5. PMID:3082841.

This antibiotic was also an antimetabolite of L-threonine and was identified as beta-ethynylserine from its physicochemical properties.

However, the 2019 paper describing discovery of the pathway refers to antibiotics only in the context of selecting for E. coli that have the appropriate plasmids in the process of putting this biosynthetic pathway into E. coli.

Marchand JA, et al. 2019. Discovery of a pathway for terminal-alkyne amino acid biosynthesis. Nature. 567(7748):420-424. PMID:30867596.

To make sure I don't forget about this, I'm going to add this to the ChEBI roles project so that I remember to take a look at this once we've come up with a plan for how we want to handle these types of roles that may be context/organism specific, but not universal.

krchristie commented 5 years ago

@hdrabkin - Once you're done with your part of this ticket, please leave it open and reassign it to me.

ukemi commented 5 years ago

It's a chicken and egg situation. @hdrabkin can't finish his part until we can update the ChEBI import, which currently has lots of inferences we don't want based on roles. I think the general roles need to be addressed first.

hdrabkin commented 5 years ago

@krchristie will do. @ukemi , did you just add the process term only?

ukemi commented 5 years ago

No. I added them all.

krchristie commented 5 years ago

I've added myself to the assignment list then, and left @hdrabkin. He can unassign himself if/when he feels that his part is complete.

hdrabkin commented 4 years ago

CHEBI:144729 is now public. It's not clear whether I can do a ChEBI import.

balhoff commented 4 years ago

@hdrabkin you can just add it to https://github.com/geneontology/go-ontology/blob/master/src/ontology/imports/chebi_terms.txt and the import will be regenerated soon.

hdrabkin commented 4 years ago

@ukemi told me we use a special pre-roles copy to merge new stuff into, so I'll add to the txt file

krchristie commented 4 years ago

I don't see equivalence axioms on any of the pre-existing related MF terms so I didn't try to add equivalence axioms to any of the new MF terms.

Adding equivalence axioms to the BP terms utilizing the newly created ChEBI terms for

Term: L-beta-ethynylserine biosynthetic process child of: GO:0043453 alkyne biosynthetic process

Term: L-propargylglycine biosynthetic process child of: GO:0043453 alkyne biosynthetic process

For now, I left the alkyne biosynthetic process parentage directly asserted because the reasoner does not generate this.

Looking at ChEBI, they have not asserted that either L-beta-ethynylserine or L-propargylglycine is an alkyne. In ChEBI, these are both under the parentage terminal acetylenic compound while the ChEBI term alkyne is under the term acetylenes, which is a sibling term of terminal acetylenic compound. Here's the structure in ChEBI since it's easier to see it than decipher it from verbiage.

-- acetylenic compound
--- acetylenes
---- acyclic acetylene
----- alkyne
---- terminal acetylenic compound
----- L-beta-ethynylserine
----- L-propargylglycine

Anyway, we might want to reconsider whether the new biosynthetic terms for these two compounds belong under the GO term for alkyne biosynthetic process if ChEBI doesn't place the compounds under alkyne. Looking at ChEBI, it might be better to create a more general term for acetylenic compound biosynthesis, which would be a parent term above alkyne metabolic process.

Thoughts? @hdrabkin @sylvainpoux
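To make the classification question concrete, here is a minimal sketch of an is-a ancestor check in Python. The `PARENTS` map is a hand-typed toy transcription of the hierarchy described above, with terminal acetylenic compound placed as a sibling of acetylenes as the prose states; it is not read from ChEBI, and the names `PARENTS` and `ancestors` are hypothetical.

```python
# Toy is-a hierarchy typed in by hand from the description in this comment.
# Assumption: terminal acetylenic compound sits directly under acetylenic
# compound, as a sibling of acetylenes. This is NOT loaded from ChEBI.
PARENTS = {
    "acetylenes": ["acetylenic compound"],
    "terminal acetylenic compound": ["acetylenic compound"],
    "acyclic acetylene": ["acetylenes"],
    "alkyne": ["acyclic acetylene"],
    "L-beta-ethynylserine": ["terminal acetylenic compound"],
    "L-propargylglycine": ["terminal acetylenic compound"],
}


def ancestors(term, parents):
    """Return every term reachable from `term` by following is-a links upward."""
    found = set()
    stack = list(parents.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(parents.get(parent, []))
    return found


for compound in ("L-beta-ethynylserine", "L-propargylglycine"):
    ups = ancestors(compound, PARENTS)
    print(compound, "| under alkyne:", "alkyne" in ups,
          "| under acetylenic compound:", "acetylenic compound" in ups)
```

Under this toy hierarchy neither compound reaches alkyne, while both reach acetylenic compound, which is why a more general acetylenic compound biosynthesis term would capture them.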

hdrabkin commented 4 years ago

Are all of these RHEAs public?

hdrabkin commented 4 years ago

Never mind, just checked; yes, they are, so we can use them immediately.

krchristie commented 4 years ago

While the source paper does talk about production of a terminal alkyne:

Marchand JA, Neugebauer ME, Ing MC, Lin CI, Pelton JG, Chang MCY. Discovery of a pathway for terminal-alkyne amino acid biosynthesis. Nature. 2019;567(7748):420-424. doi:10.1038/s41586-019-1020-y PMID: 30867596

Here we report the discovery and characterization of a unique pathway to produce a terminal alkyne-containing amino acid in the bacterium Streptomyces cattleya.

ChEBI does not use the phrase terminal alkyne. In ChEBI, both L-beta-ethynylserine & L-propargylglycine classify under terminal acetylenic compound. If I create a term in GO for cellular terminal acetylenic compound biosynthetic process and remove the direct SubClass assertions, I get this classification, where neither L-beta-ethynylserine nor L-propargylglycine classifies under alkyne biosynthetic process:

[image: beta-ethynylserineBS-classification]

It seems best to go with ChEBI's classification, so I'm going to proceed with this.

I also seem to recall that we should be using the species prevalent at pH 7.3, which would be the zwitterions for both L-beta-ethynylserine & L-propargylglycine. Let me know if I've misunderstood this, or if it doesn't apply to a pathway only known to occur in bacteria as opposed to vertebrates.

@cmungall @pgaudet - let me know whether you think this is OK.

krchristie commented 4 years ago

I was going to just do this, but the build has failed for a non-content reason, so I'm going to tag this to talk about it on Monday's call if I haven't figured out how to restart the checks by then.

krchristie commented 4 years ago

7/27/2020 - Discussion at Ontology Editors call

However, in the call, you guys reinforced my initial thought that alkyne is "anything with a triple bond", and I checked some more stuff. I have some questions about what ChEBI is doing, so I am submitting a ticket to them.

Then, in rechecking my branch, it isn't working like I thought it was, and I also checked the classification of L-beta-ethynylserine versus L-beta-ethynylserine zwitterion on ChEBI's website. Basically, I think we may want to discuss the implications switching to the 7.3 terms may have for automatic classification of compounds since it doesn't look to me like the zwitterions of these two compounds classify chemically where the non-zwitterion terms do. @balhoff @cmungall

Here are pics of the graphs from ChEBI. The compound of interest is the very bottom one in each graph. Automatic classification using the zwitterion terms looks like it will classify terms to organic compound, but not to any kind of acetylenic compound (what ChEBI is using for anything containing a carbon carbon triple bond).

Parentage of L-beta-ethynylserine: [image: L-beta-ethynylserine-tree]

Parentage of L-beta-ethynylserine zwitterion: [image: L-beta-ethynylserine--zwitterion-tree]

alanbridge commented 4 years ago

Hi Karen

the problem of inconsistent hierarchies in ChEBI between charge states is a common one.

But could you incorporate the ChEBI charge state mapping that Rhea provides into either

i) the GO or ii) the reasoning process

to deal with this?

This file - chebi_pH7_3_mapping.tsv - at https://www.rhea-db.org/download maps "other" charge states to that used in Rhea.

So for

CHEBI:144729 - L-β-ethynylserine zwitterion (used in Rhea)
CHEBI:144833 - L-β-ethynylserine (the one with a better developed hierarchy)

the file gives something like this:

CHEBI   CHEBI_PH7_3 ORIGIN
144833  144729  computation

i.e. a computed relation between the two.

For reasoning you could infer relations using the hierarchy of both, but this could have some issues (like where one of the parents is also a specific charge form too). Perhaps another alternative could be to add all charge states from ChEBI as xrefs in the GO? So you would add ChEBI:144833 and ChEBI:144729 as xrefs for this pathway. Again for this one could use the chebi_pH7_3_mapping.tsv to help; if one GO xref has been curated for a GO BP, then additional GO xrefs could be added automatically from such a file.
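As a rough illustration of the xref idea, the sketch below groups ChEBI IDs by the pH 7.3 form they map to and expands one curated ID into all charge states in the same group. It assumes the file is tab-separated with the three header columns quoted above; the inline sample contains only the row quoted in this comment, and the function names `load_groups` and `expand_xrefs` are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Inline stand-in for chebi_pH7_3_mapping.tsv (assumed tab-separated).
# Only the single row quoted in this comment is included.
SAMPLE_TSV = (
    "CHEBI\tCHEBI_PH7_3\tORIGIN\n"
    "144833\t144729\tcomputation\n"
)


def load_groups(handle):
    """Return (id -> pH 7.3 id, pH 7.3 id -> set of all IDs mapping to it)."""
    to_ph73 = {}
    groups = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        to_ph73[row["CHEBI"]] = row["CHEBI_PH7_3"]
        groups[row["CHEBI_PH7_3"]].add(row["CHEBI"])
        groups[row["CHEBI_PH7_3"]].add(row["CHEBI_PH7_3"])
    return to_ph73, groups


def expand_xrefs(curated_id, to_ph73, groups):
    """Given one curated ChEBI ID, list it plus all other charge states
    that share the same pH 7.3 form according to the mapping."""
    ph73 = to_ph73.get(curated_id, curated_id)
    members = groups.get(ph73, set()) | {curated_id}
    return sorted(f"CHEBI:{m}" for m in members)


to_ph73, groups = load_groups(io.StringIO(SAMPLE_TSV))
print(expand_xrefs("144833", to_ph73, groups))
print(expand_xrefs("144729", to_ph73, groups))
```

Starting from either ID gives the same pair of xrefs, so it would not matter which charge state a curator happened to pick first. To run this against the real download, the `io.StringIO` handle would be replaced by an opened file.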

Pinging @amorgat

All the best, Alan

amorgat commented 4 years ago

Hi Karen, all, I'm sharing with you our working document on how to index ChEBI "is a" relationships: https://docs.google.com/presentation/d/1xLa1Z3EVFGuncWN4BUllymY6rVdpfYJbtEumyNABhkU/edit#slide=id.p

Do not hesitate to contact us if something is not clear or if you need additional information. All the best, Anne

krchristie commented 4 years ago

Thanks @alanbridge and @amorgat - that's really helpful!!

krchristie commented 4 years ago

For now, I'm going to commit a version

I will also open a ticket about alkyne biosynthetic process since GO is using a general definition of alkyne but referencing the ChEBI term for alkyne which is restricted to simple unsubstituted straight chain alkynes.

krchristie commented 4 years ago

Issue to discuss: Using some 7.3 terms affects automatic classification of compounds

I also checked the classification of L-beta-ethynylserine versus L-beta-ethynylserine zwitterion on ChEBI's website. Basically, I think we may want to discuss the implications switching to the 7.3 terms may have for automatic classification of compounds within GO since it doesn't look to me like the zwitterions of these two compounds classify chemically where the non-zwitterion terms do.

deustp01 commented 4 years ago

Two things about these ChEBI entries confuse me. First, ChEBI relates (incoming / outgoing) these two compounds by saying that each is the tautomer of the other, but does not have an incoming / outgoing conjugate acid - conjugate base relationship. I would have expected the opposite - conjugate acid / base but not tautomer. What am I missing in the chemistry here? And related to this, Karen's question: why should the zwitterion form lack the non-proteinogenic amino acid parent that the non-zwitterion form has? (Looking a bit more, I see the same difference for L-leucine: it has all sorts of chemical parents that its zwitterion lacks.) @hdrabkin ? Second, I've noticed that ChEBI sometimes does not identify the form of an ionizable compound that is predominant at pH 7.3, so an issue may be whether GO ontology developers and curators may sometimes need to figure this out for an individual compound and, if so, whether that figuring can be confirmed by Rhea and passed on to ChEBI so they can fill in the missing bit in their annotation for the chemical.

amorgat commented 4 years ago

Hi Peter, let me try to answer to your questions.

Concerning the case of L-β-ethynylserine

CHEBI:144729 L-β-ethynylserine zwitterion InChiKey: RBWXRFBKVDBXEG-DMTCNVIQSA-N

CHEBI:144833 L-β-ethynylserine InChiKey: RBWXRFBKVDBXEG-DMTCNVIQSA-N

You will notice that both ChEBI entries have the same InChiKey. The 2 structures are tautomers, as they differ only by the position of one proton (the proton of the carboxylic acid is on the amino group in the zwitterionic form). CHEBI:144729, as a zwitterion, is both a conjugate acid and a conjugate base of CHEBI:144833, so it makes no sense to use "is conjugate acid/base of" relationships; that's why ChEBI curators use the "is tautomer of" relationship instead.

As discussed during the last ChEBI workshop @EBI in May 2019, Adnan (the ChEBI curator) informed us that their guideline is to annotate most of the relationships on the fully hydrogenated form only. ChEBI entries describing the different protonation states of a molecule are linked by "is conjugate base of", "is conjugate acid of" and "is tautomer of" relationships. The reality is that some relationships may be missing, and retrieving all species may be tricky when they have several ionizable groups (a recursive procedure is needed). That's why, instead of these 3 relationships, the SIB developers use an additional relationship "has_major_microspecies_at_pH_7_3", which is available in the chebi.owl file (published by Rhea) but not used/displayed on the ChEBI public web site. This relationship is computed using the Marvin plugin provided by Chemaxon (https://docs.chemaxon.com/display/docs/Major+Microspecies+Plugin).
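To show why several ionizable groups call for repeated traversal, here is a minimal sketch that collects every entry reachable through the three linking relationships. It uses an iterative breadth-first search rather than literal recursion. The edge list is a toy: only the tautomer pair comes from this thread, the `EX:` identifiers are hypothetical placeholders for a compound with two ionizable groups, and the relation spellings are assumed.

```python
from collections import deque

# Relationships that link different protonation states (spellings assumed).
LINKING = {"is_conjugate_base_of", "is_conjugate_acid_of", "is_tautomer_of"}

# Toy edge list. Only the CHEBI pair is taken from this thread; the EX:
# entries are hypothetical and stand for a compound with two ionizable groups.
RELATIONS = [
    ("CHEBI:144729", "is_tautomer_of", "CHEBI:144833"),
    ("EX:anion", "is_conjugate_base_of", "EX:acid"),
    ("EX:dianion", "is_conjugate_base_of", "EX:anion"),
]


def protonation_states(start, relations):
    """Return all entries connected to `start` through linking relationships,
    following edges in both directions until nothing new is found."""
    neighbours = {}
    for subj, rel, obj in relations:
        if rel in LINKING:
            neighbours.setdefault(subj, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subj)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(protonation_states("CHEBI:144833", RELATIONS)))
print(sorted(protonation_states("EX:dianion", RELATIONS)))
```

The second call needs two hops to get from the dianion back to the acid, which a single lookup of direct relationships would miss. If any one of the linking relationships is absent in ChEBI, the traversal silently returns an incomplete set, which is the weakness a precomputed pH 7.3 mapping avoids.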

For your second point: "Second, I've noticed that ChEBI sometimes does not identify the form of an ionizable compound that is predominant at pH 7.3, so an issue may be whether GO ontology developers and curators may sometimes need to figure this out for an individual compound and if so, can that figuring be confirmed by Rhea and passed on to ChEBI so they can fill in the missing bit in their annotation for the chemical." It's true that sometimes the major species at pH 7.3 is missing. ChEBI is aware of the issue, and one of the outcomes of the ChEBI workshop was Project 1: Unify alternative forms of chemical entities with collaborative effort. At the moment, we (SIB curators) submit the missing entries when needed for Rhea or UniProt. I'm not sure I understand in which cases you would need a ChEBI compound that is not used in Rhea? Is it in the definition of GO:BP term (biosynthesis of XX / degradation of YY) where the reactions are not yet clearly defined?

krchristie commented 4 years ago

@balhoff - I was thinking I would split off the 7.3 vs fully protonated parentage issue into a separate ticket, since it goes beyond the scope of the original ticket, which only asked for creation of a couple of terms. Is there an existing ticket where it would be good to add these 7.3-related issues, or should I just create a new ticket?

deustp01 commented 4 years ago

I'm not sure I understand in which cases you would need a ChEBI compound that is not used in Rhea? Is it in the definition of GO:BP term (biosynthesis of XX / degradation of YY) where the reactions are not yet clearly defined?

I wasn't clear - I don't see any cases here where this is needed. For the rest, I was missing some history as well as a good understanding of the acid-base / tautomer relationship. Apologies for going over old ground.

krchristie commented 4 years ago

Summary of Discussion - Ontology editors call 8/17/2020

balhoff commented 4 years ago

the SIB developers use an additional relationship "has_major_microspecies_at_pH_7_3" which is available in the chebi.owl file (published by Rhea) but not used/displayed in the ChEBI public web site.

@amorgat are those relations in that chebi file identical to the information provided in chebi_pH7_3_mapping.tsv from the Rhea download page?

amorgat commented 4 years ago

@balhoff, chebi_pH7_3_mapping.tsv tells you if a ChEBI entity is the major species at pH 7.3 (CHEBI == CHEBI_PH7_3) or what the corresponding CHEBI_PH7_3 is if that's not the case:

CHEBI   CHEBI_PH7_3 ORIGIN
3       3           computation
...
15361   15361       computation
32816   15361       computation
...

In the chebi.owl, the relationships "has_major_microspecies_at_pH_7_3" are only provided between a ChEBI entry and its major species at pH 7.3.

In the previous example, only CHEBI_32816 has this relationship, whereas CHEBI_3 and CHEBI_15361 do not have any "has_major_microspecies_at_pH_7_3" relationships.
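The correspondence between the two files can be sketched as follows: a "has_major_microspecies_at_pH_7_3" pair is emitted only for rows where the two ID columns differ. This is a minimal illustration that assumes a tab-separated layout and uses the three example rows above as inline data; the function name `major_microspecies_pairs` is hypothetical and this is not the code used to build chebi.owl.

```python
import csv
import io

# Inline copy of the three example rows (file assumed to be tab-separated).
SAMPLE_TSV = (
    "CHEBI\tCHEBI_PH7_3\tORIGIN\n"
    "3\t3\tcomputation\n"
    "15361\t15361\tcomputation\n"
    "32816\t15361\tcomputation\n"
)


def major_microspecies_pairs(handle):
    """Yield (entry, major species at pH 7.3) only where the two differ,
    i.e. where the entry is not itself the major species."""
    pairs = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["CHEBI"] != row["CHEBI_PH7_3"]:
            pairs.append((f"CHEBI:{row['CHEBI']}", f"CHEBI:{row['CHEBI_PH7_3']}"))
    return pairs


pairs = major_microspecies_pairs(io.StringIO(SAMPLE_TSV))
for entry, major in pairs:
    print(entry, "has_major_microspecies_at_pH_7_3", major)
```

For the sample rows this produces a single pair, from CHEBI:32816 to CHEBI:15361, matching the description that entries which are already the major species carry no such relationship.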

To be discussed with Jerven Bolleman if you would need it.

balhoff commented 4 years ago

@amorgat thank you, I think I understand. From my perspective I think I have everything I need in the TSV file, i.e., there isn't additional information in the OWL file.

balhoff commented 3 years ago

@krchristie in the branch for #20093 (bio-chebi based on pH 7.3 mappings), these are inferable: