geneontology / go-ontology

Source ontology files for the Gene Ontology
http://geneontology.org/page/download-ontology
Creative Commons Attribution 4.0 International

resolve Rhea mapping for 'glycerol-3-phosphate dehydrogenase [NAD+] activity' GO:0004367, 'glycerol-3-phosphate dehydrogenase [NAD(P)+] activity' GO:0047952, 'glycerol-3-phosphate dehydrogenase [NADP+] activity' GO:0106257 #21790

Closed balhoff closed 8 months ago

balhoff commented 3 years ago

hdrabkin commented 2 years ago

Merging the NADP (GO:0106257) and NAD (GO:0004367) terms into the NAD(P) term (GO:0047952), and making the substrate-specific RHEAs narrowMatch xrefs, should solve this.

hdrabkin commented 2 years ago

Merged NADP (GO:0106257) into NAD(P) term (GO:0047952) 4-18-22

pgaudet commented 2 years ago

I think we're only missing the other merge

hdrabkin commented 2 years ago

Ah you mean merge the NAD (GO:0004367) term; forgot about that.

hdrabkin commented 2 years ago

done; close

sjm41 commented 8 months ago

I just stumbled across this ticket while trying to resolve a GO/EC/RHEA/MetaCyc mapping problem. I don't think the three-way merge that was done here was the right decision.

We've ended up with this single generic GO term:

id: GO:0047952
name: glycerol-3-phosphate dehydrogenase [NAD(P)+] activity
alt_id: GO:0004367
alt_id: GO:0036439
alt_id: GO:0106257
xref: EC:1.1.1.8
xref: EC:1.1.1.94
xref: KEGG_REACTION:R00842
xref: KEGG_REACTION:R00844
xref: MetaCyc:1.1.1.8-RXN
xref: MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN
xref: RHEA:11092 {source="skos:narrowMatch"}
xref: RHEA:11096 {source="skos:narrowMatch"}

Having a single GO term for 2/3 reactions and thus having a double set of xrefs is confusing and causing downstream mapping issues (at least for FlyBase). UniProt has very different sets of proteins annotated to the two ECs/RHEAs mentioned here, also showing that it's not appropriate to bundle these into a single GO term.

So, I would like to return to the previous 3-term solution (which I previously worked on and fixed with Harold in #19191, Mar 2020), that is:

id: GO:0047952
name: glycerol-3-phosphate dehydrogenase [NAD(P)+] activity
def: "Catalysis of the reaction: sn-glycerol 3-phosphate + NAD(P)+ = glycerone phosphate + NAD(P)H + H+." [EC:1.1.1.94]
xref: EC:1.1.1.94
xref: MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN

id: NTR (= old GO:0004367)
name: glycerol-3-phosphate dehydrogenase [NAD+] activity
def: "Catalysis of the reaction: NAD(+) + sn-glycerol 3-phosphate = dihydroxyacetone phosphate + H(+) + NADH." [RHEA:11092] [EC:1.1.1.8]
xref: EC:1.1.1.8
xref: KEGG_REACTION:R00842
xref: MetaCyc:1.1.1.8-RXN
xref: RHEA:11092
is_a: GO:0047952 ! glycerol-3-phosphate dehydrogenase [NAD(P)+] activity

id: NTR (= old GO:0106257 & GO:0036439)
name: glycerol-3-phosphate dehydrogenase [NADP+] activity
def: "Catalysis of the reaction: NADP(+) + sn-glycerol 3-phosphate = dihydroxyacetone phosphate + H(+) + NADPH." [RHEA:11096]
xref: KEGG_REACTION:R00844
xref: RHEA:11096
is_a: GO:0047952 ! glycerol-3-phosphate dehydrogenase [NAD(P)+] activity

This version separates out the NADP and NAD activities, and thus allocates the xrefs accurately.
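The difference between the merged term and the three-term proposal can be sketched in a few lines of Python. This is only an illustration, not GO tooling: the xref lists are copied by hand from the stanzas in this comment, `NTR:NAD` and `NTR:NADP` are placeholder ids for the not-yet-assigned new terms, and `ambiguous_terms` is a hypothetical helper that flags any term carrying more than one xref to the same external database.

```python
from collections import defaultdict

# Xrefs copied by hand from the stanzas in this comment.
# "NTR:NAD" and "NTR:NADP" are placeholder ids, not real GO ids.
MERGED = {
    "GO:0047952": [
        "EC:1.1.1.8", "EC:1.1.1.94",
        "KEGG_REACTION:R00842", "KEGG_REACTION:R00844",
        "MetaCyc:1.1.1.8-RXN", "MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN",
        "RHEA:11092", "RHEA:11096",
    ],
}

PROPOSED = {
    "GO:0047952": ["EC:1.1.1.94", "MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN"],
    "NTR:NAD": ["EC:1.1.1.8", "KEGG_REACTION:R00842",
                "MetaCyc:1.1.1.8-RXN", "RHEA:11092"],
    "NTR:NADP": ["KEGG_REACTION:R00844", "RHEA:11096"],
}


def xrefs_per_database(terms):
    """Group each term's xrefs by database prefix (the part before ':')."""
    grouped = {}
    for term_id, xrefs in terms.items():
        by_db = defaultdict(list)
        for xref in xrefs:
            by_db[xref.split(":", 1)[0]].append(xref)
        grouped[term_id] = dict(by_db)
    return grouped


def ambiguous_terms(terms):
    """Return ids of terms with more than one xref to the same database."""
    return sorted(
        term_id
        for term_id, by_db in xrefs_per_database(terms).items()
        if any(len(xrefs) > 1 for xrefs in by_db.values())
    )


print(ambiguous_terms(MERGED))    # the merged term has doubled xrefs
print(ambiguous_terms(PROPOSED))  # no term has doubled xrefs
```

With the merged stanza the single term is flagged for every database, whereas none of the three proposed terms is flagged.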

I understand from Jim's original post on this ticket that there was a problem with the RHEA mapping with the 3-term solution - I'm not sure if re-instating the 3-term solution will set off alarm bells again?

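But this may be unavoidable. The reason this issue is tricky is that EC/RHEA/MetaCyc/KEGG handle these 3 possible reactions in different ways:

  • EC has 2 terms: one generic for NAD(P) and one specific for NAD
  • RHEA has only 2 terms: one specific for NADP and one specific for NAD
  • MetaCyc follows EC, and has one entry generic for NAD(P) and one specific for NAD
  • KEGG follows RHEA, and has one entry specific for NADP and one specific for NAD

So, the xrefs are not a simple 1:1 correspondence....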

cmungall commented 8 months ago

Your proposed 3-term triangle solution is completely fine logically. It doesn't mess up the xrefs like the original one did, so boomer should be fine with it.

The biological justification is strong as well.

On Fri, Jan 26, 2024 at 6:14 AM Steven Marygold @.***> wrote:

But this may be unavoidable. The reason this issue is tricky is because EC/RHEA/MetaCyc/KEGG handle these 3 possible reactions in 3 different ways:

  • EC has 2 terms: one generic for NAD(P) and one specific for NAD
  • RHEA has only 2 terms: one specific for NADP and one specific for NAD
  • MetaCyc follows EC, and has one entry generic for NAD(P) and one specific for NAD
  • KEGG follows RHEA, and has one entry specific for NADP and one specific for NAD

So, the xrefs are not a simple 1:1 correspondence....


sjm41 commented 8 months ago

Hi @pgaudet, can you help with this one, or should we ask Raymond?

cmungall commented 8 months ago

I suppose there is a slight issue in that upstream groups will have merged any experimental annotations from the more granular terms. Even though formally we don't require people to annotate to the most granular term, it is a bit odd to restore the granularity and not have annotations at that level.

However, in the specific case of enzyme annotation, I think we should be OK, as we will get the granular annotations from the Rhea/UniProt/EC pipelines anyway. So I am good with the restoration.

pgaudet commented 8 months ago

I can have a look, if it's similar to https://github.com/geneontology/go-ontology/issues/25699 the NAD/NADP issue

@cmungall I understand that we don't want to go back and forth - hopefully we can apply our guidelines more consistently:

https://wiki.geneontology.org/Guidelines_for_new_Molecular_Functions#NAD/NADP_cofactors

pgaudet commented 8 months ago

After reviewing annotations, it seems like bacteria use NADP+ and eukaryotes use NAD, see https://github.com/geneontology/go-annotation/issues/4964

Actions:

sjm41 commented 8 months ago

> After reviewing annotations, it seems like bacteria use NADP+ and eukaryotes use NAD,

I think that's right, which makes me wonder why EC:1.1.1.94 was created for "NAD(P)", which would suggest that some enzymes use both/either? If GO:0047952 gets obsoleted, will EC:1.1.1.94 get added as a broad xref on GO:0004367 and GO:0106257?

pgaudet commented 8 months ago

> If GO:0047952 gets obsoleted, will EC:1.1.1.94 get added as a broad xref on GO:0004367 and GO:0106257?

Yes good point

> which makes me wonder why EC:1.1.1.94 was created for "NAD(P)", which would suggest that some enzymes use both/either

I asked Elisabeth Coudert from RHEA to look into that. Right now for all annotated genes it seems like it's either/or. I am guessing that EC has the NAD(P) term because the practice used to be not to try to distinguish this (I think??), and it seems they have also created NAD+/NADP+-specific terms in some cases, but not very consistently. (a bit like us :( )

deustp01 commented 8 months ago

> I asked Elisabeth Coudert from RHEA to look into that. Right now for all annotated genes it seems like it's either/or. I am guessing that EC has the NAD(P) term because the practice used to be not to try to distinguish this (I think??), and it seems they have also created NAD+/NADP+-specific terms in some cases, but not very consistently. (a bit like us :( )

An important textbook biochemistry / biology point is that mostly NADH participates in oxidations and reductions involved in energy generation and use (e.g., beta-oxidation of long-chain fatty acids in the mitochondria generates large amounts of NADH that is fed into ATP generation), while NADP is involved in biosyntheses and other interconversions. Almost all human enzymatic reactions that involve NAD(P) are specific for one or the other. A question for GO, however, is whether this really important distinction should be made at the level of molecular function / single-step molecular reactions or at the level of biological processes / pathways and annotations of participating small molecules. @ukemi @sjm41 ?

sjm41 commented 8 months ago

> I am guessing that EC has the NAD(P) term because the practice used to be not to try to distinguish this (I think??), and it seems they have also created NAD+/NADP+-specific terms in some cases, but not very consistently. (a bit like us :( )

I wonder whether EC:1.1.1.94 should be changed to be specifically for NADP (the 'prokaryotic reaction'), if that's what it really means and is how it's been used in annotation?

Here are the counts you get by searching Swiss-Prot for the ECs / RHEAs mentioned above:

NADP = RHEA:11096 (no corresponding EC)
  => 593 RHEA annotations in Swiss-Prot, all to bacteria/archaea (seems correct)

NAD(P) = EC:1.1.1.94 (no corresponding RHEA)
  => 593 EC annotations in Swiss-Prot, all to bacteria/archaea (i.e. same as above for NADP)

NAD = EC:1.1.1.8 / RHEA:11092
  => 53 EC annotations in Swiss-Prot, all to eukaryotes (seems correct)
  => 61 RHEA annotations in Swiss-Prot are to eukaryotes (seems correct)
  => 605 RHEA annotations in Swiss-Prot are to bacteria/archaea (seems wrong, caused by EC:1.1.1.94 being mapped to both RHEA:11096 and RHEA:11092?)

A specific example of how the current EC-RHEA mappings are causing problems in UniProt: this bacterial enzyme (https://www.uniprot.org/uniprotkb/P0A6S7/entry) is annotated with EC:1.1.1.94 and, as a result, is getting both RHEA:11096 and RHEA:11092 annotations via UniRule, which I don't think is right.
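To make the suspected mechanism concrete, here is a minimal Python sketch of how an EC-to-RHEA lookup would propagate annotations. The `EC_TO_RHEA` table is an assumption that encodes the double mapping suspected above; it is not a verified copy of the mapping UniProt actually uses, and `propagate` is a hypothetical stand-in for the real rule pipeline.

```python
# Assumed EC -> RHEA mapping. The double entry for EC:1.1.1.94 is the
# suspected cause discussed above, not a confirmed copy of the real table.
EC_TO_RHEA = {
    "EC:1.1.1.8": {"RHEA:11092"},
    "EC:1.1.1.94": {"RHEA:11096", "RHEA:11092"},
}


def propagate(ec_annotations, ec_to_rhea=EC_TO_RHEA):
    """Derive the set of RHEA annotations for each protein from its ECs."""
    derived = {}
    for protein, ecs in ec_annotations.items():
        rheas = set()
        for ec in ecs:
            # ECs without a mapping simply contribute nothing.
            rheas |= ec_to_rhea.get(ec, set())
        derived[protein] = rheas
    return derived


# The bacterial entry cited above, annotated only with the generic NAD(P) EC.
result = propagate({"P0A6S7": ["EC:1.1.1.94"]})
print(sorted(result["P0A6S7"]))
```

Under this assumed table, a protein annotated only with EC:1.1.1.94 inherits both the NADP-specific and the NAD-specific reaction, which would be consistent with the unexpected bacterial RHEA:11092 annotations counted above.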

@amorgat ?

pgaudet commented 8 months ago

Restored terms:

[Term]
id: GO:0141152
name: glycerol-3-phosphate dehydrogenase (NAD+) activity
namespace: molecular_function
def: "Catalysis of the reaction: NAD+ + sn-glycerol 3-phosphate = dihydroxyacetone phosphate + H+ + NADH." [RHEA:11092]
synonym: "alpha-glycerol phosphate dehydrogenase (NAD) activity" EXACT [EC:1.1.1.8]
synonym: "alpha-glycerophosphate dehydrogenase (NAD) activity" EXACT [EC:1.1.1.8]
synonym: "glycerol 1-phosphate dehydrogenase activity" BROAD [EC:1.1.1.8]
synonym: "glycerol phosphate dehydrogenase (NAD) activity" EXACT [EC:1.1.1.8]
synonym: "glycerol-3-phosphate dehydrogenase (NAD) activity" EXACT [EC:1.1.1.8]
synonym: "glycerol-3-phosphate dehydrogenase [NAD+] activity" EXACT []
synonym: "glycerophosphate dehydrogenase (NAD) activity" EXACT [EC:1.1.1.8]
synonym: "hydroglycerophosphate dehydrogenase activity" BROAD [EC:1.1.1.8]
synonym: "L-alpha-glycerol phosphate dehydrogenase activity" BROAD [EC:1.1.1.8]
synonym: "L-alpha-glycerophosphate dehydrogenase activity" BROAD [EC:1.1.1.8]
synonym: "L-glycerol phosphate dehydrogenase activity" BROAD [EC:1.1.1.8]
synonym: "L-glycerophosphate dehydrogenase activity" BROAD [EC:1.1.1.8]
synonym: "NAD-alpha-glycerophosphate dehydrogenase activity" EXACT [EC:1.1.1.8]
synonym: "NAD-dependent glycerol phosphate dehydrogenase activity" EXACT [EC:1.1.1.8]
synonym: "NAD-dependent glycerol-3-phosphate dehydrogenase activity" EXACT [EC:1.1.1.8]
synonym: "NAD-L-glycerol-3-phosphate dehydrogenase activity" EXACT [EC:1.1.1.8]
synonym: "NAD-linked glycerol 3-phosphate dehydrogenase activity" EXACT [EC:1.1.1.8]
synonym: "NADH-dihydroxyacetone phosphate reductase activity" EXACT [EC:1.1.1.8]
xref: EC:1.1.1.8
xref: KEGG_REACTION:R00842
xref: MetaCyc:1.1.1.8-RXN
xref: MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN
xref: RHEA:11092 {source="skos:exactMatch"}
is_a: GO:0016616 ! oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor
property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/21790" xsd:anyURI
created_by: pg
creation_date: 2024-01-30T10:47:46Z

+[Term]
+id: GO:0141153
+name: glycerol-3-phosphate dehydrogenase (NADP+) activity
+namespace: molecular_function
+def: "NADP+ + sn-glycerol 3-phosphate = dihydroxyacetone phosphate + H+ + NADPH." [RHEA:11096]
+synonym: "glycerol phosphate dehydrogenase (nicotinamide adenine dinucleotide (phosphate)) activity" EXACT [EC:1.1.1.94]
+synonym: "glycerol-3-phosphate dehydrogenase (NAD(P)+) activity" BROAD []
+synonym: "glycerol-3-phosphate dehydrogenase [NADP+] activity" EXACT []
+synonym: "L-glycerol-3-phosphate:NAD(P) oxidoreductase activity" EXACT [EC:1.1.1.94]
+synonym: "NAD-dependent glycerol phosphate dehydrogenase activity" RELATED [EC:1.1.1.8]
+synonym: "sn-glycerol-3-phosphate:NAD(P)+ 2-oxidoreductase activity" EXACT [EC:1.1.1.94]
+xref: EC:1.1.1.94
+xref: KEGG_REACTION:R00844
+xref: MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN
+xref: RHEA:11096 {source="skos:exactMatch"}
+is_a: GO:0016616 ! oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor
+property_value: term_tracker_item "https://github.com/geneontology/go-ontology/issues/21790" xsd:anyURI
+created_by: pg
+creation_date: 2024-01-30T15:48:01Z
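As a quick cross-check of the two restored stanzas, the following Python sketch parses their `id` and `xref` lines and reports any xref that appears on more than one term. It is a deliberately minimal tag-value reader written for this comment, not a full OBO parser; `STANZAS` is an abbreviated hand copy of the terms above (synonyms and definitions omitted), and `parse_stanzas` and `shared_xrefs` are hypothetical helper names.

```python
from collections import defaultdict

# Abbreviated hand copy of the two restored stanzas (id and xref lines only).
STANZAS = """
[Term]
id: GO:0141152
xref: EC:1.1.1.8
xref: KEGG_REACTION:R00842
xref: MetaCyc:1.1.1.8-RXN
xref: MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN
xref: RHEA:11092 {source="skos:exactMatch"}

+[Term]
+id: GO:0141153
+xref: EC:1.1.1.94
+xref: KEGG_REACTION:R00844
+xref: MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN
+xref: RHEA:11096 {source="skos:exactMatch"}
"""


def parse_stanzas(text):
    """Collect the id and xrefs of each [Term] stanza; other tags are ignored."""
    terms = []
    current = None
    for raw in text.splitlines():
        # Tolerate the leading '+' of diff-style lines.
        line = raw.strip().lstrip("+").strip()
        if line == "[Term]":
            current = {"id": None, "xrefs": []}
            terms.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            if tag == "id":
                current["id"] = value
            elif tag == "xref":
                # Drop a trailing qualifier block such as {source="..."}.
                current["xrefs"].append(value.split(" {", 1)[0])
    return terms


def shared_xrefs(terms):
    """Map each xref used by more than one term to the sorted term ids."""
    owners = defaultdict(list)
    for term in terms:
        for xref in term["xrefs"]:
            owners[xref].append(term["id"])
    return {x: sorted(ids) for x, ids in owners.items() if len(ids) > 1}


print(shared_xrefs(parse_stanzas(STANZAS)))
```

On these two stanzas the only shared xref is MetaCyc:GLYC3PDEHYDROGBIOSYN-RXN, which sits on both the NAD+ and the NADP+ term; whether that is intended is not stated here.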

pgaudet commented 8 months ago

@sjm41

about EC:1.1.1.94:

The EC entry states "The enzyme from Escherichia coli shows specificity for the B side of NADPH." - as if this was meant for NADPH, not NAD(P)H - @kaxelsen, should this be fixed?

pgaudet commented 8 months ago

@sjm41 AFAIK what is left is questions for EC and RHEA; I'll let you decide when you want to close this?

sjm41 commented 8 months ago

Thanks for working on this Pascale!

kaxelsen commented 8 months ago

> about EC:1.1.1.94:
>
> The EC entry states "The enzyme from Escherichia coli shows specificity for the B side of NADPH." - as if this was meant for NADPH, not NAD(P)H - @kaxelsen, should this be fixed?

There is nothing to fix. The comment is about from which side the NADPH is attacked, in this case the B side. So it is a detail that was studied with NADPH, and probably not NAD.

> I asked Elisabeth Coudert from RHEA to look into that. Right now for all annotated genes it seems like it's either/or. I am guessing that EC has the NAD(P) term because the practice used to be not to try to distinguish this (I think??), and it seems they have also created NAD+/NADP+-specific terms in some cases, but not very consistently. (a bit like us :( )

This is an insult to the EC system. When an entry is created with the term NAD(P)H it is because the enzyme accepts both NADH and NADPH with considerable activity. So when there are three entries (NADH, NADPH and NAD(P)H) it is because there are three different types of enzymes, one that only accepts NADH, one that only accepts NADPH, and one type that accepts both!

If you look at the abstract of EC 1.1.1.94 ref. 4 (pmid:6767719) you will read: "NADPH, NADH, and nicotinamide hypoxanthine dinucleotide were used as substrates about equally well by both enzymes." and "The enzymes were shown to have B-type stereospecificity for NADPH"
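The naming rule described here can be written down as a tiny Python sketch. The function name `ec_cofactor_label` is hypothetical, and deciding whether an enzyme accepts a cofactor "with considerable activity" is left to the curator, so the inputs are simple booleans rather than measured activities.

```python
def ec_cofactor_label(accepts_nadh: bool, accepts_nadph: bool) -> str:
    """Return the cofactor label an entry would carry under the rule above.

    The booleans mean "accepted with considerable activity"; how that is
    judged from the literature is outside the scope of this sketch.
    """
    if accepts_nadh and accepts_nadph:
        return "NAD(P)H"  # one enzyme type that accepts both
    if accepts_nadh:
        return "NADH"     # accepts only NADH
    if accepts_nadph:
        return "NADPH"    # accepts only NADPH
    raise ValueError("enzyme accepts neither NADH nor NADPH")


# The enzymes in pmid:6767719 used NADPH and NADH "about equally well".
print(ec_cofactor_label(accepts_nadh=True, accepts_nadph=True))
```

Read this way, a NAD(P)H entry is a statement about a single enzyme type with dual specificity, not a bucket for reactions whose cofactor was left undistinguished.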

sjm41 commented 8 months ago

Thanks for clarifying the situation, @kaxelsen. And thanks for pointing to that PMID - this demonstrates why I should routinely use ExplorEnz rather than Expasy, as the latter doesn't show these references!

So EC doesn't have an entry specifically for the NADPH reaction, right?

@pgaudet I think Kristian's answer means that GO should retain the general 'GO:0047952 glycerol-3-phosphate dehydrogenase [NAD(P)+] activity' parent term, and this term (and not GO:0141153 glycerol-3-phosphate dehydrogenase (NADP+) activity) should have xref: EC:1.1.1.94, right?

kaxelsen commented 8 months ago

> So EC doesn't have an entry specifically for the NADPH reaction, right?

No it doesn't

pgaudet commented 8 months ago

Thanks @kaxelsen @sjm41 for clarifying all this.

I'll have another look at the literature to see if we need all 3 terms in GO.

pgaudet commented 8 months ago

Thanks to the whole RHEA team for the help.

We now have:

Thanks, Pascale

sjm41 commented 8 months ago

Thanks everyone!

@pgaudet , can I just check the following:

pgaudet commented 8 months ago

Thanks @sjm41 !!

I also noticed that glycerol-3-phosphate dehydrogenase [NAD(P)+] activity was used in logical definitions and asserted superclasses for some eukaryote BP.

Please let me know if you see more errors,

Thanks, Pascale