sjm41 closed this issue 2 months ago.
I looked at the following GO terms. The links are all valid for them. Mostly, they cover pathway variants that occur in different taxa.
There is no relevant MetaCyc "Class" term here.
There is no relevant MetaCyc "Class" term here.
There is a MetaCyc "Class: Trehalose Biosynthesis" (ID:Trehalose-biosynthesis) => exactMatch
There is a MetaCyc "Class: Trehalose Degradation" (ID:Trehalose-Degradation) => exactMatch
There is a MetaCyc "Class: UDP-N-acetyl-D-galactosamine Biosynthesis" (ID:UDP-Nac-Galactosamine-Biosynthesis) => exactMatch
There is a MetaCyc "Class: CMP-N-acetylneuraminate Biosynthesis" (ID:CMP-N-Acetylneuraminate-Biosynthesis) => exactMatch
There is a MetaCyc "Class: UDP-N-acetyl-D-glucosamine Biosynthesis" (ID:UDP-NAc-Glucosamine-Biosynthesis) => exactMatch
There is a MetaCyc "Class: N-acetylglucosamine degradation" (ID:N-Acetylglucosamine-Degradation) => exactMatch
There is a MetaCyc "Class: Glycogen Degradation" (ID:Glycogen-Degradation) => exactMatch
The MetaCyc class here is not specific for glycogen - it is "Glycogen and Starch Biosynthesis" (ID:GLYCOGEN-BIOSYN), which doesn't have an equivalent in the GO.
See ticket #28271. Pending ticket #27467: currently the links are OK. If the new terms are created, the xref will be split.
The MetaCyc class here is "Pentose Phosphate Pathways" (ID: Pentose-Phosphate-Cycle), which is the equivalent of the GO parent "GO:0006098 pentose-phosphate shunt"
The MetaCyc class here is "Pentose Phosphate Pathways" (ID: Pentose-Phosphate-Cycle), which is the equivalent of the GO parent "GO:0006098 pentose-phosphate shunt"
The MetaCyc class here is "UDP-sugar Biosynthesis" (ID: UDP-Sugar-Biosynthesis), which has no equivalent GO term.
There is a MetaCyc "Class: mannitol degradation" (ID:Mannitol-Degradation) => exactMatch
There is a MetaCyc "Class: Chitin Degradation" (ID:Chitin-Degradation) => exactMatch
Note that the ontology has been simplified (see #27960)
There is a MetaCyc "Class: D-Mannose-Degradation" (ID:D-Mannose-Degradation) => this would be an exactMatch on the GO parent GO:0019309 mannose catabolic process
The first 3 links are correct, but see ticket #28494.
But should MetaCyc:PWY-7247 (Bacteria) be removed? It starts with a D-glucuronide, so it would normally map to a BP term that has been made obsolete (id: GO:0019391 name: obsolete glucuronoside catabolic process). See ticket #28506.
MetaCyc class here is "D-Glucuronate Degradation" (id: D-Glucuronate-Degradation) => exactMatch. It also has this additional instance: superpathway of β-D-glucuronosides degradation (GLUCUROCAT-PWY) (bacteria) => narrowMatch
Conclusions from looking at the remaining terms (see below):
BETSYN-PWY is under the class "Betaine Biosynthesis" (ID: Betaine-Biosynthesis), which doesn't seem to have an equivalent in GO. But that class does have these other relevant instances:
glycine betaine biosynthesis II (Gram-positive bacteria) (PWY-3722) => add as narrowMatch
glycine betaine biosynthesis III (plants) (PWY1F-353) => add as narrowMatch
glycine betaine biosynthesis IV (from glycine) (P541-PWY) (Archaea, Bacteria) => this is already on the sister GO term "GO:0019286 glycine betaine biosynthetic process from glycine"
CHOLINE-BETAINE-ANA-PWY & P542-PWY are under a different class, "Choline Degradation" (ID:Choline-Degradation), so they should be moved to "GO:0042426 choline catabolic process" along with these other instances:
choline degradation II (PWY-3721) => narrowMatch
choline degradation III (PWY-7167) => single step; add to an NTR for EC:4.3.99.4?
choline degradation IV (PWY-7494) => narrowMatch
This is the only GO term in this list that already has a xref to a MetaCyc 'class'
The METHGLYUT-PWY 'superpathway' is described as "This superpathway summarizes the different routes for methylglyoxal detoxification found in Escherichia coli K-12", and includes the following 4 subpathways:
methylglyoxal degradation I (PWY-5386)
methylglyoxal degradation III (PWY-5453)
methylglyoxal degradation IV (PWY-5459)
L-lactaldehyde degradation (aerobic) (PWY0-1317)
Instances of Methylglyoxal-Detoxification are:
methylglyoxal degradation I (PWY-5386)
methylglyoxal degradation II (PWY-5462)
methylglyoxal degradation III (PWY-5453)
methylglyoxal degradation IV (PWY-5459)
methylglyoxal degradation V (PWY-5458)
methylglyoxal degradation VI (MGLDLCTANA-PWY)
methylglyoxal degradation VII (PWY-5456)
methylglyoxal degradation VIII (PWY-5386-1)
methylglyoxal degradation IX (PWY-8458)
methylglyoxal degradation X (PWY8J2-23)
GO:0051596 has these children:
--methylglyoxal catabolic process to lactate (GO:0061727) - no MetaCyc xrefs
----methylglyoxal catabolic process to D-lactate via S-lactoyl-glutathione (GO:0019243) - MetaCyc:PWY-5386 xref
----D-lactate biosynthetic process from methylglyoxal via (R)-lactaldehyde (GO:0019248) - MetaCyc:MGLDLCTANA-PWY xref
So 8 of the 10 MetaCyc pathways don't have xrefs on any of these terms. Question: should GO:0019248 be named, defined and classified with respect to methylglyoxal catabolism or D-lactate synthesis?
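The "8 of 10" count can be double-checked with a set difference. Below is a minimal Python sketch. It hard-codes only the IDs listed in this comment, plus the two current xrefs on GO:0051596 itself. The variable and function names are illustrative, not part of any GO or MetaCyc tooling; real xref data would have to be read from the ontology file.

```python
# Pathway instances of the MetaCyc variant class Methylglyoxal-Detoxification,
# copied from the list above.
CLASS_INSTANCES = {
    "PWY-5386", "PWY-5462", "PWY-5453", "PWY-5459", "PWY-5458",
    "MGLDLCTANA-PWY", "PWY-5456", "PWY-5386-1", "PWY-8458", "PWY8J2-23",
}

# Current MetaCyc xrefs on GO:0051596 and its descendants, as noted in this ticket.
XREFS_BY_TERM = {
    "GO:0051596": {"METHGLYUT-PWY", "Methylglyoxal-Detoxification"},
    "GO:0061727": set(),
    "GO:0019243": {"PWY-5386"},
    "GO:0019248": {"MGLDLCTANA-PWY"},
}


def unmapped_instances(instances, xrefs_by_term):
    """Return the class instances that no GO term in the subtree points to."""
    mapped = set().union(*xrefs_by_term.values())
    return sorted(instances - mapped)


missing = unmapped_instances(CLASS_INSTANCES, XREFS_BY_TERM)
print(f"{len(missing)} of {len(CLASS_INSTANCES)} unmapped: {missing}")
```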
Class here is "L-threonine Biosynthesis" (ID: THREONINE-BIOSYNTHESIS) => exactMatch. It only has the two instances listed above.
HOMOSER-THRESYN-PWY covers L-homoserine to L-threonine. THRESYN-PWY (the 'superpathway') is the same, plus 4 upstream reactions starting with oxaloacetate. The summary says "The overall superpathway of threonine biosynthesis as shown here covers the entire process of converting the central energy metabolism molecule oxaloacetate into L-threonine."
Given that the taxonomic range of both MetaCyc pathways excludes eukaryotes, maybe both should be typed as 'narrowMatch'?
MetaCyc class here is "Phosphopantothenate Biosynthesis" (ID: Pantothenate-Biosynthesis) => exactMatch. That has an additional instance: phosphopantothenate biosynthesis III (archaea) (PWY-6654) => narrowMatch
MetaCyc class here is "Reductive TCA Cycles" (ID: Reductive-TCA-Cycles) => exactMatch. That class has additional instances:
reductive TCA cycle II (PWY-5392) (Aquificota)
TCA cycle V (2-oxoglutarate synthase) (PWY-6969) (several)
But PWY-6969 is also an instance of "TCA cycle" (see below) so don't add here.
Class here is "TCA cycle" (ID: TCA-VARIANTS) => exactMatch. It has several more instances:
partial TCA cycle (obligate autotrophs) (PWY-5913) => narrowMatch
TCA cycle II (plants and fungi) (PWY-5690) => narrowMatch
TCA cycle III (animals) (PWY66-398) => narrowMatch
TCA cycle V (2-oxoglutarate synthase) (PWY-6969) => narrowMatch
TCA cycle VI (Helicobacter) (REDCITCYC) => narrowMatch
TCA cycle VII (acetate-producers) (PWY-7254) => narrowMatch
TCA cycle VIII (Chlamydia) (TCA-1) => narrowMatch
MetaCyc class here is "Purine Nucleotide Degradation" (ID:Purine-Degradation), which would be exactMatch on the GO parent term GO:0006145 purine nucleobase catabolic process
Not sure if PWY-5044 is "anaerobic"?
MetaCyc class here is "Gibberellin biosynthesis" (GIBBERELLINS-BIOSYNTHESIS) => exactMatch. That has additional instances:
gibberellin biosynthesis IV (Gibberella fujikuroi) (PWY-5047) => narrowMatch
gibberellin biosynthesis V (PWY-7232) (Euphyllophyta) => narrowMatch
MetaCyc class here is "L-alanine Biosynthesis" (ID: ALANINE-SYN), which would be an exactMatch on the GO parent term GO:0042852 L-alanine biosynthetic process.
MetaCyc class here is "Coenzyme A Biosynthesis" (ID: CoA-Biosynthesis) => exactMatch. That has some additional instances:
coenzyme A biosynthesis II (eukaryotic) (PWY-7851) => narrowMatch
coenzyme A biosynthesis III (archaea) (PWY-8342) => narrowMatch
coenzyme A salvage (bacteria) (PWY8J2-29) => narrowMatch
superpathway of coenzyme A biosynthesis III (mammals) (COA-PWY-1) => narrowMatch
MetaCyc class here is "Brassinosteroid Biosynthesis" (ID:Brassinosteroid-Biosynthesis) => exactMatch
MetaCyc class here is "Thiamine Diphosphate Salvage" (ID:Thiamin-Salvage) => exactMatch. That has additional instances:
base-degraded thiamine salvage (PWY-6899) (Archaea, Bacteria, Fungi) => narrowMatch
hydroxymethylpyrimidine salvage (PWY-6910) (Archaea, Bacteria, Fungi, Viridiplantae) => narrowMatch
thiamine diphosphate formation from pyrithiamine and oxythiamine (yeast) (PWY-7357) => narrowMatch
thiamine diphosphate salvage IV (yeast) (PWY-7356) => narrowMatch
thiamine diphosphate salvage V (PWY-8457) (Bacteria, Eukaryota) => narrowMatch
MetaCyc class is "L-proline Biosynthesis" (ID: PROLINE-SYN) => exactMatch. That has 3 additional instances:
L-ornithine degradation I (L-proline biosynthesis) (ORN-AMINOPENTANOATE-CAT-PWY) (Bacteria)
L-proline biosynthesis II (from arginine) (PWY-4981) (Archaea, Bacteria)
L-proline biosynthesis IV (PWY-4281) (Viridiplantae)
But ORN-AMINOPENTANOATE-CAT-PWY is already an xref on GO:0019466 ornithine catabolic process via proline, so don't add here.
MetaCyc class here is "NAD Biosynthesis" (ID: NAD-SYN), which would be exactMatch on the GO parent term GO:0019357 nicotinate nucleotide biosynthetic process. There seem to be additional "salvage" instances to add here:
NAD salvage pathway II (PNC IV cycle) (PWY-7761) (Bacteria) => narrowMatch
NAD salvage pathway III (to nicotinamide riboside) (NAD-BIOSYNTHESIS-II) (Bacteria) => narrowMatch
NAD salvage pathway IV (from nicotinamide riboside) (PWY3O-4106) (Bacteria, Fungi, Metazoa) => narrowMatch
NAD salvage pathway V (PNC V cycle) (PWY3O-4107) (Eukaryota) => narrowMatch
nicotinate riboside salvage pathway I (PWY3O-224) (Eukaryota) => narrowMatch
MetaCyc Class for FASYN-INITIAL-PWY and PWY-4381 is "Fatty Acid Biosynthesis Initiation" (ID:Fatty-Acid-Biosyn-Initiation), which doesn't have an equivalent in GO. The grand-parent class for those (and the direct class for PWY-5156) is "Fatty Acid Biosynthesis" (ID:Fatty-acid-biosynthesis), which would be an exactMatch for GO:0006633. Looks like additional instances of "Fatty Acid Biosynthesis" could be added as xrefs to GO:0006633 or its children - needs more focussed review...
Looks like this GO term should be obsoleted - #28786
MetaCyc class here is "Formaldehyde Oxidation" (ID:Formaldehyde-Oxidation), which doesn't fit with GO organisation.
There is a MetaCyc "Class: Folate Transformations" (ID:Folate-Transformations) => this doesn't really have an equivalent in GO
There is a MetaCyc "Class: L-arginine Biosynthesis" (ID:ARGININE-SYN) => exactMatch. That has the additional instances:
L-arginine biosynthesis III (via N-acetyl-L-citrulline) (PWY-5154) (Bacteria) => narrowMatch
L-arginine biosynthesis IV (archaea) (PWY-7400) => narrowMatch
There is a MetaCyc "Class: Geranylgeranyl Diphosphate Biosynthesis" (ID:GGPP-Biosynthesis) => exactMatch. That has the additional instance:
superpathway of geranylgeranyldiphosphate biosynthesis I (via mevalonate) (PWY-5910) (Bacteria, Eukaryota) => narrowMatch
MetaCyc:PWY-762: There is a MetaCyc "Class: Phospholipid Biosynthesis" (ID:Phospholipid-Biosynthesis), but that would be an exactMatch on GO:0008654 phospholipid biosynthetic process.
MetaCyc:PWY-782: There is a MetaCyc "Class: Glycolipid Biosynthesis" (ID:Glycolipids-Biosynthesis), but that would be an exactMatch on GO:0009247 glycolipid biosynthetic process
=> obsolete this GO term because it's a MF? #28800
There is a MetaCyc "Class: L-glutamate Degradation" (ID:GLUTAMATE-DEG), but that would be an exact match on GO parent term GO:0006538 glutamate catabolic process
There is a MetaCyc "Class: Ethanol Degradation" (ID:Ethanol-Degradation) => exactMatch
There is a MetaCyc "Class: Protein Glycosylation" (ID:Protein-Glycosylation) but that would be an exactMatch on the GO parent term GO:0006486 protein glycosylation
There is a MetaCyc "Class: Phospholipid Biosynthesis" (ID:Phospholipid-Biosynthesis) => exactMatch. But that lists ~40 child classes/direct instances, whereas the GO term has no is_a children. The discrepancy is probably due to what is classed as a subtype of phospholipid in MetaCyc vs GO/ChEBI?
There is a MetaCyc "Class: Spermidine Biosynthesis" (ID:Spermidine-Biosynthesis) => exactMatch. That lists these additional instances:
spermidine biosynthesis II (PWY-6559) (Bacteria) => narrowMatch
spermidine biosynthesis III (PWY-6834) (Archaea, Bacteria) => narrowMatch
spermidine biosynthesis IV (PWY2PN3-14) (Bacteria) => narrowMatch
There is a MetaCyc "Class: UDP-sugar Biosynthesis" (ID:UDP-Sugar-Biosynthesis) but that would be an exact match on a parent GO term, and the equivalent GO term doesn't exist anyway.
There is a MetaCyc "Class: Ammonia Assimilation" (ID:Ammonia-Assimilation) => exactMatch
That has these additional instances:
ammonia assimilation cycle I (PWY-6963) (Viridiplantae etc) => narrowMatch
ammonia assimilation cycle II (PWY-6964) (Viridiplantae etc) => narrowMatch
L-glutamine biosynthesis I (GLNSYN-PWY) (Archaea, Bacteria
There is a MetaCyc "Class: L-arginine Degradation" (ID:ARGININE-DEG) but that would be an exact match to the GO parent term GO:0006527 arginine catabolic process
There is a MetaCyc "Class: L-arginine Degradation" (ID:ARGININE-DEG) but that would be an exact match to the GO parent term GO:0006527 arginine catabolic process
I will remove the "MetaCyc:CPLX*" xrefs on these terms:
id: GO:0016612
name: molybdenum-iron nitrogenase complex
def: "An enzyme complex containing a molybdenum-iron cluster found in many species. It is composed of two proteins, dinitrogenase and nitrogenase reductase; dinitrogenase, the molybdenum-iron protein, is tetrameric with an alpha2-beta2 structure, and nitrogenase reductase is a homodimer." [PMID:11566366]
xref: MetaCyc:CPLX-186
xref: MetaCyc:CPLX-525

id: GO:0043853
name: methanol-CoM methyltransferase complex
def: "A heterotrimeric protein complex composed of a methanol methyltransferase subunit, a corrinoid protein and a methanol-specific corrinoid:coenzyme M methyltransferase subunit. Catalyzes the transfer of a methyl group from methanol to coenzyme M as part of the pathway of methanogenesis from methanol." [PMID:9363780]
xref: MetaCyc:CPLX-421

id: GO:0140690
name: dihydropyrimidine dehydrogenase (NAD+) complex
def: "A heteromultimeric complex capable of dihydropyrimidine dehydrogenase (NAD+); in E. coli, composed of PreA and PreT." [PMID:21169495, PMID:34097066]
xref: MetaCyc:CPLX0-7788

id: GO:0009353
name: obsolete mitochondrial oxoglutarate dehydrogenase complex
def: "OBSOLETE. A mitochondrial complex of multiple copies of three enzymatic components: oxoglutarate dehydrogenase (lipoamide) (E1), dihydrolipoamide S-succinyltransferase (E2) and dihydrolipoamide dehydrogenase (E3); catalyzes the overall conversion of 2-oxoglutarate to succinyl-CoA and carbon dioxide (CO2) within the mitochondrial matrix." [GOC:mtg_sensu, MetaCyc:CPLX66-42, PMID:10848975]
@pgaudet Just to recap what Rossana and I found in this ticket: most/all of the BP terms with multiple MetaCyc xrefs are accurate, because the multiple xrefs refer to pathway variants or taxon-specific versions that exactly match the GO def. In a few cases, it may be possible/appropriate to create new, more specific child terms that describe the pathway variant (using distinctions like "biosynthetic process from X" or "catabolic process via Y" etc.), but in most cases it seems we have multiple MetaCyc xrefs that could/should accurately be tagged as exactMatch.
If that's the case, then we would have to allow multiple EXACT MetaCyc xrefs on BP terms (but not MF or CC terms).
Unless you see an alternative?
If the various pathways are variants and taxon-specific pathways, I would rather use narrowMatch; or is there a problem with that?
I will remove the "MetaCyc:CPLX*" xrefs on these terms:
These seem fine? Is the thinking that we are so incomplete it's better to have none rather than a useless handful?
+1 to narrowMatch
If we go with the more specific terms, we need to mark the existing ones as do-not-annotate and manage the process of having all groups automatically push down annotations to the taxon-specific subclasses.
Note that we do have a semi-implemented, undocumented DP for the via-X pathways, many presumably created for alignment with MetaCyc/BioCyc at some point in the past, but I don't think this was implemented very consistently.
On Thu, Aug 22, 2024 at 7:20 AM pgaudet @.***> wrote:
If the various pathways are variants and taxon-specific pathways, I would rather use narrowMatch; or is there a problem with that?
— Reply to this email directly, view it on GitHub https://github.com/geneontology/go-ontology/issues/28527#issuecomment-2304797995, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAMMOPZHHPEYXXFTCAMQG3ZSXXSTAVCNFSM6AAAAABLA6H27SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGMBUG44TOOJZGU . You are receiving this because you are subscribed to this thread.Message ID: @.***>
I will remove the "MetaCyc:CPLX*" xrefs on these terms: These seem fine? Is the thinking that we are so incomplete it's better to have none rather than a useless handful?
Right, there were only 4 MetaCyc complex IDs xref'd to GO-CC complex terms, which seems next to useless. I don't believe we have a policy of adding xrefs from GO terms to complexes in external resources (MetaCyc, ComplexPortal or Reactome), but if we wanted to introduce that, we should do so systematically and via files provided by them.
If the various pathways are variants and taxon-specific pathways, I would rather use narrowMatch; or is there a problem with that?
I guess it's OK. So, if we have a leaf GO-BP (metabolic pathway) term that describes a general pathway, and we have MetaCyc pathways that fit the BP description but are taxon-specific or otherwise alternative versions of the pathway, then I will tag them as narrowMatch. E.g. these types of case:
id: GO:0008654
name: phospholipid biosynthetic process
def: "The chemical reactions and pathways resulting in the formation of a phospholipid, a lipid containing phosphoric acid as a mono- or diester." [ISBN:0198506732]
xref: MetaCyc:PHOSLIPSYN-PWY = superpathway of phospholipid biosynthesis III (E. coli)
xref: MetaCyc:PHOSLIPSYN2-PWY = superpathway of phospholipid biosynthesis II (plants)
or
id: GO:0016132
name: brassinosteroid biosynthetic process
namespace: biological_process
def: "The chemical reactions and pathways resulting in the formation of brassinosteroids, any of a group of steroid derivatives that occur at very low concentrations in plant tissues and may have hormone-like effects." [ISBN:0192801023]
xref: MetaCyc:PWY-2582 = brassinolide biosynthesis II
xref: MetaCyc:PWY-699 = brassinolide biosynthesis I
Sounds good!
MetaCyc also has pathway classes that more closely correspond to GO BP classes
https://cabbi-biocyc.igb.illinois.edu/META/NEW-IMAGE?type=ECOCYC-CLASS&object=Phospholipid-Biosynthesis https://cabbi-biocyc.igb.illinois.edu/META/NEW-IMAGE?type=ECOCYC-CLASS&object=Brassinosteroid-Biosynthesis
Loose analogy:
MetaCyc pathway classes - GO BP classes
MetaCyc pathways - GO-CAM templates
BioCyc pathways - GO-CAM species-specific models
Right, and MetaCyc also has "superpathways". The following example has both as current xrefs:
id: GO:0051596
name: methylglyoxal catabolic process
def: "The chemical reactions and pathways resulting in the breakdown of methylglyoxal, CH3-CO-CHO, the aldehyde of pyruvic acid." [GOC:ai]
xref: MetaCyc:METHGLYUT-PWY = superpathway of methylglyoxal degradation
xref: MetaCyc:Methylglyoxal-Detoxification = grouping class for 10 degradation pathways
Subpathways of METHGLYUT-PWY are:
methylglyoxal degradation I
methylglyoxal degradation III
methylglyoxal degradation IV
L-lactaldehyde degradation (aerobic)
Subpathways of Methylglyoxal-Detoxification are:
methylglyoxal degradation I
methylglyoxal degradation II
methylglyoxal degradation III
methylglyoxal degradation IV
methylglyoxal degradation V
methylglyoxal degradation VI
methylglyoxal degradation VII
methylglyoxal degradation VIII
methylglyoxal degradation IX
methylglyoxal degradation X
Hmm, in this case the MetaCyc class seems to be more semantically aligned with the GO class, since we would presumably group the same I-X, and our def matches their note: "This class is a variant class, i.e. its purpose is to group together a set of variant pathways. Variant pathways are those that accomplish roughly the same biological function, such as degradation of a given starting material, or biosynthesis of an end product. The variant pathways may or may not share any common reactions."
Do we know why the superpathway doesn't encompass all?
I would say try to make a single exact mapping to the most precise target (the class); then everything else can be inferred.
Also of note, in this case we have two subclasses:
The population of annotations is a super-lopsided Christmas tree here. These subclasses don't seem to be doing us much good in terms of consistently discriminating among variants.
We could try and map individual subclasses to MetaCyc individual pathways I-X... but the granularity is not buying us much?
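The idea of keeping one exact mapping to the class and inferring everything else could be sketched as follows. This is a hypothetical Python illustration, not existing GO tooling. It assumes that MetaCyc class membership is available as a plain dictionary. The TCA-VARIANTS members used here are only the additional instances named earlier in this ticket, not the complete MetaCyc class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Xref:
    go_id: str
    metacyc_id: str
    predicate: str  # SKOS mapping predicate, e.g. "exactMatch" or "narrowMatch"


def expand_class_mapping(go_id, class_id, instances_by_class):
    """Hypothetical helper: keep one exactMatch to a MetaCyc variant class and
    infer a narrowMatch to every pathway instance of that class."""
    inferred = [Xref(go_id, class_id, "exactMatch")]
    for pathway_id in sorted(instances_by_class.get(class_id, ())):
        inferred.append(Xref(go_id, pathway_id, "narrowMatch"))
    return inferred


# PARTIAL membership: only the TCA-VARIANTS instances named in this ticket,
# not the complete MetaCyc class.
INSTANCES_BY_CLASS = {
    "TCA-VARIANTS": {
        "PWY-5913", "PWY-5690", "PWY66-398", "PWY-6969",
        "REDCITCYC", "PWY-7254", "TCA-1",
    },
}

for xref in expand_class_mapping("GO:0006099", "TCA-VARIANTS", INSTANCES_BY_CLASS):
    print(xref.go_id, xref.predicate, f"MetaCyc:{xref.metacyc_id}")
```

One possible advantage of this approach: new MetaCyc variants would be picked up without editing GO. The trade-off is that GO would depend on an up-to-date copy of MetaCyc's class membership.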
I made a second pass through these ~50 GO-BP terms with multiple MetaCyc mappings, paying particular attention to the 'expected taxonomic range' associated with each pathway (as stated by MetaCyc) and the 'Class' to which each pathway belongs within the MetaCyc pathway ontology. I edited the lists above to add this extra information.
Here are my observations/proposals:
Almost all of the MetaCyc pathway instances refer to taxon-specific variants, so I agree we should simply mark these as 'narrowMatch' xrefs to the given BP term. I'll go ahead and do this unless there are objections. (That will fix the original issue and allow #28146 to proceed!)
Currently, only one (GO:0051596) of these ~50 BP terms has a mapping to a MetaCyc 'Class'. So it seems mapping BPs to MetaCyc Classes is not something we've really done to date. As Chris said, these 'classes' are logically equivalent to GO-BP terms, so they could be added and tagged as 'exactMatch'. However, MetaCyc pathways don't always belong to a Class that equates to a BP term - e.g. an equivalent Class exists for 28 of the 46 non-obsolete BP terms in this list. So, I could add these 28 Classes as new exactMatch xrefs, but coverage would be patchy (and of course I haven't looked at any mappings outside the current list). So, for now, I suggest I don't add Class mappings but I'll make a separate ticket to consider the pros/cons of doing so systematically.
Some MetaCyc pathways are associated with 'superpathways', defined in MetaCyc as "a class of metabolic pathways that are constructed by combining and connecting individual pathways (which can be shown separately) to depict relationships between them. In some cases those individual pathways start from a common precursor, or produce a common product, but they can have other relationships as well. Superpathways can have individual reactions as their components in addition to other pathways." Despite this larger concept, there are several valid narrowMatch mappings to superpathways within the current list. So I suggest we just handle these the same as regular pathways, keeping them and marking them as narrowMatch or broadMatch as appropriate.
This exercise also revealed several other related issues (e.g. missing MetaCyc xrefs, incorrect MetaCyc xrefs, BP terms that are single steps and are really MFs etc) which I'll fix. These are noted above, but I'll make separate tickets for any significant issues.
From this exercise, it's possible that all current MetaCyc xrefs to BP terms (even 1:1 mappings) should be made narrowMatch. If so, that edit could be done computationally in bulk.
What about xrefs to other pathway databases? Following the above logic, it seems that all BP xrefs to Reactome pathways should be marked as 'narrowMatch' since they are specific to human pathways. In contrast, at least some mappings to KEGG_PATHWAY maps should be exactMatch since they are species/taxon agnostic, though others should be broad/narrow match where the KEGG pathway represents a broader/narrower concept than the BP term - a task for another day...
I'll also leave the question about whether or not GO wants to have some/all of the "process via X" pathway variants for another day.
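If the bulk change to narrowMatch goes ahead, it might be scripted roughly as below. This minimal Python sketch makes three assumptions:

- An OBO file can be treated as stanzas separated by blank lines.
- The mapping type is recorded as a trailing `{source="skos:narrowMatch"}` qualifier on the xref line. That qualifier syntax should be checked against the convention actually used in the GO editors' file.
- The `tag_bp_metacyc_xrefs` helper is illustrative only.

Xrefs that already carry a qualifier are left untouched. Class-level xrefs such as Methylglyoxal-Detoxification can be excluded via `skip_ids`.

```python
import re

# Matches an xref line carrying a MetaCyc ID and nothing else, e.g.
# "xref: MetaCyc:PWY-699". Group 1 is the whole xref, group 2 the MetaCyc ID.
# Lines that already have a trailing {...} qualifier do not match.
METACYC_XREF = re.compile(r"^(xref: MetaCyc:(\S+))\s*$")

# ASSUMPTION: the mapping type is written as a trailing OBO qualifier.
# Check this against the convention actually used in the GO editors' file.
QUALIFIER = '{source="skos:narrowMatch"}'


def tag_bp_metacyc_xrefs(obo_text, skip_ids=frozenset()):
    """Hypothetical helper: mark unqualified MetaCyc xrefs on
    biological_process stanzas as narrowMatch, leaving IDs in skip_ids alone."""

    def tag(match):
        if match.group(2) in skip_ids:
            return match.group(0)
        return f"{match.group(1)} {QUALIFIER}"

    out = []
    # OBO stanzas are separated by blank lines.
    for stanza in obo_text.split("\n\n"):
        lines = stanza.split("\n")
        if "namespace: biological_process" in lines:
            lines = [METACYC_XREF.sub(tag, line) for line in lines]
        out.append("\n".join(lines))
    return "\n\n".join(out)


EXAMPLE = """[Term]
id: GO:0051596
name: methylglyoxal catabolic process
namespace: biological_process
xref: MetaCyc:METHGLYUT-PWY
xref: MetaCyc:Methylglyoxal-Detoxification

[Term]
id: GO:0016612
name: molybdenum-iron nitrogenase complex
namespace: cellular_component
xref: MetaCyc:CPLX-186
"""

print(tag_bp_metacyc_xrefs(EXAMPLE, skip_ids={"Methylglyoxal-Detoxification"}))
```

Running the script twice is harmless, since tagged lines no longer match the pattern. Stanzas from other namespaces, such as the CC term in the example, are not touched.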
Thanks @sjm41, great work, and I agree with your analysis and proposed strategy!
The following 48 BP terms (and 1 CC term) each have multiple mappings to MetaCyc IDs. We need to review them to check whether they are correct/make sense, and (re)move any incorrect mappings. I'm not sure if it will make sense to tag these with exact/broad/narrowMatch (like we do with catalytic activity terms)... I think there may be cases where several MetaCyc pathway variants are effectively all 'exactMatch'... but I'll report back here when I get the chance.
GO:0019285 glycine betaine biosynthetic process from choline
GO:0051596 methylglyoxal catabolic process
GO:0009088 threonine biosynthetic process
GO:0015940 pantothenate biosynthetic process
GO:0019643 reductive tricarboxylic acid cycle
GO:0006099 tricarboxylic acid cycle
GO:0030207 chondroitin sulfate catabolic process
GO:0030209 dermatan sulfate catabolic process
GO:0005992 trehalose biosynthetic process
GO:0005993 trehalose catabolic process
GO:0019653 anaerobic purine nucleobase catabolic process
GO:0009686 gibberellin biosynthetic process
GO:0019272 L-alanine biosynthetic process from pyruvate
GO:0019277 UDP-N-acetylgalactosamine biosynthetic process
GO:0015937 coenzyme A biosynthetic process
GO:0016132 brassinosteroid biosynthetic process
GO:0006055 CMP-N-acetylneuraminate biosynthetic process
GO:0006048 UDP-N-acetylglucosamine biosynthetic process
GO:0036172 thiamine salvage
GO:0006046 N-acetylglucosamine catabolic process
GO:0005980 glycogen catabolic process
GO:0006561 proline biosynthetic process
GO:0019358 nicotinate nucleotide salvage
GO:0005978 glycogen biosynthetic process
GO:0009051 pentose-phosphate shunt, oxidative branch
GO:0009052 pentose-phosphate shunt, non-oxidative branch
GO:0006065 UDP-glucuronate biosynthetic process
GO:0006633 fatty acid biosynthetic process
GO:0019592 mannitol catabolic process
GO:0019465 aspartate transamidation
GO:0010127 mycothiol-dependent detoxification
GO:0009257 10-formyltetrahydrofolate biosynthetic process
GO:0006526 arginine biosynthetic process
GO:0033386 geranylgeranyl diphosphate biosynthetic process
GO:0006032 chitin catabolic process
GO:0061611 mannose to fructose-6-phosphate catabolic process
GO:0033261 obsolete regulation of S phase
GO:0006636 unsaturated fatty acid biosynthetic process
GO:0019551 glutamate catabolic process to 2-oxoglutarate
GO:0042840 D-glucuronate catabolic process
GO:0019431 acetyl-CoA biosynthetic process from ethanol
GO:0035269 protein O-linked mannosylation
GO:0008654 phospholipid biosynthetic process
GO:0008295 spermidine biosynthetic process
GO:0033358 UDP-L-arabinose biosynthetic process
GO:0019676 ammonia assimilation cycle
GO:0019544 arginine catabolic process to glutamate
GO:0019545 arginine catabolic process to succinate
GO:0016612 molybdenum-iron nitrogenase complex