geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

Reaction intermediates: out of scope? #229

Open nataled opened 1 year ago

nataled commented 1 year ago

This ticket captures and continues a conversation had via email:

@nataled: I somehow recall that we discussed whether or not reaction intermediates were in scope for this project, and further recall (without much confidence) that the decision was 'no', but I can't find anything written on that. Of particular interest at the moment are things like sumoylation processes. These involve an intermediate between the SUMO protein and the ligase complex (with the SUMO attached to the active site cysteine before it is transferred to its final destination). Will GO-CAMs be capturing this level of detail? I ask because PRO currently considers these out of scope, but I can add them if needed.

@deustp01: Reaction intermediates, specifically entities formed by the transient attachment of something derived from a reaction substrate (whether a small molecule or a protein) to the enzyme catalyzing the reaction, are out of scope for Reactome. I expect that if there were physiological conditions (or a disease state, or the action of a drug) where that intermediate is stable, we'd make an exception, but I don't know of any cases like that. David can provide a sanity check on this and speak for GO (but my hunch is that, for somewhat different reasons, they would not try to annotate this transient structure, whether it involved a protein and a small molecule or two proteins).

@ukemi: As far as I know, intermediates would be out of scope for GO/GO-CAM as well.

@nataled: Okay, thanks, I'll filter these out when I come across them in my pipeline. I found about 25 cases so far (of which https://www.reactome.org/content/detail/R-HSA-3730623 is an example).

@deustp01: We should see how in fact we handled this entity in GO-CAMs. Darren, do you have other examples?

@ukemi: Yes, that's a really good idea! Send examples and I will have a look at some.

@nataled: The other examples I have at the moment are all of the same type as the one I gave. That is, they all are SUMO transfer intermediates. As to the answer to David's question, I think it's clear: these are out of scope for Reactome, PRO, and GO-CAM. They should be removed from Reactome, unless...do these intermediates persist through some change in location, perhaps?

@deustp01: It looks like Ben's code successfully uses Darren's example as a physical entity in the GO-CAM for the pathway - http://noctua.geneontology.org/editor/graph/gomodel:R-HSA-3065678, derived from our pathway https://www.reactome.org/content/detail/R-HSA-3065678. This opens the enzymological / editorial can of worms of deciding when an enzyme-substrate conjugate should be classified as an entity in its own right, analogous to the longstanding issue of how stable / persistent a noncovalent interaction among entities needs to be for the interacting structure to be labelled a complex. Old headaches for the new year.

@ukemi: Interesting. From the perspective of reality, this is accurate. The intermediates exist and they are physical entities. But wrt practicality, what do we do? Happy New Year!

@deustp01: Here's an example to think about.

The Stryer textbook has a diagram that shows all the intermediate steps of the reaction mechanism by which a serine protease catalyzes the addition of a water molecule across a peptide bond. Note the existence of intermediates in which all of the substrate polypeptide on one side (I forget which) is covalently attached to the serine in the active site of the protease. A water molecule normally immediately attacks that structure to release the attached peptide fragment, and we (and as far as I know everyone else) do not create a separate physical entity for that conjugate intermediate, but only create a single-step reaction in which polypeptide + water react to yield the two cleavage products, e.g., https://reactome.org/content/detail/R-HSA-140777 from the clotting cascade.

BUT

There are naturally occurring serine protease inhibitors, serpins, that are themselves proteins, that are cleaved to form the usual enzyme:half-substrate covalent conjugate, which then takes on a conformation that blocks access by water and yields a structure whose dissociation half-life is measured in weeks. I don't think we've annotated any serpin inhibition reactions, but if we did we would certainly annotate the output of the reaction as the enzyme with its active-site serine covalently modified by addition of the appropriate piece of the serpin to its -OH group.

The point of this whole oration is that we need an editorial line whose intention is to exclude transient intermediates that have only a single possible fate but allow ones that are stable (but how to define "stable" in a way that would satisfy Paul Thomas?) or that might have several fates (so we'd want to create one reaction to form the conjugate and an additional one for each possible fate).
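A minimal sketch of that editorial line as a predicate, to make the two exceptions concrete. The `Intermediate` class, its `is_stable` flag, and the `n_fates` count are hypothetical names introduced only for illustration; how "stable" would actually be decided is exactly the open question above and is not answered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intermediate:
    """Hypothetical description of an enzyme-substrate conjugate."""
    name: str
    is_stable: bool  # assumption: a curator-supplied judgement, not a computed value
    n_fates: int     # number of distinct reactions that can consume the conjugate


def in_scope(x: Intermediate) -> bool:
    """Exclude transient, single-fate intermediates; keep stable or multi-fate ones."""
    return x.is_stable or x.n_fates > 1


# The two cases from the serine protease discussion.
acyl_enzyme = Intermediate("protease acyl-enzyme intermediate", is_stable=False, n_fates=1)
serpin_conjugate = Intermediate("serpin:protease conjugate", is_stable=True, n_fates=1)

print(in_scope(acyl_enzyme))       # transient, one fate
print(in_scope(serpin_conjugate))  # stable
```

Under this sketch a multi-fate intermediate would get one reaction to form it plus one reaction per fate, as described above.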

@ukemi: Would you consider the inhibitor-enzyme complex to be the enabler of the inhibition reaction, so there are actually two reactions, the conjugation reaction enabled by the serpin and then the regulatory reaction enabled by the conjugate? Or maybe this just kicks the can up the road.

nataled commented 1 year ago

With this as a guide: https://reactome.org/PathwayBrowser/#/R-HSA-3215018&SEL=R-HSA-2993793&PATH=R-HSA-392499,R-HSA-597592,R-HSA-2990846 it's clear that, from the perspective of the act of sumoylation (and other UBL protein attachment processes), these exist as intermediates during the process of rendering UBL proteins competent for conjugation to their final destination protein. That is, these represent entities needed to describe the process of activation.

If we elect to represent these, they'll suffer from the same issue as any other protein-protein conjugation entity in Reactome, namely, that each entity is represented twice, each from the perspective of one of the two proteins. For example:

R-HSA-4419894 P61081 +111=MOD:00211~CHEBI:24411~UniProt:Q15843[76]
R-HSA-4419895 Q15843 +76=MOD:00211~CHEBI:23511~UniProt:P61081[111]

R-HSA-4419894 = G76-NEDD8-C111-AcM-UBE2M [cytosol]
R-HSA-4419895 = C111-AcM-UBE2M-G76-NEDD8 [cytosol]
P61081 = NEDD8-conjugating enzyme Ubc12
Q15843 = NEDD8
MOD:00211 = S-(glycyl)-L-cysteine (Cys-Gly)
CHEBI:24411 = glycyl group
CHEBI:23511 = cysteinyl group

These both represent one entity, which is an intermediate in the activation of NEDD8 (cytosol). During this process, the C-terminal glycine of NEDD8 is attached to the active site cysteine of NEDD8-conjugating enzyme Ubc12 (P61081).
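To illustrate the duplication, here is a rough sketch that parses the two records above and checks that they describe the same linkage from opposite sides. The field interpretation (entity ID, protein accession, modified position, MOD term, attached group, partner accession and partner position) is my reading of the example lines, and `parse_record` and `same_linkage` are hypothetical helpers, not part of any existing pipeline.

```python
import re
from typing import NamedTuple

# Pattern for lines like:
#   R-HSA-4419894 P61081 +111=MOD:00211~CHEBI:24411~UniProt:Q15843[76]
RECORD = re.compile(
    r"^(?P<entity>R-HSA-\d+)\s+(?P<protein>\S+)\s+\+(?P<pos>\d+)="
    r"(?P<mod>MOD:\d+)~(?P<group>CHEBI:\d+)~UniProt:(?P<partner>[^\[\]]+)"
    r"\[(?P<partner_pos>\d+)\]$"
)


class Conjugate(NamedTuple):
    entity: str
    protein: str
    position: int
    mod: str
    group: str
    partner: str
    partner_position: int


def parse_record(line: str) -> Conjugate:
    """Parse one record; raise ValueError if the line does not fit the assumed layout."""
    m = RECORD.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized record: {line!r}")
    return Conjugate(
        m["entity"], m["protein"], int(m["pos"]), m["mod"],
        m["group"], m["partner"], int(m["partner_pos"]),
    )


def same_linkage(a: Conjugate, b: Conjugate) -> bool:
    """True when b is a's linkage seen from the partner protein's side."""
    return a.mod == b.mod and (
        (a.protein, a.position, a.partner, a.partner_position)
        == (b.partner, b.partner_position, b.protein, b.position)
    )


a = parse_record("R-HSA-4419894 P61081 +111=MOD:00211~CHEBI:24411~UniProt:Q15843[76]")
b = parse_record("R-HSA-4419895 Q15843 +76=MOD:00211~CHEBI:23511~UniProt:P61081[111]")
print(same_linkage(a, b))
```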

Previously we decided that only one perspective will be shown, which is that of the protein being modified. In the above situation, if we represent these, I would suggest that we only 'follow' the protein being activated (NEDD8 in the example). This has the advantage of not breaking all my error-checking code that says only lysines can be neddylated (because from the activating enzyme's perspective, a cysteine has the attachment).
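As a sketch of what 'following' only the activated protein could look like, assuming each entity has already been reduced to a small dictionary: the `UBL_ACCESSIONS` set and the `follow_ubl` helper are hypothetical, and only the NEDD8 accession comes from the example above.

```python
# Assumption: accessions of the UBL proteins being activated.
# Only NEDD8 (Q15843) is taken from the example; others would need to be added.
UBL_ACCESSIONS = {"Q15843"}


def follow_ubl(records):
    """Keep only the records written from the UBL protein's perspective."""
    return [r for r in records if r["protein"] in UBL_ACCESSIONS]


records = [
    {"entity": "R-HSA-4419894", "protein": "P61081", "position": 111},  # Ubc12 side
    {"entity": "R-HSA-4419895", "protein": "Q15843", "position": 76},   # NEDD8 side
]

kept = follow_ubl(records)
print([r["entity"] for r in kept])
```

With this choice the cysteine-side record never reaches a lysine-only neddylation check, since the retained record describes NEDD8 rather than a neddylated target.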

Bottom line question: will the activation steps for NEDD8 and other UBLs be represented in GO-CAMs?