geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

YeastPathways - Intermediate small molecules shared between reactions #258

Closed: dustine32 closed this issue 9 months ago

dustine32 commented 1 year ago

For YeastPathways models, any small molecule that is an output of one reaction and also an input of the next reaction should be represented by a single shared small molecule instance. An example from the BRANCHED-CHAIN-AA-SYN-PWY-1 pathway is shown here: [image]

Reaction ACETOLACTREDUCTOISOM-RXN has an output (R)-2,3-dihydroxy-3-methylbutanoate instance that is separate from the (R)-2,3-dihydroxy-3-methylbutanoate input of DIHYDROXYISOVALDEHYDRAT-RXN. These small molecule instances of the same class can be merged into one instance and connected via a -has_output-> X <-has_input- chain, as shown here: [image]

The following small molecule classes will explicitly be blocked from sharing instances between reactions: CHEBI:15378 (hydron) and CHEBI:15377 (water).
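A minimal Python sketch of the merge described above may help make the rule concrete. It is not the pathways2go implementation (which lives in the intermediate-mol-share-instance branch); the `Molecule` and `Reaction` classes, the `share_intermediates` function, and the `EXAMPLE:` class label are all hypothetical names made up for illustration. Only the two blocked CHEBI IDs come from this issue.

```python
from dataclasses import dataclass, field

# Classes that must never share an instance between reactions (from this issue).
BLOCKED_CLASSES = {"CHEBI:15378", "CHEBI:15377"}  # hydron, water


@dataclass(eq=False)  # eq=False: two instances are equal only if they are the same object
class Molecule:
    chebi: str  # class of the small molecule instance


@dataclass
class Reaction:
    name: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


def share_intermediates(upstream, downstream, blocked=BLOCKED_CLASSES):
    """Replace each downstream input with the upstream output instance
    of the same class, unless that class is blocked."""
    by_class = {m.chebi: m for m in upstream.outputs if m.chebi not in blocked}
    downstream.inputs = [by_class.get(m.chebi, m) for m in downstream.inputs]
    return downstream


# "EXAMPLE:dhmb" is a placeholder label for (R)-2,3-dihydroxy-3-methylbutanoate,
# not a real class ID.
rxn1 = Reaction("ACETOLACTREDUCTOISOM-RXN",
                outputs=[Molecule("EXAMPLE:dhmb"), Molecule("CHEBI:15377")])
rxn2 = Reaction("DIHYDROXYISOVALDEHYDRAT-RXN",
                inputs=[Molecule("EXAMPLE:dhmb"), Molecule("CHEBI:15377")])
share_intermediates(rxn1, rxn2)
```

After the call, the intermediate is one object referenced by both reactions (the -has_output-> X <-has_input- shape), while water stays as two separate instances.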

Tagging @thomaspd for any further clarification.

This has already been implemented in the branch intermediate-mol-share-instance, as an experiment to see whether it was easy enough to do in the pathways2go code. It turned out to be easy. The code just needs to be merged into master as part of the YeastPathways load.

ukemi commented 1 year ago

To maintain consistency, can we do this with the Reactome models too? @deustp01

dustine32 commented 1 year ago

@ukemi Yes! We can definitely do this for Reactome. It's currently restricted to YeastPathways only because applying it to Reactome breaks one of the pathways2go Reactome tests, and I haven't been able to work out why (after spending almost 2 days on it a few months ago).

ukemi commented 1 year ago

Weird because we actually use the outputs and inputs in the rule to infer 'provides_input_for'.

deustp01 commented 1 year ago

> The following small molecule classes will explicitly be blocked from sharing instances between reactions: CHEBI:15378 (hydron) and CHEBI:15377 (water).

This list will probably need to be expanded, e.g., to ATP / ADP / Pi and probably more. A couple of years ago, in a similar discussion, I think we used the term "currency chemicals" (Larry Hunter's phrase?) for these ubiquitously occurring entities that we do NOT want to be the basis of a causal connection between reactions. But cautiously, because the boundary between "currency" and meaningful shared entities is fuzzy and variable.

ukemi commented 1 year ago

currency chemicals? We tried to distill this with Ben and Alan using frequency of use in Rhea reactions. I still have the list.
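For anyone who wants to experiment with the frequency idea, here is a rough Python sketch of how a candidate list could be derived. It is not the method used for the existing list; the `currency_candidates` function, the `min_fraction` cutoff, and the toy reaction data are assumptions for illustration, and nothing here reads actual Rhea data.

```python
from collections import Counter


def currency_candidates(reactions, min_fraction=0.2):
    """Return the classes that take part in at least `min_fraction` of the reactions.

    `reactions` is an iterable of collections of participant class labels.
    The cutoff is an arbitrary illustrative default, not a tuned value.
    """
    reactions = [set(r) for r in reactions]  # count each class once per reaction
    counts = Counter(c for participants in reactions for c in participants)
    cutoff = min_fraction * len(reactions)
    return {c for c, n in counts.items() if n >= cutoff}


# Toy data with made-up labels, only to show the shape of the input.
toy_reactions = [
    {"ATP", "ADP", "A"},
    {"ATP", "ADP", "B"},
    {"ATP", "C"},
    {"D", "E"},
    {"water", "F"},
]
candidates = currency_candidates(toy_reactions, min_fraction=0.5)
```

A frequency cutoff alone will not settle the fuzzy cases raised in this thread, so the output is best treated as a list to review by hand.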

deustp01 commented 1 year ago

> But cautiously, because the boundary between "currency" and meaningful shared entities is fuzzy and variable.

Maybe we could combine a fairly extensive currency chemical list with an additional rule that when reaction 2 is asserted to directly follow reaction 1, any outputs of 1 that are inputs of 2 can be used to make "directly provides input for" links between the two reactions even if the chemicals are on the currency chemical list. With a lot of checking to tune the rule and the list to exclude false positive links.
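The combined rule proposed above could be sketched in Python roughly as follows. The function name `linking_molecules`, its parameters, and the example labels are hypothetical; how "directly follows" is asserted in a real model is left out and passed in as a plain boolean.

```python
def linking_molecules(outputs_1, inputs_2, currency, directly_follows):
    """Return the classes that may justify a 'provides input for' link
    from reaction 1 to reaction 2.

    Shared currency chemicals are ignored unless reaction 2 is asserted
    to directly follow reaction 1, in which case every shared class counts.
    """
    shared = set(outputs_1) & set(inputs_2)
    if directly_follows:
        return shared
    return shared - set(currency)


currency = {"ATP", "ADP", "Pi"}  # illustrative list only
# Without an asserted ordering, a shared currency chemical gives no link.
no_link = linking_molecules({"ADP", "X"}, {"ADP", "Y"}, currency, directly_follows=False)
# With an asserted ordering, the same shared chemical is allowed through.
link = linking_molecules({"ADP", "X"}, {"ADP", "Y"}, currency, directly_follows=True)
```

An empty result means no link should be inferred; tuning the list and the rule against false positives would still need the checking mentioned above.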

dustine32 commented 1 year ago

> Weird because we actually use the outputs and inputs in the rule to infer 'provides_input_for'.

Yeah, some of this logic may be behind what's failing the test.

ukemi commented 1 year ago

Yes, this must be done cautiously. For example, I am currently working on some metabolic pathways that generate and consume co-factors. If I were including chemicals, I certainly wouldn't want to exclude them.

ukemi commented 1 year ago

@dustine32 I also notice above that you are using the relation 'directly provides input for'. It is my understanding that this is now subsumed by just 'provides input for'.

suzialeksander commented 9 months ago

I think this is done, at least for YP. If there needs to be a change for Reactome, that should probably be a new ticket.

deustp01 commented 9 months ago

This discussion may also be a useful starting point for thinking about the issue of easily / reliably identifying primary inputs in GO-CAMs and distinguishing them from other inputs as discussed here @ukemi @pgaudet