geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)

How to represent complex active unit structure #91

Closed goodb closed 4 years ago

goodb commented 4 years ago

After reviewing the conversion rules for the GOC meeting, I am wondering about the active-unit representation. Would it be better to take the instance of the complex out of the model and just keep the protein? The part-of relationship between the protein instance and the complex instance is redundant with the definition of the complex in REACTO (someday PRO). And, like I noted in the talk, it is the only place in the conversion where this kind of information about a physical entity is captured in the instance graph.

input: [screenshot: Screen Shot 2020-05-16 at 3 32 57 PM]

output (instance graph for model): [screenshot: Screen Shot 2020-05-16 at 3 24 12 PM]

output (REACTO class definition for the complex): [screenshot: Screen Shot 2020-05-16 at 3 39 52 PM]

I think removing the complex would make for cleaner models and a more consistent representation.

@ukemi @deustp01 @vanaukenk @thomaspd ??

ukemi commented 4 years ago

If I understand correctly, in this model, since we have an active subunit, we would remove the complex and just make the protein serine/threonine phosphatase activity enabled by PP2A. I am not opposed to this, and at first glance I think it is more of a GO-centric view. It means that we would lose the complex details in the models, but maybe that's ok. What would happen in the cases where we don't know the active subunit? Would we keep the complex? I'd like to explore this idea and think about how it fits with the contributes_to proposal that was made at the meeting. In my mind I'm still thinking about how we will align the proposal from the meeting with the Reactome representations for non-catalytic subunits of a complex. Will we need to go in and assign MFs (bridging, adapter, etc.) for those subunits in Reactome? What will the subsequent model look like? Let's have a think about this for our next call, scheduled for the 27th, but I will likely be on vacation.

deustp01 commented 4 years ago

And if I understand right, the fact that the active unit is part of a complex (and in vivo needs to be part of that complex to function properly) is still preserved someplace via the REACTO (future PRO) ontology? And anyway it's in Reactome, accessible anytime anyone figures out how to use it.

The what-if-we-don't-know-the-active-subunit case is an ugly can of worms. If the threshold for GO-CAM is that the active unit must be known, we lose a vast amount of information that everyone agrees is in scope for GO now. But without that threshold we create two classes of entities: intact complexes, and complexes represented only by their active units, with the differentia being an accident of curation.

But also, as the full information is preserved, and we really want to generate consistent GO-CAM models now, a lossy compromise that can work now, to be followed by work to try to capture a larger fraction of the functions of all of the gene products that make up these complexes, is a workable way to go.

goodb commented 4 years ago

The idea is to take the instance of the complex out of the model when there is an active unit annotation and just use the indicated protein instance. When there is no active unit specified, the complex itself is set as the enabler of the reaction. That latter part is not a change.
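The rule in the paragraph above can be written down as a minimal sketch. This is not the pathways2GO code; the `Catalyst` class, its field names, and the entity names `ComplexX` and `ProteinA` are hypothetical and only stand in for a Reactome catalyst and its optional active unit annotation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Catalyst:
    """Hypothetical stand-in for the catalyst of a reaction."""
    physical_entity: str               # the catalyst entity, e.g. a complex
    is_complex: bool                   # True when physical_entity is a complex
    active_unit: Optional[str] = None  # annotated active protein, if any


def choose_enabler(catalyst: Catalyst) -> str:
    """Return the entity whose instance enables the activity node.

    With an active unit annotation on a complex, only the protein is used
    and the complex instance is left out of the model. Without one, the
    catalyst entity itself (the complex) is the enabler, as before.
    """
    if catalyst.is_complex and catalyst.active_unit is not None:
        return catalyst.active_unit
    return catalyst.physical_entity


with_unit = Catalyst("ComplexX", is_complex=True, active_unit="ProteinA")
without_unit = Catalyst("ComplexX", is_complex=True)
print(choose_enabler(with_unit))
print(choose_enabler(without_unit))
```

In this sketch the complex membership of `ProteinA` is deliberately not stored on the instance side; it would have to be looked up in the ontology.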

The membership of the active protein in the complex is captured in REACTO and thus could be used - e.g. to generate 'contributes to' or other relationships.

I like this as it avoids a redundant representation of the complex has_part protein relationship in both the model and the ontology. I also like it because it keeps the approach to entity representation in line with the rest of the model - in terms of the divide between tbox (classes) and abox (instances).

I think that putting e.g. complex has_part protein relationships into these models is an indication of a weakness in the entity tbox that ought to be corrected there. Otherwise we are going to have the same information spread haphazardly across different models and ontologies in slightly different forms. This might open up the complex complex can of worms yet again...

ukemi commented 4 years ago

Yup. I like the proposal. Let's float it by the others on the call to try to get approval.

cmungall commented 4 years ago

This proposal makes complete sense to me. No information is lost, and the display is also easier to parse.

An analogy is reactions. We don't usually place instances of the chemicals in the graph, as these are in the ontology (or will be when the Rhea task is completed). All the information is still there.

deustp01 commented 4 years ago

The analogy to chemical instances in reaction graphs works really well for me to show the level of abstraction that is operating here and why it works.

goodb commented 4 years ago

@cmungall quick confirmation question about your comment. For the Reactome conversion, and I think in general, we currently do add instances of chemical classes for use as inputs and outputs of activity nodes. If the GO MF term for the activity node has a logical definition that includes the chemicals in the reaction, then yes, it seems redundant in the same way as here with the complex parts. That suggests that we may want to work on some coordination with regard to input/output and logical definitions for MF terms to avoid redundancy and conflicts. I suppose this is the case in general for any terms with logical definitions.

ukemi commented 4 years ago

At some point in the (near) future, we will be leveraging Rhea to populate all the logical definitions for enzymatic reactions. This is another step towards the full Rhea-GO-Reactome alignment.

goodb commented 4 years ago

Consensus achieved, May 27 2020.

goodb commented 4 years ago

Noting a little twist to account for with e.g. R-HSA-4641262 https://reactome.org/PathwayBrowser/#/R-HSA-201681&SEL=R-HSA-201677&PATH=R-HSA-162582,R-HSA-195721

Dropping the complex (the one carrying the active unit) that sits between 'DVL recruits GSK3beta:AXIN1 to the receptor complex' and 'Phosphorylation of LRP5/6 cytoplasmic domain by membrane-associated GSK3beta' results in the loss of the inference that the former positively regulates the latter.
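A toy sketch of why the inference can disappear. It assumes, purely for illustration, that the positive-regulation inference fires when an output of the upstream reaction is the enabler of the downstream reaction; the thread does not spell out the actual rule. The function names and the entities `ComplexX`, `ProteinA`, and `ProteinB` are hypothetical. The second function shows one conceivable membership-aware variant that uses class-level complex definitions; it is not something agreed in this issue.

```python
from typing import Dict, Set


def enabler_is_output(upstream_outputs: Set[str], downstream_enabler: str) -> bool:
    """Assumed rule: the downstream enabler must itself be an upstream output."""
    return downstream_enabler in upstream_outputs


def enabler_is_output_or_part(
    upstream_outputs: Set[str],
    downstream_enabler: str,
    complex_parts: Dict[str, Set[str]],
) -> bool:
    """Variant that also accepts an enabler that is a member of an output complex.

    complex_parts maps a complex to its member proteins, standing in for
    the class-level definition of the complex in the ontology.
    """
    if downstream_enabler in upstream_outputs:
        return True
    return any(
        downstream_enabler in complex_parts.get(output, set())
        for output in upstream_outputs
    )


outputs = {"ComplexX"}                            # produced by the upstream reaction
parts = {"ComplexX": {"ProteinA", "ProteinB"}}    # class-level membership

print(enabler_is_output(outputs, "ComplexX"))     # complex kept as enabler
print(enabler_is_output(outputs, "ProteinA"))     # complex dropped, protein is enabler
print(enabler_is_output_or_part(outputs, "ProteinA", parts))
```

Under these assumptions the direct match succeeds while the complex is the enabler and fails once only the active protein is used, which is the twist described above.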

goodb commented 4 years ago

@deustp01 This seems like an error in Reactome? In R-HSA-3134946 it looks like the TRIM21 protein is the active unit in the TRIM21 protein. https://reactome.org/PathwayBrowser/#/R-HSA-1834949&SEL=R-HSA-3134946&PATH=R-HSA-168256,R-HSA-168249

deustp01 commented 4 years ago

> @deustp01 This seems like an error in Reactome?

Yes - a curation mistake. It is now fixed in the internal database, but a few days too late to get into the June release, so it will become visible in September.

This should be a rare error. On the Reactome side we should be able to search for catalystActivity instances that have a complex as their physicalEntity and no active unit annotation to flag the catalyst protein in the complex (and also for ones whose physical entity is a homomultimer). I guess the same search could also be run to find catalystActivity instances whose active unit slot is not null, as in reaction R-HSA-3134946.
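The two checks could look roughly like the following sketch over exported records. The `CatalystActivity` record, its field names, and the reaction ids `R-1` to `R-4` are hypothetical and do not reflect the Reactome schema or API. The second check assumes that the problem case is an active unit set on a catalyst that is not a complex, which is how the TRIM21 example above reads; the homomultimer case is not covered here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CatalystActivity:
    """Hypothetical flattened record of one catalystActivity instance."""
    reaction_id: str
    physical_entity: str
    entity_kind: str                   # "complex" or "protein" in this sketch
    active_unit: Optional[str] = None  # None when the slot is empty


def complexes_missing_active_unit(records: List[CatalystActivity]) -> List[str]:
    """Reactions whose catalyst is a complex with no active unit annotation."""
    return [
        r.reaction_id
        for r in records
        if r.entity_kind == "complex" and r.active_unit is None
    ]


def non_complexes_with_active_unit(records: List[CatalystActivity]) -> List[str]:
    """Reactions whose catalyst is not a complex but has an active unit set."""
    return [
        r.reaction_id
        for r in records
        if r.entity_kind != "complex" and r.active_unit is not None
    ]


records = [
    CatalystActivity("R-1", "ComplexX", "complex", "ProteinA"),  # well formed
    CatalystActivity("R-2", "ComplexY", "complex"),              # no active unit
    CatalystActivity("R-3", "ProteinB", "protein", "ProteinB"),  # suspicious
    CatalystActivity("R-4", "ProteinC", "protein"),              # well formed
]

print(complexes_missing_active_unit(records))
print(non_complexes_with_active_unit(records))
```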