geneontology / pathways2GO

Code for converting between BioPAX pathways and Gene Ontology Causal Activity Models (GO-CAM)
update transport template and rule #102

Closed goodb closed 4 years ago

goodb commented 4 years ago

We have the rule: if a reaction has no curated GO Molecular Function, the input entities are the same as the output entities, and the input entities have different locations from the output entities, then add the type 'transporter activity' GO:0005215 ('protein transporter activity' GO:0140318 if the input and output entities are proteins).
And add ‘has target end location’, ‘has target start location’, and ‘transports or maintains localization of’ assertions to the function node.

Change this to remove the constraint that the reaction is untyped so that we can capture the start and end location for known transport reactions. When a reaction already has a manually asserted type, do not change it. When it does not, apply the rules as above. Always capture the start/end location assertions.
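
As a rough illustration of the revised rule, here is a minimal Python sketch. It is not the pathways2GO implementation: the `Participant` and `Reaction` classes, their fields, and `apply_transport_rule` are hypothetical names chosen for this example, and comparing sets of locations is only one possible reading of "different locations".

```python
from dataclasses import dataclass
from typing import List, Optional

TRANSPORTER_ACTIVITY = "GO:0005215"          # transporter activity
PROTEIN_TRANSPORTER_ACTIVITY = "GO:0140318"  # protein transporter activity


@dataclass(frozen=True)
class Participant:
    # Hypothetical, simplified view of a reaction participant.
    entity: str              # identifier of the physical entity
    location: str            # cellular component the entity is located in
    is_protein: bool = False


@dataclass
class Reaction:
    inputs: List[Participant]
    outputs: List[Participant]
    curated_mf: Optional[str] = None  # manually asserted GO Molecular Function, if any


def is_relocation(rxn: Reaction) -> bool:
    """Same entities on both sides, but in different locations."""
    same_entities = {p.entity for p in rxn.inputs} == {p.entity for p in rxn.outputs}
    moved = {p.location for p in rxn.inputs} != {p.location for p in rxn.outputs}
    return same_entities and moved


def apply_transport_rule(rxn: Reaction):
    """Return (function type, transport assertions) under the revised rule."""
    if not is_relocation(rxn):
        return rxn.curated_mf, {}
    if rxn.curated_mf is not None:
        mf = rxn.curated_mf  # never overwrite a manually asserted type
    elif all(p.is_protein for p in rxn.inputs + rxn.outputs):
        mf = PROTEIN_TRANSPORTER_ACTIVITY
    else:
        mf = TRANSPORTER_ACTIVITY
    # The start/end location assertions are captured whether or not a type was curated.
    assertions = {
        "has target start location": sorted({p.location for p in rxn.inputs}),
        "has target end location": sorted({p.location for p in rxn.outputs}),
        "transports or maintains localization of": sorted({p.entity for p in rxn.inputs}),
    }
    return mf, assertions
```

With this sketch, an untyped reaction that moves the same entity between two compartments gets GO:0005215 plus the three assertions, while a reaction with a curated type keeps that type and still gets the assertions.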

goodb commented 4 years ago

@ukemi @deustp01 when trying this out, I ran into a case where we are asserting an occurs_in statement based on a direct Reactome statement about the location of the reaction (not its participants, as would be typical), but we have different locations for inputs and outputs. Let me know what you think about the following conversion of https://reactome.org/PathwayBrowser/#/R-HSA-70326&SEL=R-HSA-170796&PATH=R-HSA-1430728,R-HSA-71387

(note the Cellular Compartment annotation and the locations of the inputs and outputs) [Screenshot, 2020-09-09 2:19:32 PM]

to [Screenshot, 2020-09-09 2:17:16 PM]

goodb commented 4 years ago

This actually seems pretty common. I'm seeing 620 reactions with this structure in their conversion. It happens a lot in the gluconeogenesis pathway, e.g. [Screenshot, 2020-09-09 2:26:38 PM]

Are we okay with this? Pinging @vanaukenk as well.

ukemi commented 4 years ago

An age-old question that I don't think we have ever addressed sufficiently. The succinct answer is that I think this is consistent with past/current annotation practice as well. For example, tyrosine kinase receptors in the plasma membrane bind a ligand extracellularly and catalyze the kinase reaction on the cytosolic side of the membrane. Curators annotate these to the plasma membrane. I think for now, we live with this until we figure out a global solution. The examples above all make sense from my biologist viewpoint. At least the place where the reaction occurs is 'between' the start and end locations, even though that's not specified. I would vote that this is ok. @deustp01 @vanaukenk ? Do they pass the ShEx checks and all the logical spatial-disjointness rules?

ukemi commented 4 years ago

Wouldn't it be cool to have the spatial representation good enough to infer adjacency, or to assert adjacency and then QC check the annotations? It would work for two out of the three examples above. The mitochondrial one would fail because the cytosol is not adjacent to the inner mitochondrial membrane. Mitochondrial intermembrane space would work. But ultimately it is physiologically important that the malate ends up in the cytosol as in the Reactome pathway. I think we would have to invoke the outer membrane pore.
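
To make the adjacency idea concrete, here is a minimal sketch of what such a QC check might look like. Everything in it is an assumption for illustration: the adjacency table is hand-written rather than taken from GO, the function names are hypothetical, and the start location used for the mitochondrial example (mitochondrial matrix) is assumed, since the thread only names the inner membrane and the cytosol.

```python
# Hypothetical, hand-asserted adjacency between compartments (undirected pairs).
ADJACENT = {
    frozenset({"mitochondrial matrix", "mitochondrial inner membrane"}),
    frozenset({"mitochondrial inner membrane", "mitochondrial intermembrane space"}),
    frozenset({"mitochondrial intermembrane space", "mitochondrial outer membrane"}),
    frozenset({"mitochondrial outer membrane", "cytosol"}),
    frozenset({"cytosol", "plasma membrane"}),
    frozenset({"plasma membrane", "extracellular region"}),
}


def adjacent(a: str, b: str) -> bool:
    """Two compartments are adjacent if identical or listed as a pair above."""
    return a == b or frozenset({a, b}) in ADJACENT


def passes_adjacency_qc(occurs_in: str, start: str, end: str) -> bool:
    """True if the reaction's location touches both the start and the end location."""
    return adjacent(occurs_in, start) and adjacent(occurs_in, end)


# Mitochondrial example: a reaction at the inner membrane cannot deliver
# directly to the cytosol, but it can deliver to the intermembrane space.
print(passes_adjacency_qc("mitochondrial inner membrane",
                          "mitochondrial matrix", "cytosol"))
print(passes_adjacency_qc("mitochondrial inner membrane",
                          "mitochondrial matrix", "mitochondrial intermembrane space"))
```

Under these assumptions the first call fails the check and the second passes, which matches the point above about needing the intermembrane space (and the outer membrane pore) to get malate to the cytosol.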

ukemi commented 4 years ago

The mitochondrial one is also interesting because it is an antiporter. The malate goes out and the phosphate comes in. We missed the phosphate transport part.

goodb commented 4 years ago

@ukemi it does pass the shex and OWL checks (note the green and no red dot!).

ukemi commented 4 years ago

I probably could have done that myself, guilty of laziness. :)

goodb commented 4 years ago

Not this time. That is on my local copy as this rule update isn't live yet.

goodb commented 4 years ago

Looks like no opposition, rule is in place and products are passing validation. Closing.

goodb commented 4 years ago

This rule is causing a ShEx validation failure in the Citric Acid Cycle pathway R-HSA-71403 and several others. Since NAD(P)+ transhydrogenase activity is not a subclass of transporter activity, it is being tested against the Molecular Function shape, which does not contain the has_target_start/end_location properties. (Here the reaction seems to be moving H+.)

@ukemi @vanaukenk should we change the ShEx schema, change the ontology, or change the rule? I see that Biological Process allows these transport-type properties, but Molecular Function does not.

[Screenshot, 2020-09-17 4:48:45 PM]

Same problem for Regulation of Glucokinase by Glucokinase Regulatory Protein R-HSA-170822: [Screenshot, 2020-09-17 4:55:15 PM]

And for Mitochondrial protein import R-HSA-1268020: [Screenshot, 2020-09-17 4:56:22 PM]

ukemi commented 4 years ago

This is actually kind of cool. I think each one is a little different and results in a change to the ontology and a change to the rule (@goodb don't you love how we get down into the weeds?). @deustp01 and @vanaukenk, what do you think about the representation of the biology here?

Example 1: I'm pretty sure that complex I, (NNT) would be considered a proton pump as well as having the asserted catalytic activity. If all NAD(P)+ transhydrogenase activity is_a GO:0015078 proton transmembrane transporter activity, we are missing a parent in the ontology. I would be conservative and say no to this. We do have NAD(P)+ transhydrogenase (B-specific) activity as a child of both NAD(P)+ hydrogenase activity and proton transmembrane transporter activity, but I don't think the mechanism is right, or at least I don't know enough about it. Maybe Rhea can help us here. Check out what they have: https://www.rhea-db.org/reaction?id=47992. This is EC:7.1.1.1. I can't find this EC in GO. New term and reannotation? If so, this would be a good example of how this project can result in ontology improvement.

Example 2: I am pretty sure that we had decided that the nuclear pore was indeed the complex enabling the transport. Would it make sense to have 'structural constituent of the nuclear pore' be a part_of transporter activity? Functional parts of MFs are something that has come up recently.

Example 3: I think this one points to a flaw in the rule. Inserting something into a membrane would not be considered transport, so even though the start and end locations are different, since the end location is a membrane I think we shouldn't infer transport. Would it be safe to infer membrane insertase activity if the end location is a membrane? The def specifies that it has to be inserted from the inside. Is that too restrictive? This is a good catch.

goodb commented 4 years ago

@ukemi it looks like this rule results in shex failures for 28 of the models. In total, the current conversion results in 32 models that fail to validate (none fail OWL). Since this is obviously fun for you and I don't know if we want to push these into dev yet, here are some more screenshots of examples to look at. If you want to look at this live today we could do a screenshare.

[Screenshots, 2020-09-18: 10:53:03 AM, 10:52:29 AM, 10:51:41 AM, 10:50:45 AM, 10:49:40 AM, 10:48:32 AM]

goodb commented 4 years ago

> Example 3: I think this one points to a flaw in the rule. Inserting something into a membrane would not be considered transport, so even though the start and end locations are different, since the end location is a membrane I think we shouldn't infer transport.

Let's clarify what you mean by 'infer transport'. In the OWL sense, we are not currently inferring that the activity instance is a transporter activity (transporter activity has no logical definition, hence that inference can't happen), and our rule isn't asserting it directly either. As we don't make that classification inference, is there harm in adding the property assertions about the movement of Cargo of SAM50 from the mitochondrial intermembrane space to the mitochondrial outer membrane? Is that information false? If not, why not keep it?

The text definition of transporter activity is: "Enables the directed movement of substances (such as macromolecules, small molecules, ions) into, out of or within a cell, or between cells." Isn't moving from one side of a membrane to the other 'movement within a cell'? Could there be a logical definition that uses the has_target_start/end properties and the CC to be more specific here?

ukemi commented 4 years ago

Argh. Sorry, I totally missed the point. I was just assuming we were assigning a transporter activity here, but we aren't. In some cases I think these should be transporters, and we need to go through them case by case. I need to think about this more, but it makes more sense to me not to use the 'has_target...' relations here, because I think we always meant for those to be restricted to things being transported. So in at least some of these cases 'transports or maintains localization of' seems incorrect. The metallopeptidase, for example, isn't transporting anything. It's just cleaving off a piece of APP and letting it float away. The motor one and the membrane insertase are interesting. They fit the textual definition of a transporter, but I don't think we would think of them as transporters. This might be a topic for discussion on an ontology call.

I'm going to have to look into the biology of the CAM kinase. It seems to me that the reaction encompasses more than just the CAM kinase activity and that it is really a piece of a trafficking process. Reading the Reactome description seems to support this view.

But I see what you are saying: this information is there, so can't we capture it? In some ways maybe we would be safer putting the locations on the inputs and outputs like Reactome does (located_in). We would only use 'transports or maintains localization of', 'has target start location' and 'has target end location' for transporter activities and for transport processes. That's where we diverged from the strict rule, and it got us into trouble in some of the cases above. We are invoking transport relations on reactions that aren't inferred or asserted to be transport. Of course, it also identified some cases where terms in the ontology might need a bit of work.

The reason I think these are fun is because I think they are concrete questions about our rules and the biology. What did we say? What was the rule? What is the biology? Why did we break the Shex? Is the rule wrong, the Shex too restrictive, biology not quite right in the model, biology not quite right in the ontology?

We do use target start and end locations in the transport processes, which are sometimes differentiated by place. The transporter functions, by contrast, are differentiated by what they transport, so the place is not specified.

@vanaukenk and @deustp01 (if you can come up for air), what are your thoughts on insertases, molecular motors and the complexes in the electron transport chain (a chemical misnomer). Should they be transporters according to the GO definition?

deustp01 commented 4 years ago

> thoughts on insertases, molecular motors and the complexes in the electron transport chain

(gasping for air) Immediate thought in reaction only to that last question is that all of these entities do complicated things that would need multiple MF terms to be described fully. In all cases, it does look like a substrate entity changes location as one feature of what is happening, so a transport MF term would be part of the description.

ukemi commented 4 years ago

Welcome back to the world of the living. I concur with all but maybe the insertases, but I bet you know more about the mechanism than I do. This means we should take this up with the ontology group. But I still think it is unsafe to use results_in_transport_of etc. for all reactions where reactants change location. We suffer from this limitation because of our ability to go to the detail of the reactants, at least for the APP example. The entity that is released after cleavage is a different entity than the input. It is a fragment of the protein that is cleaved. This is where the information loss limits us to a certain extent.

deustp01 commented 4 years ago

> maybe the insertases

Actually don't know anything about these, only speculating that insertion might mean moving an entity from a location separate from a membrane to a location within the membrane.

ukemi commented 4 years ago

That is what they do, so yes they fit the definition. I will look back to the notes when the ontology group was working with the transporter experts to see if they were ever discussed. What do you think about the CAM kinase? Have a look at the Reactome pathway when you get a chance. Should it be expanded?

goodb commented 4 years ago

@ukemi let's try to get a concrete decision here, specifically on the relationship between the ShEx and the rules for the conversion. This is what I am hearing from you:

  1. If the existing MF assigned by Reactome is not a subclass of transporter activity, do not add the ‘has target end location’, ‘has target start location’, and ‘transports or maintains localization of’ assertions to the activity node, because sometimes these are not appropriate. If there is no term assigned, add transporter activity and proceed as before. If the existing term is a subclass of transporter activity, go ahead and add the transport relations.
  2. Leave the ShEx alone, because it is actually doing exactly what you want it to do.
  3. You may want to engage the ontology group in regard to some changes of class definitions in this area of the ontology, based on some of your comments above.
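
Point 1 can be sketched as a small decision function. This is only an illustration under stated assumptions, not the code in pathways2GO: the `PARENTS` table is a toy is_a hierarchy whose `EX:` identifiers are made-up placeholders, `is_subclass_of` and `decide` are hypothetical names, the reaction is assumed to have already met the same-entities/different-locations condition, and the protein transporter variant is left out for brevity.

```python
from typing import Dict, Optional, Set, Tuple

TRANSPORTER_ACTIVITY = "GO:0005215"  # transporter activity

# Toy is_a hierarchy (child -> parents). The EX: identifiers are placeholders,
# not real GO terms.
PARENTS: Dict[str, Set[str]] = {
    "EX:some_transmembrane_transporter": {TRANSPORTER_ACTIVITY},
    "EX:some_kinase": {"EX:catalytic_activity"},
}


def is_subclass_of(term: str, ancestor: str, parents: Dict[str, Set[str]] = PARENTS) -> bool:
    """Reflexive, transitive is_a check over the toy hierarchy."""
    stack, seen = [term], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return False


def decide(curated_mf: Optional[str]) -> Tuple[str, bool]:
    """Return (function type to use, whether to add the three transport assertions)."""
    if curated_mf is None:
        # No curated term: assign transporter activity and proceed as before.
        return TRANSPORTER_ACTIVITY, True
    # Curated term: keep it, and add transport relations only for transporter subclasses.
    return curated_mf, is_subclass_of(curated_mf, TRANSPORTER_ACTIVITY)
```

In this sketch a curated kinase keeps its type and gets no transport relations, which is the case that was failing against the Molecular Function shape.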

I would add that I think you should try to add a logical definition to transport activity and/or to these three RO terms that makes the implicit knowledge you applied to this thread explicit in the ontology.

ukemi commented 4 years ago

@goodb, yes this is a correct summary of the tasks. Hopefully it's ok. But this exercise points out future strategies that we can use to find areas of these models that need examination. One thing that I have been meaning to ask you for a while. It is my interpretation that the Shex works only on assertions. Is there a way we can make it work on the inferences? I've run across a couple of cases where it would be useful.

One thing I did realize last night is that we can't make the class-level relationships between motors and transporter activities. Although sometimes microtubule and microfilament motors move things from place to place and are involved in the process of transport, ciliary dynein and muscle myosin are motors but don't transport anything. They serve as the motors for filament sliding, but not transport. I don't know why I didn't think of this yesterday considering my past.

"I would add that I think you should try to add a logical definition to transport activity and/or to these three RO terms that makes the implicit knowledge you applied to this thread explicit in the ontology."

I'd love to talk to you about this in more detail. I've wanted to do this with lots of the 'macro' relations in the ontology for a long time. For the transport ones, I can't figure out how to distinguish one generic place from another generic place, i.e. to say that there are two different places. I don't think it's possible in OBO, but it probably is in OWL.

goodb commented 4 years ago

@ukemi

> It is my interpretation that the Shex works only on assertions. Is there a way we can make it work on the inferences? I've run across a couple of cases where it would be useful.

The shex check is executed on the inferred models. This is actually required to make it work because it is how we match the shapes (e.g. MolecularFunction shape) to the instances in the graph that they should match (e.g. any instance typed as a child of Molecular Function). If there is an example where this doesn't seem to be happening, please let me know so I can look into it.

> logical definition to transport activity

Off the top of my head, I can't think of a way to say in OWL that the start and end locations must be different places. It may be impossible without using a rule. But I think you could make a useful definition for the high-level transporter activity without that declaration, e.g.:
molecular_function and ('transports or maintains localization of' some 'chemical entity') and ('has target start location' some 'cellular anatomical entity') and ('has target end location' some 'cellular anatomical entity')

You could even leave off the start/end location and just use the transports or maintains... property.
Children of transporter activity could then specify the start/end bits as needed.
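
To show what the proposed definition would and would not capture, here is a small closed-world approximation in Python. It is not how an OWL reasoner classifies instances, and the data layout is an assumption made for the example: `relations` is a hypothetical mapping from a property label to the set of filler classes already known for the node, and `matches_definition` is a made-up name.

```python
from typing import Dict, Set

# Property -> required filler class, taken from the class expression above.
REQUIRED = {
    "transports or maintains localization of": "chemical entity",
    "has target start location": "cellular anatomical entity",
    "has target end location": "cellular anatomical entity",
}


def matches_definition(is_molecular_function: bool,
                       relations: Dict[str, Set[str]],
                       include_locations: bool = True) -> bool:
    """Closed-world check of a node against the proposed transporter definition.

    With include_locations=False only the 'transports or maintains
    localization of' restriction is required (the looser variant).
    """
    required = dict(REQUIRED)
    if not include_locations:
        required = {k: v for k, v in required.items()
                    if k == "transports or maintains localization of"}
    return is_molecular_function and all(
        filler in relations.get(prop, set()) for prop, filler in required.items()
    )
```

Note that nothing here requires the start and end locations to differ, which mirrors the limitation described above.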

ukemi commented 4 years ago

@goodb. This proposal for transporter makes perfect sense. I suspect it wasn't implemented originally because we didn't have the cellular anatomical entity class. https://github.com/geneontology/go-ontology/issues/20001

goodb commented 4 years ago

Closing: all models are behaving as desired above. The transport properties are only added when the class of the activity is a subclass of transporter activity.