geneontology / go-annotation

This repository hosts the tracker for issues pertaining to GO annotations.

How to represent binding of a gene product in LEGO: #2280

Open pgaudet opened 5 years ago

pgaudet commented 5 years ago

From @dosumis on February 8, 2017 13:53

When a gene product binds to another gene product, do we need to choose one side as enabling?

Should we do this:

  GP1 -enables-> binding <-enables- GP2 
    ^__has_input___| |___has_input___^

Or this:

 GP1-enables->binding-has_input->GP2

 GP2-enables->binding-has_input->GP1

The former is potentially useful in some templates - e.g. defining cell-adhesion mediator activity
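To make the two options concrete, here is a minimal Python sketch that writes each one as a set of (subject, relation, object) triples. It assumes the second option means two direction-specific binding nodes in one model. The node names `binding`, `binding_1` and `binding_2` are placeholders for illustration, not identifiers from any real model.

```python
# Option 1: a single binding node, enabled by both gene products,
# with each gene product also recorded as an input.
option_1 = {
    ("GP1", "enables", "binding"),
    ("GP2", "enables", "binding"),
    ("binding", "has_input", "GP1"),
    ("binding", "has_input", "GP2"),
}

# Option 2: two direction-specific binding nodes, one per enabler
# (assumed reading of the second diagram).
option_2 = {
    ("GP1", "enables", "binding_1"),
    ("binding_1", "has_input", "GP2"),
    ("GP2", "enables", "binding_2"),
    ("binding_2", "has_input", "GP1"),
}


def enabled_nodes(triples):
    """Return the set of activity nodes that some gene product enables."""
    return {obj for (_subj, rel, obj) in triples if rel == "enables"}
```

Both options use four edges; they differ in whether the two gene products share one activity node or each get their own.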

_Copied from original issue: geneontology/molecular_function_refactoring#29_

pgaudet commented 5 years ago

From @dosumis on February 8, 2017 15:26

Note - in annotation the binding partner is typically (traditionally) buried in the with statement (yuk).

pgaudet commented 5 years ago

From @cmungall on February 9, 2017 6:47

Did you mean there to be 3 options here? Or is the second option a single model with distinct direction-specific instances of the binding process?

Can this be simplified with a swrl rule?

e.g.

?x enables ?p
?p type binding
->
?p has-input ?x

pgaudet commented 5 years ago

From @dosumis on February 9, 2017 7:3

> Did you mean there to be 3 options here? Or is the second option a single model with distinct direction-specific instances of the binding process?

The latter. Without stating both directions, one of the partners will not get an annotation.

> Can this be simplified with a swrl rule?
>
> e.g.
>
> ?x enables ?p
> ?p type binding
> ->
> ?p has-input ?x

So you'd only have to state the enables link to get the implication of has_input. I like it.
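As a rough illustration of what the rule would do, here is a minimal Python sketch that applies it by forward chaining over plain triples. This is not SWRL and not how a reasoner would run it; the function name `apply_binding_rule` and the node name `b1` are made up for the example, and it only matches nodes typed directly as `binding` (no subclass reasoning).

```python
def apply_binding_rule(triples):
    """Add `p has_input x` wherever `x enables p` and `p type binding`."""
    # Nodes asserted to be of type binding (direct type only, an assumption).
    binding_nodes = {s for (s, rel, o) in triples if rel == "type" and o == "binding"}
    inferred = {
        (p, "has_input", x)
        for (x, rel, p) in triples
        if rel == "enables" and p in binding_nodes
    }
    return set(triples) | inferred


# Only the enables links are stated; has_input is left to the rule.
asserted = {
    ("b1", "type", "binding"),
    ("GP1", "enables", "b1"),
    ("GP2", "enables", "b1"),
}
closed = apply_binding_rule(asserted)
```

After the call, `closed` contains the three asserted triples plus `b1 has_input GP1` and `b1 has_input GP2`.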

CC @thomaspd @vanaukenk @ukemi

pgaudet commented 5 years ago

From @dosumis on February 11, 2017 13:32

- [ ] TODO: add swrl rule as detailed above.

Where should these live - RO or GO? If GO, presumably we need a new file for rules?

pgaudet commented 5 years ago

From @dosumis on February 16, 2017 20:59

Hmmm... Just thought. Given this equivalent class axiom, the rule would end up inferring that all binding is protein binding:

image

Strictly, this may reflect the limitations of our binding design pattern rather than a mistake in the logic: an active participant (mediator) of a process (in this case the gene product) is arguably an input to that process. Still, it may be best to stick with the two node pattern for annotation in these cases.
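The problem can be sketched in a few lines of Python. The axiom itself is only in the image, so this assumes it has the form 'protein binding' EquivalentTo binding and has_input some protein; that form is a guess based on the comment text. The names `b1`, `ligand` and `classify_protein_binding` are hypothetical.

```python
PROTEINS = {"GP1"}  # hypothetical: GP1 is the only protein in this toy model


def classify_protein_binding(triples, proteins):
    """Return binding nodes matching the ASSUMED definition
    'binding and has_input some protein'."""
    binding_nodes = {s for (s, rel, o) in triples if rel == "type" and o == "binding"}
    return {
        p
        for (p, rel, x) in triples
        if rel == "has_input" and p in binding_nodes and x in proteins
    }


# A protein GP1 binding a non-protein ligand.
asserted = {
    ("b1", "type", "binding"),
    ("GP1", "enables", "b1"),
    ("b1", "has_input", "ligand"),
}

# Effect of the proposed rule: the enabler also becomes an input.
with_rule = asserted | {("b1", "has_input", "GP1")}
```

Without the rule, `b1` is not classified as protein binding; with the rule, it is, even though the only thing being bound is the ligand.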

pgaudet commented 5 years ago

From @dosumis on February 19, 2017 11:25

The confusion of what is correct here comes from treating MFs as processes but calling them functions. Each gene product has its own binding function, but those functions are simultaneously realized in a single binding process. (This is not to say that we should start treating MFs as 'realizables', but this framing makes the problem clearer).

pgaudet commented 5 years ago

From @dosumis on February 20, 2017 18:5

Possible solution:

Change logical definition of protein binding.
In DL we could say: binding that has_input > 1 protein
EL shunt pattern: Add qualifier to indicate this: multiprotein
'protein binding' bearer_of some 'multiprotein'
GCI: bearer_of multiprotein equivalentTo has_input > 1 protein
GCI: bearer_of multiprotein subClassOf has_input some protein

pgaudet commented 5 years ago

From @dosumis on February 20, 2017 18:12

CC @cmungall - would be good to chat about this.

pgaudet commented 5 years ago

From @cmungall on February 20, 2017 19:22

Easy part first: I think SWRL rules should live in RO, separate module, but imported by default. That makes them most amenable to global consistency checks with other relations.

Harder part: good catch and I agree with the analysis.

Can we first explore your original option number 2? (I don't know what the cell adhesion mediator story is and how that fits in).

Are there always two complementary binding processes? I am imagining two scenarios:

  1. it is the function of p1 to bind with proteins such as p2, and the function of p2 to bind with proteins such as p1
  2. it is the function of p1 to bind to p2 (for example, to disable it, in the case of foreign proteins, or simply proteins that are ubiquitinated)

Here I use 'bind' in the process sense, and 'function' as shorthand for evolved-to-do.

In case 1, we would place two activities in the lego model. In case 2, only one.

It seems it may be difficult to tease apart these scenarios. But the ability to tease them apart could be very useful.

If we decide to go with original option number 1, then your solution should work in theory, but is not totally straightforward. The EL shunt will help us with TBox reasoning, but for ABox (LEGO) reasoning we need to actually count distinct proteins. I'm not even sure if this is possible with a SWRL rule. There are subtleties here to do with the unique name assumption (which we implicitly make). We could have something like a SPARQL rule that makes the UNA and injects the PATO quality.
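A minimal Python sketch of the counting step, under the unique name assumption, might look like the following. It stands in for the SPARQL-style rule mentioned above rather than implementing one; the function names, the `multiprotein` label and the `bearer_of` edge follow the wording in this thread and are not checked against PATO or RO.

```python
from collections import defaultdict


def multiprotein_nodes(triples, proteins):
    """Return nodes with more than one distinct protein input."""
    inputs = defaultdict(set)
    for subj, rel, obj in triples:
        if rel == "has_input" and obj in proteins:
            # UNA: two different identifiers are treated as two different proteins.
            inputs[subj].add(obj)
    return {node for node, prots in inputs.items() if len(prots) > 1}


def inject_quality(triples, proteins):
    """Add `node bearer_of multiprotein` for every qualifying node."""
    extra = {(n, "bearer_of", "multiprotein") for n in multiprotein_nodes(triples, proteins)}
    return set(triples) | extra


model = {
    ("b1", "has_input", "GP1"),
    ("b1", "has_input", "GP2"),
    ("b2", "has_input", "GP1"),
}
result = inject_quality(model, proteins={"GP1", "GP2"})
```

Here only `b1` gains the quality, because `b2` has a single protein input. If GP1 and GP2 could be the same individual under different names, the count would be unsafe, which is the subtlety the UNA is covering.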

Oh, and what about RNAs whose function is to bind a protein?

I think I am tending towards your option 2. It's how I think of this naturally, FWIW. There are some counter-intuitive aspects. E.g. if we consider has_input as a subprop of has_participant (we could model it differently) then the two binding processes b1 and b2 are spatiotemporally identical. However, they are differentiated by their enablers; different views of the same process. Crudely, I have an analogy with a fight between two people (not sure where that came from). It's IMO more useful to look at this as two coincident processes from two different perspectives, each with different properties.

pgaudet commented 5 years ago

From @thomaspd on February 20, 2017 20:12

I tend to agree that option 2 is the better choice. David OS and I discussed this today and he had a good point, namely that anything we can do to reduce the number of nodes in the graph is helpful. I think this is right, but I note that only one of the two directional nodes will generally be a "function" node in a LEGO graph. The reverse direction node is usually only a subfunction of the overall downstream MF.

pgaudet commented 5 years ago

From @ukemi on February 20, 2017 20:24

I agree with Paul, and it is consistent with the way I have been modeling. The function of the binding tends to be from a given perspective of an active participant/enabler. I think we have done some of these at the various workshops. In the old annotation paradigm, we always made the reciprocal binding annotation, with the caveat that the actual binding partner went in the 'with' field. If it was a mouse protein binding a human protein we would make the mouse annotation and the human protein went in the 'with' field. We didn't/couldn't make the reciprocal annotation for the human protein.

pgaudet commented 5 years ago

From @dosumis on February 21, 2017 11:52

Having one node makes some important inferences easier.

See: https://github.com/geneontology/molecular_function_refactoring/blob/master/direct_reg_inf_notes.md

pgaudet commented 5 years ago

From @dosumis on February 24, 2017 4:16

Linking protein binding effector to sensor:

image

In this case, we need some association between the two protein binding nodes in order to keep a continuous chain of regulates relations (essential for inference).

Perhaps, rather than a new relationship for '?', Noctua should have something like Scratch, where two compatible nodes can 'snap' together.

CC @cmungall @ukemi

pgaudet commented 5 years ago

@ukemi @vanaukenk Is there anything left to do here? Seems like there is some agreement. (Do we need to document?) If not, can you please close?

pgaudet commented 5 years ago

For GO:0005515 protein binding: @thomaspd proposes that we use 'has input' for both proteins (no 'enables'). Otherwise this 'overloads' the 'enables' relation, since both proteins participate more or less equally.

ukemi commented 5 years ago

We should also examine this proposal in the context of the Reactome imports. When complexes form, these are binding events.

vanaukenk commented 5 years ago

As illustrated in some of our GO-CAMs, protein binding is the actual 'activity' of some gene products, but that's not how you'd describe the activity of the gene product being bound, and I'm not even sure we want to say that it's a sub-function of the gene product being bound.

For example, GP1 that binds GP2 for the purposes of sequestering GP2 would get an annotation to 'protein binding' and hopefully also to 'inhibition of GP2 activity' via reasoning, but I don't think we'd want to make the reciprocal binding annotation from GP2's point-of-view in the GO-CAM.

If we want the reciprocal binding annotation in the GPAD, then I think that's something to incorporate in the GPAD output script.
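One way such an export step could work is sketched below in Python. It is purely illustrative: the `(gene_product, term, partner)` tuples are a simplified stand-in, not the real GPAD column layout, and `reciprocal_binding_annotations` is a hypothetical helper, not part of any existing output script.

```python
def reciprocal_binding_annotations(annotations, term="protein binding"):
    """Return the annotations plus the reverse-direction binding annotations.

    Each annotation is a (gene_product, term, partner) tuple; this shape is
    an assumption made for the example.
    """
    result = list(annotations)
    seen = set(result)
    for gene_product, ann_term, partner in list(result):
        if ann_term != term:
            continue  # only binding annotations get a reciprocal
        reverse = (partner, ann_term, gene_product)
        if reverse not in seen:
            seen.add(reverse)
            result.append(reverse)
    return result


exported = reciprocal_binding_annotations([
    ("GP1", "protein binding", "GP2"),
    ("GP1", "kinase activity", "GP3"),
])
```

The model itself keeps only GP1's perspective; the GP2 annotation exists only in the exported list.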

Some specific examples for reference are:

http://noctua.geneontology.org/editor/graph/gomodel:5b91dbd100002880
http://noctua.geneontology.org/editor/graph/gomodel:5b528b1100000186

There is also a related issue about handling protein binding issues in the Noctua tracker that is relevant to this discussion: https://github.com/geneontology/noctua/issues/552

suzialeksander commented 5 months ago

@vanaukenk can this be closed, or is there still something to change in the GPAD script (if so this may not be the right repo)?