pgaudet opened this issue 5 years ago (status: Open)
From @dosumis on February 8, 2017 15:26
Note: in annotation, the binding partner is typically (traditionally) buried in the 'with' field (yuk).
From @cmungall on February 9, 2017 6:47
Did you mean there to be 3 options here? Or is the second option a single model with distinct direction-specific instances of the binding process?
Can this be simplified with a SWRL rule? E.g.:

    ?x enables ?p
    ?p type binding
    ->
    ?p has-input ?x
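The proposed rule can be sketched as a single forward-chaining pass over a toy triple store. This is only an illustration in Python, not how the rule would actually be deployed (that would be a SWRL rule in an ontology module); the predicate names `enables`, `type`, `has_input` and the individuals `gp1`, `gp2`, `b1` are hypothetical placeholders, not real RO/GO IRIs.

```python
from typing import Set, Tuple

# (subject, predicate, object); all labels are hypothetical placeholders,
# not real RO/GO IRIs.
Triple = Tuple[str, str, str]


def apply_binding_rule(triples: Set[Triple]) -> Set[Triple]:
    """Apply  ?x enables ?p, ?p type binding -> ?p has-input ?x  once.

    One pass reaches the fixpoint because the rule's head (has_input)
    never appears in its body.
    """
    binding_processes = {s for (s, p, o) in triples if p == "type" and o == "binding"}
    inferred = {
        (proc, "has_input", x)
        for (x, pred, proc) in triples
        if pred == "enables" and proc in binding_processes
    }
    return triples | inferred


# gp1 enables binding process b1; only the partner gp2 is stated as an input.
model: Set[Triple] = {
    ("gp1", "enables", "b1"),
    ("b1", "type", "binding"),
    ("b1", "has_input", "gp2"),
}
closed = apply_binding_rule(model)
print(sorted(t for t in closed if t[1] == "has_input"))
# [('b1', 'has_input', 'gp1'), ('b1', 'has_input', 'gp2')]
```

Only the `enables` link is asserted for `gp1`; its `has_input` link falls out of the rule.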
From @dosumis on February 9, 2017 7:03
> Did you mean there to be 3 options here? Or is the second option a single model with distinct direction-specific instances of the binding process?

The latter. Without stating both directions, one of the partners will not get an annotation.

> Can this be simplified with a SWRL rule? E.g.:
>
>     ?x enables ?p
>     ?p type binding
>     ->
>     ?p has-input ?x

So you'd only have to state the enables link to get the implication of has_input. I like it.
CC @thomaspd @vanaukenk @ukemi
From @dosumis on February 11, 2017 13:32
- [ ] TODO: add SWRL rule as detailed above.

Where should these live: RO or GO? If GO, presumably we need a new file for rules?
From @dosumis on February 16, 2017 20:59
Hmmm... just a thought. Given this equivalent class axiom, the rule would end up inferring that all binding is protein binding:
Strictly, this may reflect the limitations of our binding design pattern rather than a mistake in the logic: an active participant (mediator) of a process (in this case the gene product) is arguably an input to that process. Still, it may be best to stick with the two-node pattern for annotation in these cases.
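The overgeneralization can be reproduced on a toy model. The axiom referred to above is not reproduced in this thread, so the sketch assumes it has the form `'protein binding' EquivalentTo binding and (has_input some protein)`. The kinase/ATP example and all names are hypothetical, and the classification is a closed-world check on asserted types, which is enough for this positive inference.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def instances_of(triples: Set[Triple], cls: str) -> Set[str]:
    return {s for (s, p, o) in triples if p == "type" and o == cls}


def apply_binding_rule(triples: Set[Triple]) -> Set[Triple]:
    # ?x enables ?p, ?p type binding -> ?p has-input ?x
    binding = instances_of(triples, "binding")
    return triples | {(proc, "has_input", x) for (x, p, proc) in triples
                      if p == "enables" and proc in binding}


def protein_binding_instances(triples: Set[Triple]) -> Set[str]:
    # Assumed definition: binding and (has_input some protein).
    binding = instances_of(triples, "binding")
    proteins = instances_of(triples, "protein")
    return {s for (s, p, o) in triples
            if p == "has_input" and s in binding and o in proteins}


# A kinase binding ATP: intended as ATP binding, not protein binding.
model: Set[Triple] = {
    ("kinase1", "type", "protein"),
    ("atp", "type", "small molecule"),
    ("b1", "type", "binding"),
    ("kinase1", "enables", "b1"),
    ("b1", "has_input", "atp"),
}
print(protein_binding_instances(model))                      # set()
print(protein_binding_instances(apply_binding_rule(model)))  # {'b1'}
```

Once the enabling protein becomes an input, every binding enabled by a protein satisfies the assumed definition, whatever it actually binds.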
From @dosumis on February 19, 2017 11:25
The confusion of what is correct here comes from treating MFs as processes but calling them functions. Each gene product has its own binding function, but those functions are simultaneously realized in a single binding process. (This is not to say that we should start treating MFs as 'realizables', but this framing makes the problem clearer).
From @dosumis on February 20, 2017 18:05
Possible solution:
- Change the logical definition of protein binding.
- In DL we could say: binding that has_input > 1 protein
- EL shunt pattern: add a qualifier to indicate this, 'multiprotein':
  - 'protein binding' bearer_of some 'multiprotein'
  - GCI: (bearer_of some 'multiprotein') EquivalentTo (has_input > 1 protein)
  - GCI: (bearer_of some 'multiprotein') SubClassOf (has_input some protein)
From @dosumis on February 20, 2017 18:12
CC @cmungall - would be good to chat about this.
From @cmungall on February 20, 2017 19:22
Easy part first: I think SWRL rules should live in RO, in a separate module but imported by default. That makes them most amenable to global consistency checks with other relations.
Harder part: good catch and I agree with the analysis.
Can we first explore your original option number 2? (I don't know what the cell adhesion mediator story is or how that fits in.)
Is it the case that there are always two complementary binding processes? I am imagining two scenarios:
Here I use 'bind' in the process sense, and 'function' as shorthand for evolved-to-do.
In case 1, we would place two activities in the LEGO model. In case 2, only one.
It seems it may be difficult to tease apart these scenarios. But the ability to tease them apart could be very useful.
If we decide to go with original option number 1, then your solution should work in theory, but it is not totally straightforward. The EL shunt will help us with TBox reasoning, but for ABox (LEGO) reasoning we need to actually count distinct proteins. I'm not even sure this is possible with a SWRL rule. There are subtleties here to do with the unique name assumption (UNA), which we implicitly make. We could have something like a SPARQL rule that makes the UNA and injects the PATO quality.
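The ABox counting step could look roughly like the following Python stand-in for the SPARQL rule mentioned above (a swapped-in technique, not the actual rule). It makes the unique name assumption explicitly, counts distinct protein inputs per binding process, and adds a `bearer_of multiprotein` assertion when there are two or more. The names `multiprotein`, `gp1`, `gp2`, `atp`, `b1`, `b2` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def add_multiprotein_quality(triples: Set[Triple]) -> Set[Triple]:
    """Flag binding processes with two or more distinct protein inputs.

    Unique name assumption: two different identifiers are treated as two
    different proteins, which an OWL reasoner would not assume by default.
    """
    proteins = {s for (s, p, o) in triples if p == "type" and o == "protein"}
    bindings = {s for (s, p, o) in triples if p == "type" and o == "binding"}
    protein_inputs: Dict[str, Set[str]] = defaultdict(set)
    for (s, p, o) in triples:
        if p == "has_input" and s in bindings and o in proteins:
            protein_inputs[s].add(o)
    # For brevity the class name stands in for a quality individual.
    inferred = {
        (b, "bearer_of", "multiprotein")
        for b, inputs in protein_inputs.items()
        if len(inputs) > 1
    }
    return triples | inferred


model: Set[Triple] = {
    ("gp1", "type", "protein"), ("gp2", "type", "protein"),
    ("atp", "type", "small molecule"),
    ("b1", "type", "binding"), ("b1", "has_input", "gp1"), ("b1", "has_input", "gp2"),
    ("b2", "type", "binding"), ("b2", "has_input", "gp1"), ("b2", "has_input", "atp"),
}
flagged = {s for (s, p, o) in add_multiprotein_quality(model) if p == "bearer_of"}
print(flagged)  # {'b1'}
```

Without the UNA step, `gp1` and `gp2` could denote the same individual, and `b1` could not be shown to have more than one protein input.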
Oh, and what about RNAs whose function is to bind a protein?
I think I am tending towards your option 2. It's how I think of this naturally, FWIW. There are some counter-intuitive aspects. E.g., if we consider has_input as a subproperty of has_participant (we could model it differently), then the two binding processes b1 and b2 are spatiotemporally identical. However, they are differentiated by their enablers; they are different views of the same process. Crudely, I have an analogy with a fight between two people (not sure where that came from). IMO it's more useful to look at this as two coincident processes seen from two different perspectives, each with different properties.
From @thomaspd on February 20, 2017 20:12
I tend to agree that option 2 is the better choice. David OS and I discussed this today and he had a good point, namely that anything we can do to reduce the number of nodes in the graph is helpful. I think this is right, but I note that only one of the two directional nodes will generally be a "function" node in a LEGO graph. The reverse direction node is usually only a subfunction of the overall downstream MF.
From @ukemi on February 20, 2017 20:24
I agree with Paul, and it is consistent with the way I have been modeling. The function of the binding tends to be from a given perspective of an active participant/enabler. I think we have done some of these at the various workshops. In the old annotation paradigm, we always made the reciprocal binding annotation, with the caveat that the actual binding partner went in the 'with' field. If it was a mouse protein binding a human protein we would make the mouse annotation and the human protein went in the 'with' field. We didn't/couldn't make the reciprocal annotation for the human protein.
From @dosumis on February 21, 2017 11:52
Having one node makes some important inference easier.
See: https://github.com/geneontology/molecular_function_refactoring/blob/master/direct_reg_inf_notes.md
From @dosumis on February 24, 2017 4:16
Linking protein binding effector to sensor:
In this case, we need some association between the two protein binding nodes in order to keep a continuous chain of regulates relations (essential for inference).
Perhaps, rather than a new relationship for '?', Noctua should have something like Scratch, where two compatible nodes can 'snap' together.
CC @cmungall @ukemi
@ukemi @vanaukenk Is there anything left to do here? Seems like there is some agreement. (Do we need to document?) If not, can you please close?
For GO:0005515 protein binding: @thomaspd proposes that we use 'has input' for both proteins (no 'enables'). Otherwise this 'overloads' the 'enables' relation, since both proteins participate more or less equally.
We should also examine this proposal in the context of the Reactome imports. When complexes form, these are binding events.
As illustrated in some of our GO-CAMs, protein binding is the actual 'activity' of some gene products, but that's not how you'd describe the activity of the gene product being bound, and I'm not even sure we want to say that it's a sub-function of the gene product being bound.
For example, GP1 that binds GP2 for the purposes of sequestering GP2 would get an annotation to 'protein binding' and hopefully also to 'inhibition of GP2 activity' via reasoning, but I don't think we'd want to make the reciprocal binding annotation from GP2's point-of-view in the GO-CAM.
If we want the reciprocal binding annotation in the GPAD, then I think that's something to incorporate in the GPAD output script.
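A sketch of how an output script might optionally emit the reciprocal row, under stated assumptions: `BindingAnnotation` is a hypothetical, simplified stand-in for a GPAD line (it does not follow the real GPAD column layout), the relation names are placeholders, and each activity is assumed to have a single enabler. The toy model mirrors the GP1-binds-GP2 example above.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
PROTEIN_BINDING = "GO:0005515"


@dataclass(frozen=True)
class BindingAnnotation:
    # Hypothetical simplified row; not the real GPAD column layout.
    gene_product: str
    go_term: str
    with_from: str


def binding_annotations(triples: Set[Triple], reciprocal: bool = True) -> List[BindingAnnotation]:
    """Derive protein binding rows from GO-CAM-style triples.

    Assumes each activity has at most one enabled_by gene product.
    """
    enabler: Dict[str, str] = {s: o for (s, p, o) in triples if p == "enabled_by"}
    activities = {s for (s, p, o) in triples if p == "type" and o == PROTEIN_BINDING}
    rows: List[BindingAnnotation] = []
    for (act, p, partner) in sorted(triples):  # sorted for deterministic output
        if p == "has_input" and act in activities and act in enabler:
            gp = enabler[act]
            rows.append(BindingAnnotation(gp, PROTEIN_BINDING, partner))
            if reciprocal:
                # Reciprocal row: the bound partner, with the enabler in with/from.
                rows.append(BindingAnnotation(partner, PROTEIN_BINDING, gp))
    return rows


model: Set[Triple] = {
    ("act1", "type", PROTEIN_BINDING),
    ("act1", "enabled_by", "GP1"),
    ("act1", "has_input", "GP2"),
}
for row in binding_annotations(model):
    print(row)
```

Keeping the reciprocal row behind a flag leaves the GO-CAM itself one-directional, as argued above, while letting the export decide whether GP2 also gets a 'protein binding' line.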
Some specific examples for reference are:
- http://noctua.geneontology.org/editor/graph/gomodel:5b91dbd100002880
- http://noctua.geneontology.org/editor/graph/gomodel:5b528b1100000186
There is also a related issue about handling protein binding issues in the Noctua tracker that is relevant to this discussion: https://github.com/geneontology/noctua/issues/552
@vanaukenk can this be closed, or is there still something to change in the GPAD script (if so this may not be the right repo)?
From @dosumis on February 8, 2017 13:53
When a gene product binds to another gene product, do we need to choose one side as enabling?
Should we do this:
Or this:
The former is potentially useful in some templates, e.g. for defining cell-adhesion mediator activity.
_Copied from original issue: geneontology/molecular_function_refactoring#29_