SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

SEP V018: Interactions with Interaction Nodes #73

Closed jakebeal closed 3 years ago

jakebeal commented 4 years ago

Many diagrams in practice include "interactions with interactions." This SEP establishes the circumstances in which such a form is allowed.

https://github.com/SynBioDex/SBOL-visual/blob/master/SEPs/SEP_V018.md

chofski commented 4 years ago

Should "Cas9m" read "dCas9" instead? I assume this is a catalytically dead variant if it is repressing a promoter.

jakebeal commented 4 years ago

Cas9m is a specific form of dCas9 (cf. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4228775/), but I have now changed it to dCas9 for simplicity of understanding.

JS3xton commented 4 years ago

In the proposed Figure 19 (a), shouldn't the Process node be a square glyph and not a circle glyph?

Also, the following specification confuses me:

If there is precisely one incoming edge without a head and precisely one outgoing edge, then the process node MAY be omitted, but otherwise MUST NOT be omitted. Examples are provided in Figure 19.

What makes an edge incoming or outgoing? Is this even necessary? Why can't Process node omission be orthogonal to the number of edges involved in the process?

And are Fig 19 (c) and (d) illustrating MUST NOT practices? I would hope that's clear in the final version like it is throughout the rest of the specification. It also might help to show correct versions of the diagrams alongside the incorrect versions.

jakebeal commented 4 years ago

Good catch, @JS3xton - that was indeed supposed to be a square glyph; I've fixed it.

What makes an edge incoming or outgoing? Is this even necessary?

Incoming and outgoing are defined by the usual head/tail conventions for a graph: if the head is at the node, the edge is incoming; if the tail is, it's outgoing. Figures 19(c) and 19(d) show the ambiguities that you get if you omit the node with other cardinalities: if you have more edges, is it a superposition or a process? If you have fewer, what is being acted on?
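The head/tail convention and the omission rule quoted earlier can be made concrete with a small sketch. It is illustrative only: `Edge` and `may_omit_process_node` are hypothetical names, not part of any SBOL library, and the sketch assumes one particular reading of the draft wording, namely that "an incoming edge without a head" is an edge that ends at the node with no arrowhead glyph drawn, and that incoming edges that do carry a head glyph (modifiers) are not counted.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Edge:
    tail: str                  # node the edge starts from
    head: str                  # node the edge ends at
    head_glyph: Optional[str]  # e.g. "inhibition"; None if no arrowhead is drawn


def incoming(node: str, edges: List[Edge]) -> List[Edge]:
    """Edges whose head end is at the node."""
    return [e for e in edges if e.head == node]


def outgoing(node: str, edges: List[Edge]) -> List[Edge]:
    """Edges whose tail end is at the node."""
    return [e for e in edges if e.tail == node]


def may_omit_process_node(node: str, edges: List[Edge]) -> bool:
    """Assumed reading of the draft rule: exactly one headless incoming
    edge and exactly one outgoing edge."""
    headless = [e for e in incoming(node, edges) if e.head_glyph is None]
    return len(headless) == 1 and len(outgoing(node, edges)) == 1


# A process node "p" with one headless incoming edge, one modifier, one product.
edges = [
    Edge("dCas9", "p", None),
    Edge("gRNA", "p", "stimulation"),
    Edge("p", "promoter", "inhibition"),
]
print(may_omit_process_node("p", edges))
```

Under this reading, adding a second headless incoming edge, or removing the outgoing edge, makes the function return `False`, which corresponds to the "more" and "fewer" cases described above.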

And are Fig 19 (c) and (d) illustrating MUST NOT practices?

Yes, and they are clearly marked as such in the draft spec change, if you want to look at the SEP-V018 branch: 457884e

JS3xton commented 4 years ago

On the edges, I figured as much, which leads me to more confusion: you require above an "incoming edge without a head", yet you define incoming edges as edges where the head is at the node, so how can an incoming edge not have a head? It's also unclear to me how you categorize a single edge that has been bisected by a node. Does that original edge become two edges? For example, how many edges are associated with the red square Process node in Fig 19 (a)?

All this to say: I think it's easier to avoid explicitly defining these relationships if they're not necessary.

On the examples, I'm not sure 19 (c) is any worse than 19 (b). I don't love it, but I'm also not convinced it warrants a MUST NOT designation. What is the "superposition" interpretation that you suggest?

19 (d) just completely confuses me. I'm not sure how a Process node can be introduced to make that diagram correct. I think that example is doing more harm than good.

jakebeal commented 4 years ago

Per the specification, a node does create an additional interaction, making one edge into two edges:

A glyph at the point where an edge splits or joins represents a biochemical process, i.e., an additional Interaction with type and roles set by the process glyph.

This gives you two roles, per Appendix A.4. To distinguish any more roles, we need to allow interactions with interaction nodes, e.g., 19(a). That shows a system with 2 Interactions:

  1. dCas9 is stimulated by gRNA to produce an unspecified stimulated form of dCas9 (in this case, realized as dCas9-gRNA, though that's not made explicit in the diagram).
  2. stimulated dCas9 inhibits the activity of the promoter.

Superposition is the fact that the dCas9 is being produced in two different places, and their edges join to indicate that there's just one chemical concentration of interest here.

19(d) can't be fixed by adding a process node - maybe I should just omit it then?

JS3xton commented 4 years ago

Sorry, the language and Fig 19 (c) are still not clear to me, but if others are happy with them then I will defer.

Other comments:

I don't like the Fig 19 (a-c) example in general. It's not an intuitive way to illustrate CRISPRi in my opinion. Moreover, you mention a better example by email: aTc repressing the repression of a TetR CDS on pTet. I think this is a more prototypical example of the type of diagram we are trying to permit with this SEP, and I think it's simpler to specify and understand.

I also don't know why "An edge with multiple heads MUST use the same glyph for each head".

jakebeal commented 3 years ago

I've been working on this, and have come to agree with @JS3xton that we need a deeper understanding of what exactly nodes and edges mean. Let us consider the canonical case of an interaction with an interaction suggested by @JS3xton: aTc repressing the repression of a TetR CDS on pTet.

image

I have come up with four potential theories on how to interpret a "higher-level" interaction like this:

  1. The additional edge adds an additional participant to the indicated interaction, with the indicated role. This makes a lot of sense for something like a catalyst, but it doesn't work for inhibition of inhibition, because then both aTc and TetR CDS end up with "Inhibitor" roles, which does not work for our canonical example.
  2. The additional edge is literally a meta-level modulation, e.g., inhibition of the inhibition process. This fits well with the way that we talk about these relationships and with some equations, but is not compatible with SBO, which does not support meta-relations.
  3. The additional edge implies an omitted biochemical process interaction node (the original theory in this SEP). Even leaving aside all the previously discussed issues, that turns out to be incompatible here, because it implies roles of "reactant" and "product" for TetR CDS and pTet, which does not make sense here.
  4. The additional edge implies an omitted species. In this case, the omitted species would be the TetR protein, whose activity is being inhibited by aTc. The TetR protein does, in fact, serve as an inhibitor for pTet. The omitted species then must be the product of a biochemical process that is not being represented, as shown here:

image

I think I like this last theory best, as it seems to resolve a lot of the questions and doesn't require us to come up with a theory of "incoming" and "outgoing" edges from nodes. It also implies answers to questions about whether diagrams like the following are legal:

JS3xton commented 3 years ago

Hmm, I like interpretation 2 (an interaction with an interaction is a meta-level modulation)—it's how I literally think of interactions with interactions, and it doesn't presume anything unnecessarily (I may not want to imply an underlying mechanism, for example). And it doesn't bother me that it's not compatible with SBO.

I agree interpretation 4 appears to work nicely for the prototypical aTc example and others conforming to it, but I'm not sure it always will (no counterexamples come immediately to mind, though).

jakebeal commented 3 years ago

I've spent some time writing up my take on interpretation 4, which I'm finding has also helped to clarify how process nodes are supposed to work. This explicitly includes the concept from interpretation 2 as well, but defines it in terms of a pattern that deterministically expands to interpretation 4.

I've updated both the draft branch and the SEP accordingly; please take a look and let me know what you think: https://github.com/SynBioDex/SBOL-visual/blob/master/SEPs/SEP_V018.md

JS3xton commented 3 years ago

... will be moved from item 5.4.3 (multi-head/multi-tail edges) to item 5.4.4 (interaction nodes), along with its associated example figures

I think Figures 18(a) and 18(b) should stay with Section 5.4.3. 18(c) and 18(d) should be with Section 5.4.4. 18(e) and the new 20(b) don't seem related to either Section 5.4.3 or 5.4.4. I don't think 18(f) should be illegal.

The following new items will be added:

Where? Is there a new section on Interactions with Interactions?

(Sorry, I tried to skim the latex files in the draft branch, but I couldn't easily understand them. On that note, rendered PDFs might help those of us who don't want to render the latex locally.)

  • An edge with its tail at an interaction node MAY use an Interaction arrow head to indicate an additional Interaction in which this product of the biochemical process has the tail role associated with that type. An example is provided in 20(b).

-Figure 20(b)-

(b) Example of a composite edge pattern representing two interactions: CRISPR complex formation with dCas9, where that complex then represses a promoter.

This was really confusing for me. I found the first sentence convoluted and not simple or clear.

I also didn't like the term "composite edge pattern" because I was encountering it for the first time without it being defined. If you want to establish that phrase, you might add a section dedicated to it and defining it.

And why is it necessary to define composite edge patterns after Section 5.4.3 has been established? Figure 20(b) feels a lot like an edge with multiple tails. Section 5.4.3 obviously establishes constraints on head and tail roles, but one might argue those constraints are superfluous.

  • The head of a “higher-level” directed edge E1 MAY connect to an intermediate point on another “target” edge E_2, forming an “interaction with an interaction” pattern.

I don't like the colloquial use of the phrase "higher-level". I think I understand what you're getting at, but I think it should be defined more concretely or not used at all.

The “higher-level” edge form is equivalent to an expansion into a pattern of three interactions with an unspecified intermediate molecular species S, as follows:

  • The head of the “higher level” edge E1 is the unspecified species S.
  • The tail of the “target” edge E2 is the unspecified species S.
  • The unspecified species S is the Product (SBO:0000011) of a Process (SBO:0000375) Interaction whose Reactant (SBO:0000010) is the original tail of the “target” edge E2.

I still don't think implying an underlying mechanism is necessary.
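For readers trying to follow the three-interaction expansion quoted above, here is a minimal sketch of it applied to the aTc/TetR example from earlier in the thread. The `Interaction` record and `expand_higher_level` are hypothetical names introduced only for illustration; they are not taken from the SEP or from any SBOL library, and the interaction kinds are plain strings rather than SBO terms.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interaction:
    kind: str  # interaction type, e.g. "inhibition" or "process"
    tail: str  # participant at the tail of the edge
    head: str  # participant at the head of the edge


def expand_higher_level(e1_kind: str, e1_tail: str, e2: Interaction,
                        species: str = "S") -> List[Interaction]:
    """Expand a higher-level edge E1 that points at target edge E2 into
    three interactions through an unspecified intermediate species."""
    return [
        # S is the product of a process whose reactant is E2's original tail.
        Interaction("process", e2.tail, species),
        # The head of E1 is S.
        Interaction(e1_kind, e1_tail, species),
        # The tail of E2 is S.
        Interaction(e2.kind, species, e2.head),
    ]


repression = Interaction("inhibition", "TetR CDS", "pTet")
for interaction in expand_higher_level("inhibition", "aTc", repression):
    print(interaction)
```

In the aTc example the unspecified species `S` would be the TetR protein, but the expansion itself does not name it, which is the point of contention in the comment above.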

Note that this applies only to heads, as an intermediate tail cannot be distinguished from multiple heads, as specified above; likewise, the connection point on the “target” edge MAY be a non-split portion of a multi-head or multi-tail edge, but MUST NOT be a branch of a multi-head or multi-tail arrow.

This sentence was hard to follow.

Also, I think an interaction should be able to connect on the split point of a multi-head or multi-tail arrow (I've done this before).

An edge MUST NOT be the “target” for more than one “higher-level” edge, as the expansion of such a form would be ambiguous.

I think it should be legal to have multiple interactions targeting the same edge. I.e., Figure 21(f) should be legal. This is another reason not to imply an underlying mechanism.

Having a “higher-level” edge also be a “target” edge is NOT RECOMMENDED.

I don't think this should be NOT RECOMMENDED. I.e., Figure 21(g) should not be discouraged (although I wouldn't highlight it with a figure, either).
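To make the two draft rules under dispute here easier to compare, the following sketch checks them mechanically. It encodes the rules as drafted (one MUST NOT, one NOT RECOMMENDED), not the relaxed versions argued for in these replies. The function name and the dictionary representation are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple


def check_higher_level_edges(targets: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """`targets` maps each higher-level edge id to the id of the edge it
    points at. Returns (errors, warnings) under the draft rules."""
    errors: List[str] = []
    warnings: List[str] = []
    counts = Counter(targets.values())

    # Draft MUST NOT: an edge is the target of more than one higher-level edge.
    for edge_id, n in sorted(counts.items()):
        if n > 1:
            errors.append(f"{edge_id} is the target of {n} higher-level edges")

    # Draft NOT RECOMMENDED: a higher-level edge is itself a target.
    for edge_id in sorted(targets):
        if edge_id in counts:
            warnings.append(f"{edge_id} is both a higher-level edge and a target")

    return errors, warnings


errors, warnings = check_higher_level_edges({"E1": "E2", "E3": "E2", "E4": "E1"})
print(errors)
print(warnings)
```

Dropping the first loop corresponds to making diagrams like Figure 21(f) legal, and dropping the second corresponds to no longer discouraging diagrams like Figure 21(g).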

Figure 21(d) higher-level edges MUST NOT connect to a branch of a multi-head or multi-tail edge

I think this should be legal.

Also, the term "branch" is not clearly defined (for example, when edges with multiple heads or tails are introduced in Section 5.4.3).

Figure 21(e) higher-level edges MUST NOT connect to the head or tail of an edge

I'm not sure this should be illegal. I feel like I've seen diagrams like this to represent cascades.

jakebeal commented 3 years ago

I think Figures 18(a) and 18(b) should stay with Section 5.4.3. 18(c) and 18(d) should be with Section 5.4.4. 18(e) and the new 20(b) don't seem related to either Section 5.4.3 or 5.4.4.

That's the intention; I've clarified.

Most of the rest of the concerns that you raise, I think, stem from a fundamental question of what an edge means, which I think we should discuss directly, and possibly clarify. We're essentially coming back to the question above of interpretation 2 (meta-level modulation) vs. interpretation 4 (omitted species).

My concern with the meta-level modulation interpretation is that we need to change the underlying semantics of edges. Right now, we ground all of our definitions in the semantics of SBO, mapping an edge to a relationship and its ends to the roles of physical entities. Relationships are not physical entities, so we can't both keep SBO semantics and also have meta-level modulation. Above, you said that you weren't worried about remaining compatible with SBO; if we don't do that, however, I'm not sure how we could ground the semantics of our diagrams.

What are your thoughts?

JS3xton commented 3 years ago

Right now, we ground all of our definitions in the semantics of SBO, mapping an edge to a relationship and its ends to the roles of physical entities.

Hmm, perhaps implicitly, but I don't actually see this specified in the SBOL Visual 2.2 specification (I'm looking at Section 5.4, Interaction).

Moreover, I don't see why an edge cannot be both an Inhibition (SBO:0000169) and Inhibited (SBO:0000642) (i.e., the head of an upstream inhibition edge). This doesn't seem to conflict with SBO as far as I can tell (caveat: I'm only cursorily familiar with SBO).

jakebeal commented 3 years ago

The specification is in Section 4, where we define the classes of the glyphs:

Interaction Glyphs are “arrows” indicating functional relationships between sequence features and/or molecular species. They are associated with Systems Biology Ontology terms.

In particular, every edge is defined with a term contained in the "occurring entity representation" (SBO:0000231) branch, while the heads and tails are defined with terms contained in the "participant role" (SBO:0000003) branch. I don't think that was spelled out explicitly, but that's what's used for the "Interaction" and "Participation" classes in the SBOL data model, and what was the intention of that section. Making that explicit would be clearer (and we could make the SBO branch links for the other glyph classes explicit as well), but that was definitely the intent and belief in the construction of SBOL Visual, so if we change that, it needs to go through an SEP.
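The branch convention described above can be sketched as a simple lookup. The table below is hand-written for illustration and covers only terms mentioned in this thread; it is not loaded from SBO, and the branch assigned to each term is an assumption that should be checked against the ontology itself. `edge_terms_are_grounded` is a hypothetical helper name.

```python
OCCURRING_ENTITY = "SBO:0000231"  # "occurring entity representation"
PARTICIPANT_ROLE = "SBO:0000003"  # "participant role"

# Assumed top-level branch for each term mentioned in the thread.
TOP_LEVEL_BRANCH = {
    "SBO:0000169": OCCURRING_ENTITY,  # inhibition
    "SBO:0000375": OCCURRING_ENTITY,  # process
    "SBO:0000642": PARTICIPANT_ROLE,  # inhibited
    "SBO:0000010": PARTICIPANT_ROLE,  # reactant
    "SBO:0000011": PARTICIPANT_ROLE,  # product
}


def edge_terms_are_grounded(edge_term: str, tail_role: str, head_role: str) -> bool:
    """True if the edge term sits in the occurring-entity branch and both
    end roles sit in the participant-role branch."""
    return (
        TOP_LEVEL_BRANCH.get(edge_term) == OCCURRING_ENTITY
        and TOP_LEVEL_BRANCH.get(tail_role) == PARTICIPANT_ROLE
        and TOP_LEVEL_BRANCH.get(head_role) == PARTICIPANT_ROLE
    )


# A process edge from a reactant to a product.
print(edge_terms_are_grounded("SBO:0000375", "SBO:0000010", "SBO:0000011"))
# Using an interaction term where a role is expected.
print(edge_terms_are_grounded("SBO:0000375", "SBO:0000010", "SBO:0000169"))
```

A check of this shape only tests which branch a term comes from. It does not settle whether a single edge may carry both an interaction term and a participant role, which is the open question in this exchange.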

As I look deeper... I'm not sure if the two items conflict in SBO or not. My instinct is that these top-level branches are intended to be separate, but that isn't necessarily true, since something can clearly both be a physical entity and have a participant role.

If we allow this, we will definitely be making a choice that is incompatible with the current SBOL data model, which does not support higher-level interactions at present. That might, however, instead be indicating a thing that we'd want to change in the data model, and indeed I've recently seen an example where it might make sense. I'm going to open up this question on the sbol-dev and SBGN mailing lists, and see if others agree or disagree with the possibility of directly representing higher-level interactions. If we can have higher-level interactions, then it all becomes simple.

JS3xton commented 3 years ago

The specification is in Section 4

Ahh, thank you for pointing that out; sorry I missed it.

In this case, I would advocate for changing that language, perhaps something like:

Interaction Glyphs are “arrows” indicating functional relationships between sequence features, molecular species, and/or other relationships. They are associated with Systems Biology Ontology terms.



I'm also in favor of modifying the SBOL Data Model accordingly so it can describe interactions with interactions.
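As a rough illustration of what "interactions with interactions" could look like in a data model, here is a minimal sketch in which an interaction's participant may itself be an interaction. The class names `Entity` and `Interaction` and the `order` helper are hypothetical and are not the SBOL data model's actual classes; they only show the shape of the idea.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Entity:
    """A sequence feature or molecular species."""
    name: str


@dataclass(frozen=True)
class Interaction:
    kind: str
    source: Union[Entity, Interaction]
    target: Union[Entity, Interaction]


def order(x: Union[Entity, Interaction]) -> int:
    """0 for an entity, 1 for an ordinary interaction, 2 or more when an
    interaction participates in another interaction."""
    if isinstance(x, Entity):
        return 0
    return 1 + max(order(x.source), order(x.target))


repression = Interaction("inhibition", Entity("TetR CDS"), Entity("pTet"))
derepression = Interaction("inhibition", Entity("aTc"), repression)
print(order(repression), order(derepression))
```

Here the aTc edge targets the repression relationship directly, so no intermediate species or underlying mechanism has to be implied.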

jakebeal commented 3 years ago

It looks like we're likely to proceed with a change on the SBOL data model to explicitly support higher-order interactions: https://github.com/SynBioDex/SEPs/issues/104

I'll reformat this proposal around that option shortly, which should simplify things as well.

jamesscottbrown commented 3 years ago

I agree with representing this as a higher-order interaction.

I dislike suggestion 4, because I think that activation/inhibition should always refer to a process; I don't think it's meaningful to refer to a chemical entity as being inhibited (particularly if it has multiple roles, only one of which is affected).

If the edge implies an omitted species, then it should also imply another process that produces/degrades/converts this omitted species. In the TetR/aTc example, there's an omitted species (TetR protein), but also an omitted process (the reaction between TetR and aTc to form a complex that doesn't act as a repressor). But from the diagram alone it could just as well be the case that aTc prevents a reaction that converts an inactive precursor of TetR to an active form.

I think it's fine to omit these reactions and species from a diagram. I think it's fine to interpret the higher-level-arrow diagram as meaning that aTc somehow affects the repression of pTet by TetR. However, it seems strange to interpret the higher-level arrow as implying the existence of an omitted species, but not also additional processes affecting this species. Since it is ambiguous what this omitted species is, it seems neater to just interpret the diagram as depicting a higher-order interaction.

Having said that, I also think that it's reasonable to interpret any interaction from a CDS [except for a Process edge pointing at a protein] as implying the existence of an omitted product of the CDS, whether or not this interaction is affected by any higher-level arrows. It's common to draw an interaction from a CDS -> Promoter, but this is really a shorthand for CDS -> Protein -> Promoter. This should perhaps be made explicit in the spec.

To summarise, I think the arrow from the CDS to pTet is best interpreted as implying the existence of an omitted species, but the arrow from aTc to this is best interpreted as a higher-order interaction.

jakebeal commented 3 years ago

OK, folks. Hopefully the third time will be the charm. I have yet again revised the proposal, this time to explicitly use higher-order interactions. I've also separated out one of the questions raised above as its own independent issue (Should multi-head arrows be allowed to have different heads?).

Please review the current state and comment: https://github.com/SynBioDex/SBOL-visual/blob/master/SEPs/SEP_V018.md

jakebeal commented 3 years ago

@JS3xton @jamesscottbrown @chofski How do you feel about the current revision? If we're good here, I'd like to move this forward to a vote, along with all the other ready-to-go SEPs.

chofski commented 3 years ago

I’m happy with this.

JS3xton commented 3 years ago

Looks great to me, thanks @jakebeal!

jakebeal commented 3 years ago

OK, with two thumbs up, I'm going to consider this ready for a vote until objections are raised.

jakebeal commented 3 years ago

Closing as accepted and incorporated.