SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

SEP V007: Stem-Top Glyphs #12

Closed jakebeal closed 5 years ago

jakebeal commented 6 years ago

This SEP proposes a systematic set of "stem-top" glyphs representing small sites affecting DNA, RNA, or protein. The glyphs in this system are Biopolymer Location, Stability Element, and Cleavage Site.

Please see the full proposal at: https://github.com/SynBioDex/SBOLv-realizations/blob/develop/SEPs/SEP_V007.md

jakebeal commented 6 years ago

My personal preferences are:

mholowko commented 6 years ago

I think that sawtooth and wavy are too similar to each other, they will be too easy to mix up. I would prefer the one, two, three lines but then there is no obvious analogy between them and the actual elements they represent. Unless we agree that they represent the classic central dogma "progression".

jakebeal commented 6 years ago

@mholowko That was the logic behind how Matthew Pocock first proposed the one, two, three lines variant.

Do you think there's a way that something like sawtooth could be made more distinctive from wavy? I'd love to keep things more intuitive if we can.

mholowko commented 6 years ago

@jakebeal I agree that the wavy/sawtooth is more elegant than the one, two, three line. As for making them more differentiated - maybe we could make the teeth angles more/less sharp? But not sharp enough to make it resemble the straight line again? Because now both of them have similarly "angled" curves.

jakebeal commented 6 years ago

I've put in a sharper version of the sawtooth --- what do you think about this?

glyph specification

mholowko commented 6 years ago

Yes, I think this will be much better. Thanks for working this out, my vote is on this variant now.

On 18 Sep 2017 10:25 pm, "Jacob Beal" notifications@github.com wrote:

I've put in a sharper version of the sawtooth --- what do you think about this?

[image: glyph specification] https://raw.githubusercontent.com/SynBioDex/SBOLv-realizations/cd93a0b/Glyphs/cut/stem-top-specification-sawtooth-sharper.png

JS3xton commented 6 years ago

My Initial Preferences

Rationale

Questions / Comments

Can we combine/superimpose Tops?

Also appropriate for this SEP: can we combine/superimpose Tops? Specifically, I have in mind describing self-cleaving ribozyme insulators (like RiboJ, originally described by Lou et al, Nature Biotech, 2012).

The canonical structure for self-cleaving ribozyme insulators includes a cleavage site and some hairpins. (from the Supplementary Materials of [Nielsen et al, *Science*, 2016](https://www.ncbi.nlm.nih.gov/pubmed/27034378))


Annotated sequences of self-cleaving ribozyme insulators. (from the Supplementary Materials of [Nielsen et al, *Science*, 2016](https://www.ncbi.nlm.nih.gov/pubmed/27034378))


This raises the question: what is the most appropriate way to describe the DNA features relevant to self-cleaving ribozyme insulators using SBOL Visual glyphs?

One of the first attempts to describe them in the literature that I am aware of occurs in Nielsen, Segall-Shapiro, & Voigt, Curr Opin Chem Biol, 2013.

Figure 3, (a)

Nielsen et al, Science, 2016 went on to use the "RNA Stability Element" again.

Figure 3, B


I've been unable to find much discussion on the effects on RNA stability imposed by RiboJ, but given the established use of the RNA Stability Element glyph in literature, I've been hesitant to discard that glyph entirely. I still feel that Cleavage is the main purpose of that DNA feature, though (for my designs at least). As such, I've compromised and superimposed the Ribonuclease Site glyph and the RNA Stability Element glyph on top of each other:

self_cleaing_ribozyme_insulator

I now arrive at my question: is such a glyph sanctioned by SBOL Visual? (regardless of whether it ultimately ends up being the most appropriate glyph to describe a self-cleaving ribozyme insulator).

I would also love to see more modern alternative representations proposed for self-cleaving ribozyme insulators. Perhaps:

self_cleaing_ribozyme_insulator_2

jakebeal commented 6 years ago

@JS3xton

Lots of good thoughts in here; let me try to organize my response:

JS3xton commented 6 years ago
  • On "protein = straight 2 lines": my key concern is that two lines typically implies DNA.
  • I do hear that you find wavy too close to sawtooth. Do you think another distinct stem like "looped" (referencing an alpha-helix) or "square wave" (just being really different) would work?

Hmmm, I've been thinking on this a little bit and kinda liked this:

[image: My Stem Preferences, with columns: DNA Feature Relevant to DNA | DNA Feature Relevant to RNA | DNA Feature Relevant to Protein]

where the protein glyph has the adjacent small circles as a stem, alluding to a polypeptide sequence (this admittedly starts to look like the old dashed lines, though).

  • On Cut: this actually just means any zero-length location, not anything necessarily associated with a modification. I called it "Cut" because that's the SBOL name for a zero-length location, but maybe it would be better to use one of the SO synonyms like "Junction". A good example of how this is being used with the "map pin" icon right now is in Desktop Genetics' software tools, which use it to describe knock-in and knock-out operations. See: https://www.deskgen.com/landing/documentation.html If this part of the proposal is accepted, those would be valid SBOLv.

Ahhh, OK. Yeah, I definitely think that going with "Junction" is better than "Cut".

I hear you on use of the "map pin", thanks for the examples. I'm still not in love with Junction absconding with the circle Top, but I could perhaps be persuaded if the Stability Top was improved upon. Some thoughts there:

  • On "Restriction Enzyme Recognition Site" - your point is well made, but the current structure of Sequence Ontology uses that term to refer to the union of blunt and sticky cleavage sites. Maybe SO can be persuaded to adjust their name?

Hmmm. OK, upon closer inspection, I think the terms exist in the Sequence Ontology to differentiate between the Recognition Site and the Cleavage Site, but the SBOL Visual Glyph Specification for "Restriction Enzyme Recognition Site" (from Supplementary Table 2 of the PLOS Biology SBOL Visual publication) appears to lump both SO terms together.

A Type IIS endonuclease recognition/cleavage site diagram (BsmBI) for reference:

It appears that the "Cutting site" is best described by SO:0001692 ("sticky_end_restriction_enzyme_cleavage_site"), but also well described by SO:0001687 ("restriction_enzyme_recognition_site"), and the "Recognition site" is well described by SO:0000061 ("restriction_enzyme_binding_site"). (The definition of SO:0001687 ("restriction_enzyme_recognition_site") even states: "this may or may not be equal to the restriction enzyme binding site")

In my opinion, the "Restriction Enzyme Recognition Site" glyph resembles a cut site (i.e. SO:0001687), so I might propose removing the association with SO:0000061 from the "Restriction Enzyme Recognition Site" glyph and re-introducing SO:0000061 with a separate glyph if so desired. (As mentioned above, "Restriction Enzyme Recognition Site", SO:0001687 seems to overlap considerably with the proposed (DNA Stem - Cleavage Top) glyph)
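To make the mapping in this comment concrete, here is a minimal Python sketch of it as a lookup table. The IDs and names are the ones quoted above; the identifiers `SO_TERMS`, `DIAGRAM_PART_TO_SO`, and `describe` are hypothetical, and the ordering reflects the opinion argued here rather than any official SBOL Visual assignment.

```python
from typing import Dict, List

# Sequence Ontology terms exactly as quoted in this comment.
SO_TERMS: Dict[str, str] = {
    "SO:0001692": "sticky_end_restriction_enzyme_cleavage_site",
    "SO:0001687": "restriction_enzyme_recognition_site",
    "SO:0000061": "restriction_enzyme_binding_site",
}

# Labelled parts of the Type IIS diagram -> candidate terms, best match first.
# Assumption: this ordering is one reading of the discussion, not a standard.
DIAGRAM_PART_TO_SO: Dict[str, List[str]] = {
    "Cutting site": ["SO:0001692", "SO:0001687"],
    "Recognition site": ["SO:0000061"],
}


def describe(part: str) -> List[str]:
    """Return 'ID (name)' strings for a diagram part, best match first."""
    return [f"{so_id} ({SO_TERMS[so_id]})" for so_id in DIAGRAM_PART_TO_SO[part]]


if __name__ == "__main__":
    for part in DIAGRAM_PART_TO_SO:
        print(part, "->", ", ".join(describe(part)))
```

Keeping the term names in one table and the part-to-term mapping in another makes it easy to drop SO:0000061 from one glyph and attach it to a separate one later, which is the change suggested here.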

jakebeal commented 6 years ago

I agree with you on SO:0000061, and have updated accordingly. Using SO:0001687, however, is specifically selected over using SO:0001692, as we already have a separate glyph for that, and SO:0001687 also covers blunt-end restriction sites.

jakebeal commented 6 years ago

@JS3xton I like your shield option and am adding it; I tried to make a rounded version too, but couldn't get it looking right.

For stems, your "chain of circles" made me think more carefully about the electrical engineering symbol for an inductor, which is actually a looping line. What do you think of this as an option:

glyph specification

JS3xton commented 6 years ago

> @JS3xton I like your shield option and am adding it; I tried to make a rounded version too, but couldn't get it looking right.

👍

I also thought about rounding the bottom edges on the shield, but didn't have time to try it. 😆 To some extent, keeping the major control nodes of the glyphs on a grid is helpful anyways, so maybe it's better if the shield is not rounded.

> For stems, your "chain of circles" made me think more carefully about the electrical engineering symbol for an inductor, which is actually a looping line. What do you think of this as an option

Hmmmm. I'd say it's better than some options, but I'm only lukewarm on it. Thoughts:

jakebeal commented 6 years ago

I was thinking "alpha helix" for the protein analogy?

JS3xton commented 6 years ago

Ahh OK. That hadn't occurred to me; that makes a little more sense. Seems a little contrived, though, as it's unnecessarily restrictive to (i.e. not evocative of) a large subset of proteins (anything that's not an alpha helix). 🤔

I'd say I'm still lukewarm on it (perhaps a little warmer with the "alpha helix" realization?), but I'm more interested in hearing what others think; I think I could easily be persuaded to support the inductor looping line stem representing DNA Features Relevant to Proteins if there was general consensus in favor of it.

jakebeal commented 6 years ago

I have updated the proposal with my current best understanding of the state of discussion. Here is my current understanding:

We have apparent consensus on:

I would propose the following to resolve the remainder:

Would others agree with this proposal?

chofski commented 6 years ago

Some great discussion here. I agree with the last suggestions from Jake, but agree with John that a more generic symbol should be used for the ribozyme insulators that can be drilled down into the specific components making it up. Although there is not much in the literature about improved stability of transcripts cleaved by RiboJ and variants, in my hands we see that this is nearly always the case.

jakebeal commented 6 years ago

@chofski I agree that a "functional RNA" sub-language would be an excellent idea, and would be interested to see proposals for it.

With regards to the present SEP, I believe there is no conflict: when created, the functional RNA sub-language would override the current proposal where needed.

JS3xton commented 6 years ago

I was talking with a labmate, and it occurred to me that the Shield Top may resemble a commonly used slider glyph used for scrolling through DNA sequences (I'll point to the DeskGen link again: https://www.deskgen.com/landing/documentation.html).

image

Something to keep in mind.

jakebeal commented 6 years ago

Good point. If we followed this current proposal, we would likely suggest that DeskGen change that to a straight vertical bar, indicating junction. Of course, if you have another suggestion instead of shield, that could be considered as well. I'm also going to flag this for DeskGen's attention to see if they are willing to comment.

JS3xton commented 6 years ago

@jakebeal

  • Junction top: Circle, with no-top as an alternative. This replaces prior glyphs for restriction site and Protein Stability element, but is backward compatible.

Sorry, I'm a little confused here. So the Restriction Enzyme Recognition Site glyph will be retired? (I don't understand precisely what you mean by "This replaces prior glyphs for restriction site...") This is the first I've recognized that, but perhaps I missed discussion on that elsewhere. Regardless, I think the Restriction Enzyme Recognition Site glyph and the "no-top" Junction alternative should not co-exist (the only distinction would be the positioning of the glyph relative to the backbone).

And to expound on this, to describe a restriction site (if it was not appropriate to use the more specific 5' Sticky Restriction Site glyph, 3' Sticky Restriction Site glyph, or Blunt Restriction Site glyph), you would no longer use the Restriction Enzyme Recognition Site glyph and should instead use the DNA-Stem cleavage-Top glyph? And to describe a Protein Stability element, you should use the protein-Stem (i.e. looping line) stability-Top (i.e. shield) glyph? If that understanding is correct, then I support these changes.

These changes do not sound backwards compatible to me, though, which may call into question what release these changes should be associated with. We've already broached this topic a little bit over in SEP V006, and I've expressed my opinions on it here, so I won't reiterate.

  • X-ase top: binding pocket cap, with X as deprecated alternative

I don't support this at present; the X still seems much more intuitive for cleavage behavior to me. Cleavage and binding seem like orthogonal behaviors, frankly; I could see the X Top and the binding pocket cap Top co-existing (you could even superimpose them if you really wanted to communicate that both binding and cleavage were occurring at the same site, but I imagine I would primarily only use the X Top to communicate cleavage even if binding was also occurring).

Use of the binding pocket cap Top in conjunction with use of the binding pocket as a replacement for the Operator glyph (discussed in SEP V005) may also be confusing. I think existence of the binding pocket cap in both of those contexts should be explored carefully before being endorsed.

jakebeal commented 6 years ago

@JS3xton I think you are making a good point about the binding pocket cap. I do believe the original idea was "bind and cleave," but I take the point about confusion, and there's nothing really wrong with the X. Since nobody but me spoke significantly in its favor, I am moving this to the "non-supported alternatives."

jakebeal commented 6 years ago

@JS3xton Your question about "Restriction Enzyme Recognition Site" made me go think more carefully about the whole question of sites.

The first thing that I realized is that the glyph should also be linked to SO:0001688 (Restriction Enzyme Cleavage Junction), which is actually a child of Junction. This then covers the frequent usage of the "straight line" glyph on restriction site maps, which is to mark the actual cleavage junction. We still need to keep SO:0001687 (Restriction Enzyme Recognition Site) tied to it as well since that's also a typical usage.

I do believe that we need the "no-top" Junction alternative, because another typical use of "straight line" is to mark insertions, deletions, etc. So that means, as you say, that we do need to remove "straight line" from Restriction Enzyme Recognition Site.

Adding SO:0001688 as an interpretation, however, means that it is still legitimate (just not recommended) to use the Junction glyph to represent a Restriction Enzyme Cleavage Junction.

Since stability elements have a non-zero length, however, you are right that lollipop is not backward compatible. That means we have two alternatives here:

I think that the 2.0 would likely be better, but am receptive to either idea.

rodoyle commented 6 years ago

Hello All,

I'm still catching up on the full discussion thread here so please bear with me if certain items have already been covered.

DESKGEN has supported a variant/mutant flavor of SBOL visual since 2013 with our own enhancements for CRISPR-specific work. We also learned we had to compromise on the standard at times to make things more intuitive for users with less exposure to Synthetic Biology.

We obviously make heavy use of stem-top glyphs to represent RNA-guided Nuclease Binding Sites (aka Guide RNAs) in a compact way. Specifically the vertical bar, the stem, is used to mark the location of the double stranded break when the break is "blunt". We actually stagger the stem to mark Cpf1 and other nuclease cut sites in the genome that leave overhangs.

The circle is primarily used to compactly store a score for the guide. In practice, however, we learned it really serves as a local "ID" for the guide in the context of a larger diagram. For example: 'guide 55 is closer than guide 49'.

The play head (shield) was chosen to explicitly suggest interactivity to users. The shape isn't important; what matters is that it looks like it can be clicked and dragged.

So this gives a few criteria/requirements I would recommend for the symbols:

  1. The symbol's stem must be able to precisely and unambiguously point to a location in a biopolymer, such as a genomic coordinate. It should not be used for things that are better represented as ranges or intervals.

  2. The symbol's top should identify the general type and ID of the thing it represents.

  3. Practical implementations need to render lots of these glyphs quickly (usually as SVG DOM elements in the case of interactive web apps) so the glyphs should be as simple as possible.

  4. It is important to distinguish binding sites (which are coordinate intervals, typically some sort of rectangle in our tools) from staggered or blunt cutting sites which have discrete coordinates (the specific bonds being cut).

  5. Practical implementations must distinguish between "for display only" glyphs and interactive elements of a user interface. Enough of the "visual language space" needs to be left over to say "this is something I can click on" vs. "this is just pointing to something".

I feel that criteria 1 and 3 are the most relevant to the current proposals.
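As a rough illustration of criteria 1 and 4, the distinction between point-like cut locations and binding intervals could be modelled along these lines. This is only a sketch: the class and function names (`Junction`, `Interval`, `CutSite`, `can_use_stem`) and the example coordinates are hypothetical and do not come from DESKGEN's tools or the SBOL data model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Junction:
    """Zero-length location: the bond just after residue `after` (1-based)."""
    after: int


@dataclass(frozen=True)
class Interval:
    """Inclusive range of residues, e.g. a binding site."""
    start: int
    end: int


@dataclass(frozen=True)
class CutSite:
    """A double-strand break: one junction per strand, in top-strand coordinates."""
    top: Junction
    bottom: Junction

    @property
    def is_blunt(self) -> bool:
        # Blunt cuts break both strands at the same coordinate.
        return self.top.after == self.bottom.after

    @property
    def overhang(self) -> int:
        # Length of the single-stranded overhang left by a staggered cut.
        return abs(self.top.after - self.bottom.after)


def can_use_stem(location: Union[Junction, Interval]) -> bool:
    """Criterion 1: a stem should point at one coordinate, never a range."""
    return isinstance(location, Junction)


if __name__ == "__main__":
    blunt = CutSite(Junction(17), Junction(17))        # made-up coordinates
    staggered = CutSite(Junction(18), Junction(23))    # made-up coordinates
    print(blunt.is_blunt, staggered.is_blunt, staggered.overhang)
    print(can_use_stem(blunt.top), can_use_stem(Interval(1, 20)))
```

Representing a staggered cut as two junctions keeps each stem pointing at exactly one bond, which is what lets the stem be drawn "staggered" without turning it into an interval.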

SVG stroke (line) elements natively support solid, thickness, dotted, dashed properties. Waves, Squiggles, etc. to the best of my knowledge have to be (pre)computed as bezier curves or lots of little lines, which is an order of magnitude more annoying to implement. Of the two, ZigZag waves are easier to do. In SVG at least dotted and dashed are explicitly distinguished but in probably any other use case they are not.

Long story short, on the basis of implementation, I would go with the stem styles of: line, squiggle, zigzag such that the "frequency" of the undulations should increase DNA --> RNA --> Protein. I would forgo the use of curves. I would also be cool with solid --> dashed --> dotted in the same way. The ordering to me seems like a natural way to remember DNA, RNA, Protein.
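To show the implementation gap being described, here is a small Python sketch that emits SVG for both kinds of stem. The function names, dash patterns, and dimensions are made up for illustration; the only SVG features relied on are the `<line>` and `<polyline>` elements and the `stroke-dasharray` attribute.

```python
from typing import Dict, List

# Hypothetical dash patterns for the solid -> dashed -> dotted ordering.
DASH_BY_BIOPOLYMER: Dict[str, str] = {"DNA": "", "RNA": "4 2", "Protein": "1 2"}


def straight_stem(x: float, y_top: float, y_bottom: float, dash: str = "") -> str:
    """A stem as one <line>; `dash` is a stroke-dasharray value ('' means solid)."""
    dash_attr = f' stroke-dasharray="{dash}"' if dash else ""
    return (f'<line x1="{x}" y1="{y_top}" x2="{x}" y2="{y_bottom}" '
            f'stroke="black" stroke-width="1"{dash_attr}/>')


def zigzag_stem(x: float, y_top: float, y_bottom: float,
                teeth: int = 4, amplitude: float = 2.0) -> str:
    """A zigzag stem as a <polyline>; every vertex has to be computed."""
    points: List[str] = [f"{x},{y_top}"]
    step = (y_bottom - y_top) / (2 * teeth)
    for i in range(1, 2 * teeth):
        # Alternate right and left of the stem axis on each half-tooth.
        offset = amplitude if i % 2 else -amplitude
        points.append(f"{x + offset},{y_top + i * step}")
    points.append(f"{x},{y_bottom}")
    joined = " ".join(points)
    return f'<polyline points="{joined}" fill="none" stroke="black" stroke-width="1"/>'


if __name__ == "__main__":
    for name, dash in DASH_BY_BIOPOLYMER.items():
        print(name, straight_stem(10, 0, 16, dash))
    print(zigzag_stem(10, 0, 16, teeth=2))
```

The dashed and dotted variants differ from the solid stem by a single attribute, while the zigzag needs a vertex list recomputed whenever the stem length changes; a smooth squiggle would need Bezier control points on top of that.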

As an aside in our new Genome Editor tool we actually moved away from the "Stem-Tops" and represent the Guide RNA cut sites with two triangles and a line between their points. Sort of like: > ------ < (but vertical). This has been ... contentious because there is no "ID" or tag associated to the glyph. It makes it much harder to discuss the diagram with colleagues (probably the most important use case)!

jakebeal commented 6 years ago

@rodoyle One of the key things I'm taking away from your comments is there is currently a fundamental confusion between Junction and Region usages, which needs to be straightened out.

I believe that we have inherited this from the typical usage in the literature: most plasmid annotations, for example, show overhang cuts as though they were a junction (example: EcoRI in https://www.addgene.org/73851/).

The distinction is also ambiguous in SequenceOntology, which has both Junction and Region senses for DNA cleavage, but only regions for RNA and protein.

I think we need to think hard about how to deal with this issue.

jakebeal commented 6 years ago

I've looked at this and thought hard, and I think that the Junction/Region dilemma is not as bad as it looks. Some of our other Region elements also use single lines touching the backbone, notably the well-accepted Promoter and Terminator glyphs. The main problem is the lack of all of the needed SO terms, which we can remedy with a request for their addition.

I thus recommend that we should move forward as follows:

graik commented 6 years ago

The (straight) stem+circle symbol is the most common (close to a de-facto standard) way of depicting phosphorylation sites in proteins. Sometimes it is also used for other post-translational modifications. Re-defining it as a "stability element" symbol would cause major headaches in that area.

jakebeal commented 6 years ago

@graik I believe that we are actually improving on that problem.

SBOL Visual 1.0 currently has stem+circle as stability element --- thus, the problem you are concerned about exists in the current standard. Under this proposal, we change stem+circle to just be marking a junction site of any type, which is much closer in meaning. Stability instead becomes the "shield" glyph.

graik commented 6 years ago

Sorry, I mis-read the SEP then. Not quite so bad then. The remaining concern is that post-translational modifications do not fall into the category "junction" (aka between residues). So then you still utilize the common "modified residue" symbol for something that, by the SEP definition, is very different (something interesting going on between these two residues).

jakebeal commented 6 years ago

I believe this issue relates to questions on the use of SequenceOntology: I would support generalizing the "place marker" glyph to mean not just Junction but also Amino Acid (SO:0001237) and Base (SO:0001236). This would then cover these meanings as well.

graik commented 6 years ago

That would be excellent, yes.

jakebeal commented 6 years ago

@graik OK, I have made that extension: following the comments from @rodoyle the "Junction" has become "Biopolymer Location", and can indicate either a base/aa or a junction between them.

jakebeal commented 5 years ago

Accepted and integrated, and thus closed per SBOL procedure in updated SEP 001.