SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

Glyphs for type IIs / asymmetric endonuclease sites #38

Closed shyambhakta closed 3 years ago

shyambhakta commented 6 years ago

I'd like to propose a glyph for type IIs restriction endonuclease sites, a feature used in diagrams of Golden Gate assembly and especially of MoClo-like hierarchical DNA assembly (part, cassette, and multigene plasmids), which is increasing in popularity (there are perhaps ~30 toolkits out there, e.g., the Yeast Toolkit and the original MoClo).

Type IIs sites differ from regular sticky-end restriction sites in that they are asymmetric and cut outside their recognition sequence, allowing simultaneous excision and assembly of up to 20 fragments/plasmid elements. The main feature the glyph needs to capture is this asymmetry. The SBOL 5′ cohesive-end restriction site symbols are non-directional; several Golden Gate toolkit papers have used them anyway, but those papers would have benefited from a directional glyph that specifies the restriction site direction, which determines the expected connectivity in a Golden Gate reaction. The "traditional" type IIs site symbol (2nd panel) might be confused with a promoter, so I thought the new glyph ought to build on the existing 5′ sticky-end glyph. (To my knowledge, 3′ type IIs sites have never been employed for Golden Gate, so the 3′ versions can be omitted.) [image]
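To make the asymmetry concrete, here is a minimal Python sketch (not part of SBOL Visual or any SBOL library) that scans the plus strand of a sequence for a type IIs site in both orientations and reports where the enzyme cuts. It assumes BsaI, GGTCTC(1/5), as the example enzyme; the function and field names are hypothetical. A site read as GGTCTC on the plus strand cuts to its right, while the same site read as GAGACC cuts to its left, which is the direction a directional glyph would need to show.

```python
from dataclasses import dataclass

# Translation table for complementing an uppercase DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


@dataclass(frozen=True)
class TypeIIsSite:
    start: int        # 0-based start of the recognition sequence on the plus strand
    orientation: str  # "+" if the site reads 5'->3' on the plus strand, "-" otherwise
    plus_cut: int     # plus-strand cut as a gap index (the cut lies just before base plus_cut)
    minus_cut: int    # minus-strand cut, in the same plus-strand gap coordinates
    overhang: str     # single-stranded overhang, written as plus-strand sequence


def find_type_iis_sites(seq: str, site: str = "GGTCTC",
                        top: int = 1, bottom: int = 5) -> list[TypeIIsSite]:
    """Scan the plus strand for a type IIs site in both orientations.

    Defaults assume BsaI, GGTCTC(1/5): cuts lie 1 nt (top strand) and 5 nt
    (bottom strand) 3' of the recognition sequence, read on the strand that
    carries the recognition sequence.
    """
    seq = seq.upper()
    rc_site = reverse_complement(site)
    n = len(site)
    hits = []
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window == site:
            # Forward site: both cuts fall to the right (3') of the recognition sequence.
            plus_cut, minus_cut = i + n + top, i + n + bottom
            orientation = "+"
        elif window == rc_site:
            # Reverse site: the enzyme reads the minus strand, so both cuts fall to the left.
            plus_cut, minus_cut = i - bottom, i - top
            orientation = "-"
        else:
            continue
        lo, hi = sorted((plus_cut, minus_cut))
        if lo < 0 or hi > len(seq):
            continue  # a cut would fall outside the given sequence
        hits.append(TypeIIsSite(i, orientation, plus_cut, minus_cut, seq[lo:hi]))
    return hits
```

Running it on a sequence and on that sequence's reverse complement reports the same site with opposite orientation and a reverse-complemented overhang, which is exactly the information a rotationally symmetric glyph cannot convey.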

I polled my previous lab, which exclusively uses hierarchical Golden Gate based on the Yeast Toolkit, and my current lab, which heavily uses ad hoc Golden Gate (except me: I'm all hierarchical). I asked them to choose the glyph that most clearly indicates the site's direction and the position of the recognition site relative to the cut site, and that cannot be confused with existing symbols. We reached a consensus on (B). [image]

The rationale for (B) is that we refer to pairs of Golden Gate sites as inward- or outward-facing, which seems most plainly captured by horizontal arrows "originating" in the recognition site. The others, vertical-arrow glyphs, still convey direction when the path of the symbol is traced, but they place unnecessary, potentially distracting emphasis on a vertical motion of cleavage through a strand. (C) and (D) may also be perceived as nicking sites. (A) would thus be second in preference, but it still doesn't emphasize the location of the recognition site, or the "inward/outward" paradigm, as much as (B) does.
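The inward/outward terminology can be stated as a simple rule on site orientations. The sketch below is a hypothetical helper, assuming that orientation "+" means the recognition sequence reads 5′->3′ on the plus strand with its cut site on the 3′ side (the binding site -> cut site convention discussed later in this thread); the label "tandem" for same-direction pairs is an illustrative choice, not established terminology.

```python
from typing import Literal

# "+" = recognition sequence on the plus strand, cutting to its right;
# "-" = recognition sequence on the minus strand, cutting to its left.
Orientation = Literal["+", "-"]


def classify_pair(left: Orientation, right: Orientation) -> str:
    """Classify two sites given in left-to-right order along the plus strand."""
    if left == "+" and right == "-":
        return "inward"   # both sites cut toward the interval between them
    if left == "-" and right == "+":
        return "outward"  # both sites cut away from the interval between them
    return "tandem"       # both sites point the same way


def classify_adjacent(sites: list[tuple[int, Orientation]]) -> list[str]:
    """Classify each adjacent pair in a list of (position, orientation) sites."""
    ordered = sorted(sites)
    return [classify_pair(a[1], b[1]) for a, b in zip(ordered, ordered[1:])]
```

Only the relative orientation of the two sites matters here, which is why a glyph whose arrow points along the backbone maps directly onto the way people already talk about these pairs.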

jakebeal commented 6 years ago

I'd like to try to understand a little better the use case you're aiming to address with this symbol.

If one wants to explicitly represent the separation between binding site and cut site, one can already do so with the binding site + cut site icons: [image] A shared label or a stimulation arrow could also be added to make the relationship more explicit.

At a higher level of abstraction, the existing cut site symbol can be used to represent the whole complex.

Would these cover the use cases you are looking to address, or are there others you do not yet see?

shyambhakta commented 6 years ago

Now that I think about it more, I realize the need extends not just to type IIs sites but to any asymmetric nuclease site.

Explicitly separating the binding site from the cut site is not the need; the need is to distinguish the direction/asymmetry of the site, since the current symbols have 180° rotational symmetry. And while a pair of binding site + cut site glyphs technically works for type IIs enzyme sites, it is not succinct enough (the site is conceived of as, and biologically is, a single unit), nor does the pair cover other types of asymmetric sticky-end restriction sites, meganuclease sites, TALEN/ZFN targets, Cpf1-gRNA cleavage sites, or nCas9-gRNA targets, which may or may not have the cut site inside the asymmetric binding region, depending on the enzyme/system. Overlapping the two symbols asymmetrically looks confusing and ugly, and the overlap positioning would need to be standardized to work well; but then it would not generalize to the variety of enzyme cleavage architectures.

The forward-pointing arrow in (B, top) is sufficient to denote the orientation of any† asymmetric cleavage site. (B, bottom), of course, is just the top glyph inverted, for use when the site is inverted relative to the "standard" direction; for restriction enzymes, that direction could perhaps be standardized as the one listed in the REBASE database. An arrow version of the 3′ sticky-end restriction site glyph can be made too, to cover type IIs enzymes that leave 3′ overhangs, though I don't know of their being commonly used.

I personally only have use for an asymmetric 5′ sticky-end glyph, but I can imagine the same argument for an asymmetric blunt nuclease site; the main one I can think of is a standard SpCas9 cleavage site. Perhaps an arrow of some sort could be added to the blunt-end brackets symbol?

shyambhakta commented 6 years ago

An aside: I saw in #8 that a nondirectional sticky-end glyph was proposed, which confused me, because the current glyphs already don't specify direction; they have 180° rotational symmetry. The prototypical example given, an EcoRI site, is by its nature nondirectional (it's palindromic), as with most traditional type IIp restriction enzymes. The only remaining property to consider is whether it is a 5′ sticky-end, 3′ sticky-end, or blunt restriction site, which likewise does not change with orientation and (with the current glyphs) conveys no directionality. I can, however, see this symbol being used as a nondescript nuclease site (blunt, or 5′/3′ sticky end), though I'm not sure exactly when the stem-top glyph is to be used (perhaps for an intended in vivo function versus in vitro?). I saw this stem-top glyph vs. restriction site glyph issue raised in #12, but I got lost in what the conclusion was.
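The point about palindromic sites can be checked mechanically: a recognition sequence that equals its own reverse complement reads identically on both strands, so rotating the duplex 180° leaves it unchanged and a rotationally symmetric glyph loses no information. The minimal Python sketch below (hypothetical helper names) contrasts EcoRI's GAATTC with BsaI's GGTCTC; note that it checks only the recognition sequence, not the cut positions.

```python
# Translation table for complementing an uppercase DNA sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, i.e. the opposite strand read 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True if the site reads the same 5'->3' on both strands."""
    return site.upper() == reverse_complement(site)
```

`is_palindromic("GAATTC")` is true, so an EcoRI site has no direction to encode, whereas `is_palindromic("GGTCTC")` is false because the BsaI site's opposite strand reads GAGACC.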

jakebeal commented 6 years ago

Let me try to paraphrase, to see if I am understanding the concern correctly.

If I am indeed understanding correctly, I'd like to propose a slightly different glyph. I like the arrow idea that you are proposing, but it only captures 2 of the 4 possibilities and doesn't work very well for blunt sites or for cut sites represented with Location or Cleavage Site glyphs. What if we instead have a "caret" (truncated arrow) that acts as a modifier specifying symmetry or asymmetry?

Thus, for example, we could have a 5′ sticky-end restriction site on the plus strand with the binding site offset in the 5′ direction, offset in the 3′ direction, or symmetric:

[images: 5′ sticky restriction site glyph with asymmetric (5′ offset), asymmetric (3′ offset), and symmetric modifiers]

If this system is acceptable, it can be generalized to all of the glyphs that involve cleavage sites.

shyambhakta commented 6 years ago

The caret is much too small for the main use: it needs to be really apparent which direction Golden Gate sites are facing. I see what you mean, in that it would be nice to be able to use the same modifier for the blunt-end glyph, but the stemmed location/cleavage site glyphs are not rotationally symmetric at all, so they don't have the same problem, right? For example, CRISPR targets have a "standard" direction of protospacer-PAM, so the stem would go up to label that site and down to label it in the reverse orientation. But if it were labeled with a blunt-end cleavage site glyph, you couldn't tell whether it's forward or reverse. If direction is that important, it needs a bigger, noticeable arrow. Maybe the arrow could overlap the sticky-end glyphs as I drew them, but be offset for blunt sites, as in (C.a, C.d)? I'm not sure what to think about the overlapping versions (C.b, C.c). The centered overlap (C.c) requires a larger caret than (A, B) to work, but (A, B) can use the same caret size as (C.c) and wouldn't look bad; I just kept (A, B) consistent with my previous drawing.

If your caret glyph (B) means that the recognition site is 3′, then it ought to be the inverted (A) glyph, right? In the type IIs case (the one where the binding site is offset; other sites are asymmetric while overlapping, so there's no "offset"), the standard/forward direction is binding site -> cut site, so I don't think there needs to be a cut site -> binding site glyph. As for a symmetric modifier, your (C) doesn't preserve rotational symmetry, and I don't feel it evokes symmetry; it actually looks asymmetric, resembling options (C) and (D) from my earlier poll. Perhaps an overlapping/touching circle, like (D, E, F) below?

[image: asymmetric cleavage site glyph prototypes 2]

jakebeal commented 6 years ago

The size of the caret can easily be adjusted (it is a scaling detail that is explicitly not controlled). The particular choice I made was intended to keep it within the typical width we've been using for glyphs.

With respect to your new alternatives: part of the reason I suggested something like the caret is that I think the centered markers you've drawn will not interact nicely with the nucleic acid backbone, especially for double backbones. I was thus trying to separate the marker and move it away from the backbone. My idea was to indicate the binding site by which side of the backbone it's on and the direction by which side of the glyph it's on.

Are you certain there is never a case of cut site -> binding site?

shyambhakta commented 6 years ago

Yes, it's certain, because convention holds that the standard direction is binding site -> cut site for enzymes with an offset cut site (see http://rebase.neb.com/cgi-bin/asymmlist). In any case, I'm not sure it matters, as the glyph only needs to show direction, after which a standard direction is implied as part of the symbolic abstraction, just as with most other glyphs. Just as a top-strand RBS semicircle implies that translation initiates rightward, the glyph's direction would imply the relative positions of the binding and cut sites. It just happens that, by convention, the forward direction places the binding site first, i.e., 5′ of the cut site.

I thought these glyphs don't touch the backbone, judging from the specification files. In the sticky-end glyph case, there is empty space between the glyph's bounding box and the glyph itself. Otherwise, a single backbone would be continuous with the horizontal line of the sticky-end glyph, and the glyph would just look like pegs sticking out of the backbone. If the sticky-end glyphs are not in fact supposed to be continuous with the backbone, then there's an error in the specification of the blunt-end restriction site, which shows the backbone passing right over the glyph and through the space between the brackets. I imagine it was intended to look like the drawing below.

[image: asymmetric cleavage site glyph prototypes 3]

shyambhakta commented 6 years ago

This raises an issue I've had with the SBOL specs: no glyph should be required to interrupt the backbone. It's tedious to keep breaking the backbone to insert glyphs, especially when drawing by hand. You only know the exact location of, say, a CDS or origin glyph when you're about to draw it, so you have to erase the backbone you drew to make space, draw the glyph over the backbone (sloppy), or draw the backbone line last (robbing you of an alignment reference when drawing the other glyphs).

Perhaps we can discuss allowing CDS, origin, and restriction site glyphs (and perhaps others) to sit atop the backbone line. This could also help separate top- vs. bottom-strand features and avoid overlapping glyphs. [image: backbone interruption]

Maybe this should be a separate issue?

jakebeal commented 6 years ago

> Yes, it's certain, because convention holds that the standard direction is binding site -> cut site for enzymes with an offset cut site. http://rebase.neb.com/cgi-bin/asymmlist

If I am understanding correctly, this would mean there is no notion of "positive strand recognition" vs. "negative strand recognition" for a restriction site? If so, I'm having a hard time wrapping my head around this, since CRISPR-based restriction definitely identifies a specific strand that it binds to, and different variants cut at different locations: Cpf1 apparently cuts ~20 bp downstream of the PAM site, while Cas9 cuts 3 bp upstream of it.
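For reference, the SpCas9 geometry mentioned here can be sketched as follows: with an NGG PAM on the plus strand, the blunt cut falls 3 bp 5′ of the PAM, inside the 20-nt protospacer. This minimal Python sketch is an illustration under stated assumptions (plus strand only, SpCas9 with a 20-nt protospacer, hypothetical function name); it does not model Cpf1, which would need its own offsets.

```python
import re

# SpCas9 PAM: any base followed by GG. The zero-width lookahead lets
# overlapping PAMs (e.g. in "AGGG") each be reported.
PAM = re.compile(r"(?=[ACGT]GG)")


def spcas9_cut_sites(seq: str, protospacer_len: int = 20) -> list[tuple[int, int]]:
    """Return (pam_start, cut_gap) pairs for plus-strand SpCas9 targets.

    cut_gap is a gap index: the blunt cut falls between bases cut_gap - 1
    and cut_gap, i.e. 3 bp upstream (5') of the PAM. Minus-strand targets
    (CCN on the plus strand) are ignored in this sketch.
    """
    seq = seq.upper()
    hits = []
    for match in PAM.finditer(seq):
        pam_start = match.start()
        if pam_start >= protospacer_len:  # require a full protospacer 5' of the PAM
            hits.append((pam_start, pam_start - 3))
    return hits
```

The protospacer-PAM order gives the target a fixed direction even though the cut itself is blunt, which is the case where a rotationally symmetric blunt-end glyph cannot show which way the site faces.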

> I thought these glyphs don't touch the backbone, seeing the files with specifications included. [snip]

A dual backbone runs above and below the sticky-end glyph; in theory, the single backbone continues through it as well, and the breaks just distinguish where the glyph starts and stops. The blunt-end glyph is also supposed to have the backbone run through it: these are places where breaks can happen, not where they have already happened.

Finally, with respect to backbone positioning: I actually agree with your position, but there has been a strong convention of placing glyphs in the middle of the backbone, and the community consensus was to follow that convention. Note, however, that vertical positioning is a "SHOULD" rather than a "MUST" convention, and you are allowed to violate it if you have good reason.

jakebeal commented 5 years ago

@shyambhakta I'd like to either move this issue forward or else let it be closed for now as having insufficient motivation. Would you be willing to write up an SEP with a formal proposal resulting from the discussion in this thread?

shyambhakta commented 5 years ago

Regarding your earlier question, I'd say yes, there should be no notion of positive vs. negative strand recognition, because feature direction is defined exclusively relative to the positive strand. Though Cpf1 and Cas9 have their PAMs on opposite sides of the gRNA recognition sequence, in the standard orientation in which each enzyme's recognition sequence is defined, that sequence still comes before the cleavage site. This is exactly the convention by which a type IIs/asymmetric restriction site is considered forward-oriented when its recognition sequence precedes the cleavage site.

While the existing convention of the recognition site being the sequence before the cut site is nice (the cut site is always 3′ of the recognition site in the forward direction of an asymmetric cut site glyph), this can remain a recommendation; any existing strong convention for the forward direction of a recognition site can be kept, regardless of which side(s) of it the cut site(s) fall on. For example, if Cas9 were modified to cut 5′ of its standard forward-direction recognition site instead of near the 3′ end, the original orientation should still be permissible. Hence the primary objective: to make the sticky/blunt-end site glyphs non-rotationally-symmetric.

Sure, I'll try writing up an SEP.

jakebeal commented 4 years ago

@shyambhakta Bumping this as a reminder: we either need an SEP soon, or I'm going to move this to 2.3.

jakebeal commented 3 years ago

@shyambhakta As this has been waiting for an SEP for more than a year, and I still don't think I understand your proposed plan well enough to write the SEP myself, I am going to close this issue for now. If you want to move forward with a proposal, please reopen it.