SEP V020: Recommend use of polypeptide region for 2A Sequences

jakebeal commented 4 years ago

This SEP proposes to add the new SO term for 2A sequences to the Protein Cleavage Site glyph.

Read the full SEP at: https://github.com/SynBioDex/SBOL-visual/blob/master/SEPs/SEP_V020.md

graik commented 4 years ago

2A is mechanistically completely unrelated to protease cleavage. The SO term of "self-cleaving peptide" is in fact also wrong (and should IMO be corrected). 2A sites work at the level of mRNA sequence/structure and make the ribosome "skip"; that is, the ribosome releases the peptide chain it is busy translating but does not detach. Instead, it resumes translation a couple of codons further down, at a defined position after the 2A sequence.

I would advise not to connect this term to protease cleavage. There is no protease involved, it is not a post-translational processing step, and the two protein halves cannot exist fused in order to be cleaved later. Instead there are 2A-specific considerations such as: (1) it only works in eukaryotes (I think with some host-specificity as well), (2) different 2A sites have different efficiencies in different contexts (some close to 100%, others less so).

It would be good to have a symbol for 2A sites but it should be different from protease cleavage.

jakebeal commented 4 years ago

I would not say we are connecting the term to protease cleavage: rather, this is recognizing that "protease cleavage site" was always too specific a term for the general notion of a cleavage site that is captured by this glyph.

With regard to naming: the world does tend to call it "self-cleaving" (https://en.wikipedia.org/wiki/2A_self-cleaving_peptides) even if that's not the mechanism, and indeed the mechanism appears to not be completely settled. The SO term does indeed have an alternate name of just "2A polypeptide region" if you prefer, and leaves the mechanism undefined.

graik commented 4 years ago

Well, I am not too concerned about the SO term. But from a protein biochemist perspective: protease site != 2A sequence

This is not some nit-picking detail. It's comparing apples and oranges.

jakebeal commented 4 years ago

Absolutely agreed. However, in prior discussion (#70), the consensus appeared to be that the glyph was for fruit, and that both apples and oranges were fruit.

graik commented 4 years ago

Reading up on that discussion, it was a rather strained consensus... the discussion started out with the same observation that 2A and protease cleavage are not the same. They may end up in a similar result but only under the condition that (i) the protease site is used in the presence of a matching protease and (ii) that the 2A site is used in a matching host.

But as an engineering feature, they are quite different:

- 2A sites are used to emulate poly-cistronic expression in eukaryotes. This is not a typical usage scenario for protease sites.
- Protease sites are routinely used for cleaving off protein purification or other tags (after purification). 2A sites cannot be used for anything like it.
- Protease sites can become hidden / inaccessible by protein 3D structure; 2A sites are not affected by protein structure.
- Protease sites are a protein feature. 2A sites are a (transcription/)translation feature.

Given that both features are widely used, but each in different contexts and for different purposes, they both deserve their own symbol.

I for sure would want to know whether there is a 2A site in a design (implying separate proteins from the start, implying this design will only work in a eukaryotic context) and wouldn't want to see this confused with a protease site.

jakebeal commented 4 years ago

I would happily support separate development of a pair of more specific glyphs. That option for distinguishing would be very analogous to how the DNA Cleavage Site glyph relates to the glyphs for blunt restriction sites and sticky restriction sites.

Per the request from the editors for making SEPs atomic, however, I would suggest that those be developed in a separate thread.

chofski commented 4 years ago

I agree with Raik in this instance and think that conceptually the two elements are quite different in their function and so using the same glyph could be confusing. Would it not be better to draw such elements with two separate CDSs (which they are, as the 2A sequence is not protein coding), and use either an 'Engineered Region' or new glyph between these to denote that this region is not protein coding? That seems clearer and more accurate to me.

jakebeal commented 4 years ago

I definitely agree on the two different CDSs in any case, but if we're going to have a new 2A glyph, would somebody like to propose one?

chofski commented 4 years ago

For a paper where we considered translational frameshifting (which is different, as the whole sequence is still protein), we opted for an arrow connecting one frame to another (see Fig 4 in https://www.embopress.org/doi/10.15252/msb.20188719). This could maybe be adapted for your case as follows. This would also connect with the idea of an RBS and translation initiation due to its similarity.

2A.pdf

jakebeal commented 4 years ago

Interesting! Would that be better used for frameshifting, though?

chofski commented 4 years ago

For a frameshift the CDSs would be overlapping, so I think there would be a clear difference. I'm not strongly advocating for the arrow, but I do think it nicely shows the skip along the transcript that the sequence induces.

jakebeal commented 4 years ago

Any other candidates? Opening this up for brainstorming...

graik commented 4 years ago

I think most publications use a box with "2A" inside. Example from Ron Weiss' lab:

I am not saying this is the best ever way of doing it but it may cause consternation if we depart very much from that.

Personally, I would prefer to have at least a small gap in between:

graik commented 4 years ago

A visually perhaps more intuitive, low-key alternative could be a simple gap with "2A" (I think I have seen this somewhere): [image: image.png]

Or one could improvise something reminiscent of the terminator symbol: [image: image.png] (but I just made this one up; I don't think there would be any precedent for it)

shyambhakta commented 4 years ago

Hold up, there is a false assumption here that the 2A is not translated. It absolutely is translated by the currently hypothesized mechanism we discussed in the original closed issue.

During translation of the 2A, the peptide sequence causes the ribosome to skip formation of the peptide bond between the 2A's final two amino acids. All but the last amino acid of the 2A is left as a C-terminal tag on the upstream protein, which is released because of the skipped peptide bond; the last amino acid of the 2A is fused to the remainder of the CDS downstream of it (second cistron), which continues being translated.
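The split described above can be sketched in a few lines of Python. This is only an illustration of where the two products begin and end, not a model of the ribosome. The function name `split_at_2a` and the flanking protein fragments are hypothetical placeholders; the P2A sequence is the one commonly cited in the literature and should be verified before any real use.

```python
def split_at_2a(polypeptide: str, two_a: str) -> tuple[str, str]:
    """Split a translated polypeptide at the skipped bond inside a 2A peptide.

    The skipped peptide bond lies between the last two residues of the 2A,
    so the upstream product keeps all but the final 2A residue and the
    downstream product starts with that final residue.
    """
    start = polypeptide.find(two_a)
    if start == -1:
        raise ValueError("2A peptide not found in polypeptide")
    skip = start + len(two_a) - 1  # index of the last 2A residue
    return polypeptide[:skip], polypeptide[skip:]


# Assumption: commonly cited P2A peptide sequence (verify before real use).
P2A = "ATNFSLLKQAGDVEENPGP"
# Hypothetical placeholder fragments, not real protein sequences.
upstream_protein = "MSKGEELFTG"
downstream_protein = "VSKGEEDNMA"

fusion = upstream_protein + P2A + downstream_protein
first, second = split_at_2a(fusion, P2A)
print(first)   # upstream protein carrying the 2A minus its last residue
print(second)  # downstream protein with the 2A's final residue at its N-terminus
```

In this toy example the first product ends in "...NPG" and the second begins with the proline, matching the description that only the last 2A residue travels with the downstream protein.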

Neither is the translation frame altered as in bacterial multicistronic operons, wherein a ribosome is on an internal RBS when it reaches the stop codon and overlapping start codon of two CDSs.

The 2A feature is thus a protein feature that acts like a mid-translation ribosome-mediated protease site, except it acts upon the ribosome to cause a peptide bond to never form in the first place, instead of the sequence being formed and then cleaved after recognition. Still, cleavage of the peptidyl-tRNA bond must occur by an unknown mechanism inside the ribosome for the first protein segment to be released and translation to proceed to the second.

jakebeal commented 4 years ago

Good points, @shyambhakta ; @graik , @chofski --- how does this affect your thinking?

shyambhakta commented 4 years ago

Two things allowed me to see the protein cleavage site glyph as adequate for describing the 2A's function, to the approximation needed of a glyph. First, relative protein separation ("cleavage") efficiencies depend on peptide sequence/structure context for both 2As and true protease sites. Second, the current ×-capped helix protein cleavage site glyph evokes a small peptide sequence signaling the separation of a polypeptide at a point. Both elements can be accurately described as signaling the separation of what is otherwise one polypeptide into two polypeptides, which the glyph evokes.

However, I also understand making a new glyph from the standpoint that protease sites in syn bio are generally controlled through protease expression, whether synthetic or native, whereas 2A efficiencies are never controlled by another molecular species. 2As are used simply for multicistronic expression.

I can also imagine the unrelated mechanisms of 2A versus protease sites mattering in a eukaryotic circuit that uses protease logic in combination with 2A-mediated multicistronic expression. I'll try to direct some of the mammalian syn bio labs at Rice to comment on how important this may be to consider.

Jihwan-Lee1 commented 4 years ago

Here is my opinion: given that 1) the usage/application of 2A and protease cleavage sites are different and 2) the mechanisms of action are different, it will be more coherent to have a separate glyph for the 2A sequences. Imagine a situation in which your circuit has a protease cleavage site and a 2A sequence. Using the same glyph might confuse readers, making them think the two have the same role in the circuit.

@graik I cannot see the images you posted.

graik commented 4 years ago

Sorry about the broken images. Replying by e-mail doesn't support images, it seems. So here they are again:

(1) Example figure from a paper by the Ron Weiss lab. This way of showing 2A sites is pretty common.

(2) First suggestion of a low-key 2A symbol (with an added gap to the two CDS).

(3) Second suggestion, even simpler.

(4) Third suggestion, completely made up, with graphical reference to the terminator symbol.

Cheers, Raik

Fontanapink commented 4 years ago

Hello SBOL Visual Community, it looks like this discussion has wound down a little bit. Does anyone have any other glyph alternatives they'd want to pitch in? What about the discussion on 2A sites? @shyambhakta, @graik, @chofski, @jakebeal any consensus on how we should consider 2A sites?

jakebeal commented 4 years ago

@Fontanapink I plan to return to this, but haven't had a chance yet.

shyambhakta commented 4 years ago

To comment on the images above that @graik posted,

(1) is OK, but it isn't a new glyph; it's just a labeled CDS segment, which a 2A genuinely is. This would be a fine recommendation if we don't think 2A requires a new glyph. It should be a domain glyph to conform to the latest version, as in (5) below.

(2) might be misleading, suggesting that the 2A excises itself, which is another kind of element called an intein. Also, the box may be regarded as the engineered region glyph.

(3) probably also violates a rule, as the box representing the first domain of the CDS is also an "engineered region" glyph. When choosing a CDS domain glyph, we had to rule out vertical line separators because of this.

(4): as with (2) and (3), options that break the CDS glyph when the CDS isn't truly divided translationally probably aren't great options. Also, I doubt we can use a reverse terminator symbol, which is already used for terminators on the reverse strand.

We settled on a domain glyph. What if a 2A glyph is a modification of a domain glyph, as it is a type of domain: one that induces an internal skip in peptide bond formation? Note: we probably can't use the jagged line dividers inside CDSs that we decided would be used for exon-intron boundaries. Here are some ideas:

[image: image] https://user-images.githubusercontent.com/5035245/81256935-918f4080-8ff7-11ea-8391-d64280b6baee.png

graik commented 4 years ago

Thanks for reviving this discussion.

I don't think that the 2A symbol should be integrated as a domain glyph within CDS. It's a massive interruption of translation. It's not a protein sub domain at all. In a way it is an alternative way of connecting two CDS into a polycistronic construct. So the CDS symbol definitely has to be disrupted one way or the other.

Also, whatever we choose, the "2A" label should be part of it. Most people won't be able to guess what we mean no matter how great the symbol is.

All that considered, I still think the easiest solution is to simply put "2A" as a label on a baseline between two CDS. It's simple, intuitive and not too far away from what is used in real-world papers.

Greetings

shyambhakta commented 4 years ago

Based on the definition of a CDS starting at a start codon and stopping at a stop codon, isn't it a requirement that the 2A peptide, or any functional segment of amino acid-encoding codons (i.e. CDS domain), not interrupt the CDS glyph?

This raises the existential question: does a CDS glyph start and stop where a polypeptide correspondingly starts and stops, or is it, as in the current widespread definition, the segment of DNA continuously encoding amino acids from translation initiation to termination (typically (RBS)–START–STOP, with codons in frame unless there's a translational frameshifting element)?

It's hard for me to say that the latter definition should be violable in the corresponding glyphs. But even if there were a normal TEV protease site in a linker between domains, or an ssrA degradation tag appended to the end of a protein, should these be superimposed on the corresponding part of the CDS glyph, or do they break the CDS glyph with domain glyphs in series with coil-stemmed glyphs where such glyphs are more descriptive than a plain domain glyph? @jakebeal

Maybe both facts, that the polypeptide is split with varying efficiencies and that the CDS itself is not split, can be evoked by the design below. I'm not sure the "2A" label is essential, but all current such elements are generally called 2A peptides (P2A, T2A, F2A…), based on: "The name '2A' itself comes from the gene numbering scheme of this virus." I'd imagine it's the same in other languages. The point of consideration is that the future might find or engineer peptide bond-skipping peptides and call them something else. Maybe the "2A" label suggestion can be dropped if/when that happens, or suggested if and only if the element is a member of the 2A peptide family.

JS3xton commented 4 years ago

I think DNA that codes for protein should be within a CDS or CDS-like glyph.

I appreciate the simplicity of @graik's proposal ("2A" between two CDSs), but I think it fails to indicate that the 2A nucleotides code for protein (as I understand the mechanism). I think @shyambhakta's (9) does a better job of this.

Perhaps this further improves (9):

As I understand the 2A mechanism, separation occurs at the C-terminus of the 2A element. The dotted line domain boundary is meant to indicate this. The solid domain boundary at the N-terminus of the 2A element indicates (rightly) that the 2A element is a domain of the first polypeptide.

Jihwan-Lee1 commented 4 years ago

Hi @shyambhakta, @graik, and @JS3xton Thank you for reviving the discussion.

I believe (1) is how most people represent 2A.

When we use the 2A sequence for polycistronic expression of genes A and B, we need to remove the stop codon on gene A. So technically, by the definition of CDS, GeneA-2A-GeneB is a CDS. So in my opinion, the CDS glyph should not be broken.
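The point about removing gene A's stop codon can be checked with a minimal Python sketch, assuming the definition of a CDS as a contiguous in-frame sequence from a start codon to the first stop codon. The helper names `is_single_cds` and `fuse_with_2a` and all three DNA strings are hypothetical toy values; in particular, the `two_a` placeholder is not a real 2A sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(dna: str) -> list[str]:
    """Cut a DNA string into consecutive triplets, starting at position 0."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def is_single_cds(dna: str) -> bool:
    """True if dna starts with ATG, is a whole number of codons, ends with
    a stop codon, and has no earlier in-frame stop codon."""
    if len(dna) % 3 != 0 or not dna.startswith("ATG"):
        return False
    triplets = codons(dna)
    has_internal_stop = any(c in STOP_CODONS for c in triplets[:-1])
    return triplets[-1] in STOP_CODONS and not has_internal_stop


def fuse_with_2a(gene_a: str, two_a: str, gene_b: str) -> str:
    """Drop gene A's stop codon and join geneA-2A-geneB in frame."""
    if gene_a[-3:] not in STOP_CODONS:
        raise ValueError("gene A is expected to end with a stop codon")
    return gene_a[:-3] + two_a + gene_b


# Toy sequences for illustration only.
gene_a = "ATGGCTAAAGGTTAA"  # ATG GCT AAA GGT TAA
two_a = "GGATCCGGACCT"      # placeholder linker, NOT a real 2A sequence
gene_b = "ATGCGTGAACTGTGA"  # ATG CGT GAA CTG TGA

fused = fuse_with_2a(gene_a, two_a, gene_b)
print(is_single_cds(fused))                    # one continuous reading frame
print(is_single_cds(gene_a + two_a + gene_b))  # gene A's stop codon left in place
```

With the stop codon removed, the construct passes the single-CDS check; leaving it in place introduces an internal in-frame stop, so translation would end before reaching the 2A.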

I really like @JS3xton's version 10. It shows that EBFP-2A-Bla is a CDS. It goes along with how we usually denote fusion proteins (i.e., most of the 2A sequence is fused to EBFP). And the dotted line differentiates our case from cases where we show a triple gene fusion.

We could also consider combining some aspects of glyphs together. (Similar to how the Chinese language combines characters to come up with characters with new meaning). Here, I have combined the stem of the proteolytic cleavage glyph with the CDS glyph. Again, we don't want to use the proteolytic cleavage glyph itself for 2A sequences (see correspondence above), but we can still use a form of the glyph so that the literate users would be able to infer there will be some sort of cleavage-like outcome.

PS. Strictly speaking, there is a proline that gets added on to the downstream gene, but I don't think the glyphs should convey such details.

graik commented 4 years ago

Hi everyone, @shyambhakta, @JS3xton, @Jihwan-Lee1,

> I appreciate the simplicity of @graik's proposal ("2A" between two CDSs), but I think it fails to indicate that the 2A nucleotides code for protein (as I understand the mechanism). I think @shyambhakta's (9) does a better job of this.

We should really not get hung up on the mechanism. First, I would disagree that 2A "codes for a protein"; it codes for a "protein interruption". Second, the most important thing is that the glyph conveys that this is a sequence that is going to split the CDS in two. Any solution that puts 2A as a sub-domain into the CDS can be confusing. I don't think proposals 10 - 12 are intuitive. 10 looks more like the separation between two protein domains is not very strong, and 11 and 12 just don't suggest any meaning to me. @shyambhakta's A and B are again mixing in the protease site glyph. The whole discussion started from the point that that's not such a good idea.

I like @shyambhakta 's version (9). I would go further and propose a modification of my (3) further up: (13) 20200508_2A_symbol_v13 I think this one clearly conveys that the CDS has been interrupted but still somehow forms one unit.

Personally, I would prefer my simpler version 3 from above, here again: (3) 20200508_2A_symbol_v14 It's much simpler to draw in any kind of program.

In any case, 3, 10, and 13 are all going in the same direction. They include "2A" as a label, which I think is most important, and they illustrate a somehow interrupted CDS.

graik commented 4 years ago

Sorry, I meant 3, 9 and 13 are going in the same direction.

shyambhakta commented 4 years ago

@graik To clarify, (A) and (B) weren't for the 2A peptide; that figure corresponded to the three paragraphs above it, discussing the superimposition of glyphs on the CDS and the CDS glyph's definition, which is something I'd like to hear your view on (top half of my last post). Because the decision on that would modify the CDS glyph specs and would narrow which of these 2A glyphs are even valid.

jakebeal commented 3 years ago

I'd like to revive this discussion, as it seemed to me that we'd gotten some great ideas and might be close to a consensus.

In what has been posted so far, I see that three proposals have both drawn support and are compatible with other glyphs (notably intron, polypeptide region, CDS, and engineered region). I see the following advantages and disadvantages in the three proposals:

(9)
- Advantages: simple, is essentially just styling and labeling a polypeptide region.
- Disadvantages: makes a 2A look like a fragment of a coding sequence, rather than a coding skip; is essentially just styling and labeling a polypeptide region.
(12)
- Advantages: highly distinct; uses the same "boundary in a CDS indicates a feature" pattern as intron and polypeptide region. Hints at cleavage with the "biopolymer location" composite.
- Disadvantages: Does not include the word "2A".
(13)
- Advantages: very simple
- Disadvantages: breaks the CDS into apparently separate glyphs (one of which is otherwise invalid), does not reflect the partial nature of many 2A sequences.

Looking at them together like this, I'm actually liking option (12) more and more. The only disadvantage that I see is that a person is not required to use the word "2A", but even that is still allowed as a label if one wishes.

What do others think?

shyambhakta commented 3 years ago

As I mentioned before, the 2A glyph cannot connote that the CDS is broken. Both popularly in biology and in the SO term, "CDS" is defined as "a contiguous sequence which begins with, and includes, a start codon and ends with, and includes, a stop codon." (13) and (3) break the CDS by showing the DNA backbone in between, so they have to be ruled out, I would reason. Even if that segment of the backbone is sort of counted as part of the CDS, it makes sense not to strongly connote CDS breakage, because 2A peptides don't work at all in bacteria, in which the peptide is just translated normally; and even eukaryotic ribosomes have highly varying efficiencies of peptide bond-skipping, depending on species and 2A peptide context. So 2A users expect a proportion of unseparated fused proteins (yet still fuse even four CDSs together with three 2As, despite multiplying the cleavage inefficiency).
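The remark about multiplying inefficiency can be made concrete with a short Python sketch. It assumes, purely for illustration, that every 2A site skips independently and with the same efficiency; the 90% figure is a hypothetical value, not a measurement, and the function name `product_fractions` is made up for this example.

```python
def product_fractions(efficiency: float, n_sites: int) -> dict[str, float]:
    """Fraction of translation products that are fully separated versus
    those retaining at least one unseparated fusion, assuming each 2A
    site skips independently with the same efficiency."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    fully_separated = efficiency ** n_sites
    return {
        "fully_separated": fully_separated,
        "at_least_one_fusion": 1.0 - fully_separated,
    }


# Hypothetical example: four CDSs joined by three 2As, each assumed 90% efficient.
fractions = product_fractions(0.9, 3)
for name, value in fractions.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions only about 73% of products would be fully separated into four proteins, which is the kind of compounding the comment refers to.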

I can get behind these six variants (remade in ppt). The gray text 2A is just to help imagine if it were labeled; I wouldn't make it a necessary part of any glyph.

(12) and (14) are incidentally easy to make using dashed line types with standard arrow and chevron shapes in PowerPoint. I like (14) better than (12) because (14)'s outer CDS glyph is fully preserved, which makes the CDS not interpretable as broken/interrupted in any sense as Jake mentioned — it's still start–>stop if the CDS glyph is intact.

(9) repurposes the stem of the protein cleavage "protease site" glyph, and the new (15) and (16) repurpose the top of it. The stem of the protein cleavage glyph only means "protein location" and doesn't itself evoke the 2A function, as does the × from the top of the cleavage glyph. But it does split the CDS in a vertical way so does evoke some sort of separation/splitting. Tops aren't in the provided glyph set, so an × glyph would have to be added for (15)/(16) (or understood as easy enough to delete a stem), unlike the existing protein location glyph in (9).

Thinking more about how glyphs can be repurposed… (17) extends the principle of using the asterisk to signify termination, as in the transcription and translation end glyphs. While they, respectively, have the DNA and RNA stems because that's how they manifest, here the signal is in the peptide structure/sequence, so ought to have a protein stem. But I think it looks better to omit it — it's already inside a domain glyph; perhaps that's enough. If that's going too far, at least (15)/(16) don't appear to break any existing glyph norms.

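[image: image] https://user-images.githubusercontent.com/5035245/95017989-3901b500-0622-11eb-9f98-cfb86bbc0ce1.png

@Jihwan-Lee1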

graik commented 3 years ago

Hi all,

I think #9 in Jake's e-mail seems to be the best compromise (#12 in Shyam's mail). Personally, I would have preferred Jake's #13 but I can also see why this could be confused as two CDS. But remember that this discussion only applies to eukaryotic hosts where a naked CDS would still not be functional without its own promoter, RBS, etc.

I really find Jake's #12 un-intuitive.

Shyam's #14 looks more like a "weak domain boundary" or protein segment sub-division. It doesn't quite capture the drama of a protein coding region split into two. His #16 has a certain appeal but, as Jake repeated, "2A" should be part of the glyph, as it is too specialist a signal to be understood without the label. The added advantage of Jake's #9 is that, if it should later become a very common sight within a larger specialist community, they could still decide to leave out the "2A" label and have a unique, recognizable and relatively intuitive symbol. So my vote would go to #9 in Jake's mail.

Greetings Raik

jakebeal commented 3 years ago

This is all leading me to think that a good representation of 2A may be accomplished without actually adding a new glyph, simply by giving a recommendation and an example of how to use the existing polypeptide region glyph.

If we go with the "dashed region" approach (copied below for clarity), then all we actually need to do is to add a note to the existing polypeptide region glyph to note that this glyph works for 2A sequences as well, and maybe a cross-reference over from cleavage sites.

jakebeal commented 3 years ago

I've updated the SEP to reflect this recommendation (example still to be added): https://github.com/SynBioDex/SBOL-visual/blob/master/SEPs/SEP_V020.md

graik commented 3 years ago

Sounds good!

Jihwan-Lee1 commented 3 years ago

Looks good.

chofski commented 3 years ago

Sorry for the delay. I also think this is a good compromise.

jakebeal commented 3 years ago

OK; I think we're ready to vote on this one then.

jakebeal commented 3 years ago

Note: I've updated the SEP and branch to include an example:

glyph example

jakebeal commented 3 years ago

Closing as accepted and incorporated.

SynBioDex / SBOL-visual

SEP V020: Recommend use of polypeptide region for 2A Sequences #78