SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

SEP V004: New Glyph Proposal Collection #8

Closed jakebeal closed 5 years ago

jakebeal commented 7 years ago

SEP V004: New Glyph Collection

SEP
Authors Jacob Beal (jakebeal@ieee.org)
Editor
Type Specification
SBOL Visual Version 1.1
Status Draft
Created 22-Aug-2017
Last modified 18-Sep-2017

Abstract

A number of new glyphs have been proposed over the past few years, and we need to put them to an up-or-down vote.

There are twelve proposals currently pending: Aptamer, Codon, Homology Region, Inverter, Non-Coding RNA, ORI-T, polyA Site, Protein Domain, Specific Recombination Site, Non Directional Sticky End, Tag, Transcript Region

1. Rationale

Each glyph proposal in the specification below includes its own rationale. Examples are also embedded within each proposal.

2. Specification

Aptamer

Associated SO term(s)

SO:0000031: Aptamer

Recommended Glyph and Alternates

The proposed aptamer glyph is a cartoon diagram of nucleic acid secondary structure like that found in aptamers:

glyph specification

Prototypical Example

theophylline aptamer

Non-Coding RNA Gene

Associated SO term(s)

SO:0001263: Non-Coding RNA Gene

SO:0000834: Mature Transcript Region

Recommended Glyph and Alternates

Two of the proposed non-coding RNA glyphs are both single-stranded RNA "wiggles," one on top of a box:

glyph specification

another hovering above the backbone:

glyph specification

One or the other of these should be chosen, but not both.

Prototypical Example

gRNA

ORI-T

Associated SO term(s)

SO:0000724: Origin of Transfer

Recommended Glyph and Alternates

The origin of transfer glyph is circular like Origin of Replication, but also includes an outbound arrow:

glyph specification

Prototypical Example

oriT

Notes

The recommended backbone location of Origin of Replication is not yet fixed; the backbone location of this glyph is intended to match Origin of Replication, so if that is recommended to be below the glyph, this backbone location will shift as well.

polyA site

Associated SO term(s)

SO:0000553: polyA Site

Recommended Glyph and Alternates

The polyA site glyph is a sequence of As sitting atop the backbone:

glyph specification

Prototypical Example

polyA tail on mammalian coding sequence

Specific Recombination Site

Associated SO term(s)

SO:0000299: Specific Recombination Site

Recommended Glyph and Alternates

The specific recombination site glyph is a triangle, centered on the backbone, as has appeared in a number of recombinase circuit papers:

glyph specification

Prototypical Example

flippase recognition target (FRT) site

Notes

Potential conflict with proposed Inverter glyph.

3. Examples

See examples in individual glyph proposals.

4. Backwards Compatibility

All proposals are for new glyphs that do not conflict with existing glyphs. Note that two proposals (Inverter and Specific Recombination Site) do conflict with one another.

5. Discussion

The following proposed options have been considered, but do not have strong support and are thus being removed from consideration unless they pick up significant advocacy. They may be revisited in the future.

Aptamer

glyph specification

Codon

Associated SO term(s)

SO:0000360: Codon

SO:0000318: Start Codon

SO:0000319: Stop Codon

Recommended Glyph and Alternates

The proposed codon glyph is a set of boxes, one per nucleotide:

glyph specification

Nucleotides can be indicated with colors or letters in the boxes:

glyph specification glyph specification

Proteins can be indicated by a letter above:

glyph specification

Stop and start codons might be indicated by special symbols:

glyph specification glyph specification glyph specification

Edits can be indicated by changes:

glyph specification glyph specification glyph specification glyph specification glyph specification glyph specification

Prototypical Example

UGA stop codon

Notes

If accepted, there will need to be additional work done to elaborate the full specification.

Homology Region

Associated SO term(s)

SO:0000853

Recommended Glyph and Alternates

The homology region glyph is a stretched hexagon hovering above the backbone:

glyph specification

Prototypical Example

Needs a good example

Inverter

Associated SO term(s)

No SO term currently exists

Recommended Glyph and Alternates

The inverter glyph is a triangle, echoing the buffer glyph from electronics. It might be either above or on the backbone.

glyph specification

glyph specification

Prototypical Example

Needs a good example

Notes

Potential conflict with proposed Specific Recombination Site glyph.

Non-Coding RNA

Squiggle with teeth:

glyph specification

Peeling comb suggesting an RNA sequence partially attached to the backbone:

glyph specification

Non Directional Sticky End

Associated SO term(s)

SO:0001692 (unspecified direction)

Recommended Glyph and Alternates

A sticky restriction site of unspecified direction is an angled set of cuts:

glyph specification

Prototypical Example

EcoRI restriction site.

ORI-T

Spirals outward toward a new destination rather than being a closed circle. Two slightly different variants of spiral are proposed for consideration:

glyph specification

glyph specification

Protein Domain

Associated SO term(s)

SO:0000417: Polypeptide Domain

Recommended Glyph and Alternates

A number of proposals have been made for Protein Domain glyphs. These are:

glyph specification

glyph specification

glyph specification

glyph specification

glyph specification

glyph specification

Prototypical Example

VP64 activation domain

Notes

Protein domain should have the same recommended vertical position as CDS, but CDS does not have a recommended vertical position yet, so these proposals do not either.

Tag

Associated SO term(s)

SO:0000324: Tag

Recommended Glyph and Alternates

The tag glyph is a diagonal rectangle with clipped corners, reminiscent of a stereotypical paper gift tag:

glyph specification

Prototypical Example

PEST tag

Copyright

CC0
To the extent possible under law, SBOL developers have waived all copyright and related or neighboring rights to SEP V004. This work is published from: United States.

jakebeal commented 7 years ago

My take: I like:

I dislike:

I am neutral on:

swapnilb commented 7 years ago

Recombinase sites have been in wide use in literature, so I would object to changing that glyph. I don't know what is meant by an inverter -- is this a device-level glyph? My impression is that our first pass (should) cover(s) biological primitive parts.

jakebeal commented 7 years ago

I'm not going to defend inverter --- it was made as a proposal, and we need to give it an up or down vote at some point. I will be voting "no" on inverter.

swapnilb commented 7 years ago

OriT was informally proposed by Angel Goni / @jakebeal / others in an SBOL Visual Thread (9/2014) to be the following:

image

I propose we consider it as a candidate glyph.

jakebeal commented 7 years ago

Thank you, Swapnil. I've added it, and like it very much as an option.

swapnilb commented 7 years ago

The "Tag" is somewhat confounding. The example cited is of a PEST tag, which appears to be a functional component (acts as a signal peptide for degradation). Yet, the SO term is specific to oligo tags used in identifying DNA; indeed, its parent is an oligo.

I suggest we separate out which meaning is intended, and I also suggest we address "functional" tags (degradation, routing, fluorescence, etc.) as well as, if not before, identification tags, because the former have been used frequently in synthetic biology literature (e.g. Friedland et al. 2009).

swapnilb commented 7 years ago

Separately, my other objection to the Tag glyph is that it is not naturally scalable, as may be needed to depict a sequence of varying length (whether a functional or an oligo tag). It is also simply a rotated variant of another glyph, which would have us introduce two unrelated glyphs that are unnecessarily close.

jakebeal commented 7 years ago

Here is the current source for the Tag glyph proposal:

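  1. The iGEM registry has many parts with "Tag" as their class, e.g. BBa_K1616026 (http://parts.igem.org/Part:BBa_K1616026). You can see the little "gift tag" icon on the page.
  2. If you search in SynBioHub for parts with a Tag role (https://synbiohub.org/search/role%3D%3Chttp%3A%2F%2Fwiki.synbiohub.org%2Fwiki%2FTerms%2Figem%23partType%2FTag%3E%26), you can readily find all of these, and see the "cleaned up" version of the glyph that is here.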
  3. Tags like these are commonly used, and it would be useful to be able to have glyphs for them, but to the best of my knowledge no other graphical convention currently exists.

You're right about the SO term being a problem, though --- if we can figure out what the right term is, that can be updated or fixed.

mgaldzic commented 7 years ago

The SO branch that deals with a peptide sequence binding site: protein_protein_contact http://www.sequenceontology.org/browser/current_svn/term/SO:0001093 Maybe, a new sibling to PIP box http://www.sequenceontology.org/miso/current_svn/term/SO:0001810 - note it is also a polypeptide_region

Other tags may belong in, Experimentally added features will be separate... not obviously consistent with the first... http://www.sequenceontology.org/miso/current_svn/term/SO:0001697 -mike

swapnilb commented 7 years ago

Is anyone more familiar with http://obofoundry.org/ontology/pr.html? It has terms closer in meaning to some of the "functional" tags. (Some of the links don't appear to work.)

chofski commented 7 years ago

Here are my views:

swapnilb commented 7 years ago

Regarding the Protein Domain glyph: I am a little confused.

  1. I suspect that this glyph would necessarily have to be composable with CDS glyph(s), without confounding the meaning of the CDS glyph.
  2. Protein Domains, I presume, can be integrated into any protein, as per classic synbio paradigm.
  3. Now, take any 5 of the 6 proposed glyphs. They all have arrowheads.
  4. So if we were to represent domain X engineered into the middle of a CDS coding for protein Y, then we would have a "composite glyph" that looked like

image

where the red glyph (assume it is one of the 5 proposed above) is the engineered domain. This, at best, can be a bit confusing because of the extra arrowhead. How would a CDS encoding a fusion protein or a multi-CDS construct be denoted, without confounding it with Protein Domains?

FWIW, pigeon has allowed for fusion CDSes to be denoted as follows, and it has seen quite a bit of use (judging from several folks who signed up to ask for it, out of the blue):

image

jakebeal commented 7 years ago

@swapnilb I agree that most of the proposed glyphs for Protein Domain don't compose nicely. The two that do are:

This is why I like chevron best of the options. We could further clarify the composition as a note, saying something like "when a Protein Domain is indicated within a CDS, 'internal' boundaries use the chevron, while boundaries at the extremes of the CDS follow the convention of the CDS"

swapnilb commented 7 years ago

@jakebeal Agree with almost everything in your latest comment, except one point:

One problem with the chevron is that if you have overlapping domains, not an entirely rare use case, then if you overlap two chevrons, you can get an unintended number of chevrons. This is because of the identical geometry of the pair of lines at the start and end of the chevron. If we could fix this, then a chevron glyph for PD could work well.

jakebeal commented 7 years ago

I believe we've got lots of good options for distinguishing overlapping domains. One is simply to use fills that indicate the overlap:

f3wkr

One can also do it via vertical separation, textual annotations, etc. The preference will likely depend on the circumstance.

swapnilb commented 7 years ago

It is not clear: my use case is that in a drawing such as yours, domain red and domain blue overlap in the grey CDS. That is, domain blue would be codons b_i to b_j and red would be r_i to r_j, where b_i < r_i < b_j < r_j. In this case, we would get an unintended purple chevron, as you have shown. It is not immediately clear from your picture that the purple chevron is NOT a third PD. In fact, it is very nicely suggestive of the purple chevron being a third PD. This is undesirable in designing a glyph.
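
As a rough illustration of the overlap case just described, here is a minimal Python sketch. The codon positions and the `overlap` helper are hypothetical, chosen only to satisfy b_i < r_i < b_j < r_j; they do not come from any SBOL tool. It computes the shared codon range, which is the region a renderer would end up drawing as the extra chevron.

```python
from typing import Optional, Tuple

# A domain is modelled as an inclusive (start_codon, end_codon) pair.
Interval = Tuple[int, int]


def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the codon range shared by two domains, or None if they are disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Hypothetical positions satisfying b_i < r_i < b_j < r_j.
blue = (10, 60)   # b_i, b_j
red = (40, 90)    # r_i, r_j

shared = overlap(blue, red)
print(shared)  # (40, 60): the span that would appear as the "purple" region
```

The shared range is itself a well-formed interval, which is why a reader can mistake it for a third domain unless the drawing marks it differently.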

jakebeal commented 7 years ago

So spread out the hatching a bit so it's less ambiguous, or use outlines instead, or text... my point is that I find this to be solvable with the tools already at hand.

chevronsa

chevronsb

chevronsc

chevronsd

jakebeal commented 7 years ago

I have updated the SEP based on the current state of discussion. Here is what I am currently seeing:

Symbols with apparent consensus in favor of a specific glyph:

Symbols with apparent consensus, but multiple glyph options yet to be resolved:

Symbols without a clear consensus:

Symbols with apparent consensus against:

jakebeal commented 7 years ago

@chofski With respect to ncRNA, are you thinking something like this?

glyph specification

chofski commented 7 years ago

Yes, precisely. Or slight variations on it, e.g., where the "teeth" are perpendicular to the backbone and not all parallel.

jakebeal commented 7 years ago

@chofski Would you be willing to say what you prefer about the "teeth" version over the "no teeth" version? Personally, I prefer the "no teeth" version because it is simpler to draw and matches the RNA symbols that I have seen in a number of scientific papers (e.g., http://www.nature.com/nbt/journal/v33/n8/abs/nbt.3301.html)

chofski commented 7 years ago

Yes, but it is a minor preference. I'd be happy for either to be adopted.

jakebeal commented 7 years ago

Thank you for the clarifications. I've added the "teeth" version to the options under consideration, and we'll see how the conversation continues to develop.

graik commented 7 years ago

The issue with the protein domain suggestions so far is that they are very unlike anything that is actually informally being used in the field today. The most common symbols being used currently are:

sep004_protein_domain_suggestion

I would further suggest that protein features are recommended to be placed above (or below) the encompassing CDS symbol, possibly with an extra baseline that is symbolizing the protein.

swapnilb commented 7 years ago

@chofski @jakebeal I highly prefer the non-teeth version for RNA. Teeth has way too high stroke complexity.

Also, if I understand correctly, we should be careful to call it "DNA encoding ncRNA" -- in that this glyph cannot be used to depict the RNA transcribed. (Or if it can, then that should be mentioned in the spec.) I also prefer the boxed variant for ncRNA since it clearly shows it as being part of the DNA.

As to chevron -- I understand that the problem can be solved, but I don't support adding a known confusion into the spec. The overlap problem needs to be dealt with at some point, and it is best if we don't make it worse.

swapnilb commented 7 years ago

@graik I concur that we need to develop a standard way to describe overlapping features. I agree, the PD would be better illustrated if it could be aligned beside the CDS.

swapnilb commented 7 years ago

@jakebeal please note this too:

Tag: I object to the current glyph because:

jakebeal commented 7 years ago

Also, if I understand correctly, we should be careful to call it "DNA encoding ncRNA" -- in that this glyph cannot be used to depict the RNA transcribed.

@swapnilb I believe the associated SO term is clear: "SO:0001263: Non-Coding RNA Gene", so I've changed the glyph name in the SEP to be exactly that.

jakebeal commented 7 years ago

@graik I'm a bit confused by your proposal: it sounds like you're thinking this glyph is describing part of an actual protein, like in our ACS SynBio paper? That is not the case here: per the SO term, this glyph is intended to describe a portion of a CDS.

graik commented 7 years ago

OK, that wasn't clear to me. I thought the same symbol would end up being used for protein sequence annotation as well. If we restrict ourselves to DNA only, then perhaps the basic problem is how to have both the CDS symbol and overlapping RNA or protein symbols coexist. For example, I like the "Tag" symbol, but how is it going to overlay on a CDS? Inside of it? Above it? Or should there be no CDS symbol if details are shown for the ORF?

Pragmatically, I think @swapnilb's 2-way fusion CDS shown above is the most straightforward and intuitive way to demarcate sub-elements in an ORF, including domains.

jakebeal commented 7 years ago

@graik I agree that composition is a critical requirement of CDS "sub-components." We do not have a formal composition model, nor is this SEP the place to develop one (though I would encourage development of that in a new SEP).

I thus think that for CDS-related elements, we remain at the same place: showing composition by the "fusion" method used by @swapnilb, either with straight boundaries ("User Defined") or with angled boundaries (chevron).

Furthermore, given the serious problems that Tag has with composition and SO term, and the fact that these do not seem to have any promising paths to resolution, I have moved Tag to the "consensus against" set --- we would thus recommend that protein tags be indicated with Protein Domain glyphs, which I think is reasonable. I have also moved the ncRNA gene "squiggle with teeth" down since it had only mild support but strong opposition.

jakebeal commented 7 years ago

Would anybody like to speak about Homology Region, polyA site, or Non Directional Sticky End?

If I don't hear more significant backing for these, I am going to move them to "these don't seem important enough to anybody to add at this time."

cjmyers commented 7 years ago

I’m not locked into a particular glyph for it, but I believe we need one for polyA, since this is common in the iGEM dataset. Three As is fine by me, but I’m open to alternative suggestions.

jakebeal commented 7 years ago

Updated again based on the current state of discussion. I think we may be nearly ready to move forward for a vote. Here is what I am currently seeing:

Symbols (maybe) ready for voting:

Symbols without sufficient backing, and thus not being voted on (they may be revisited in the future): Codon, Homology Region, Inverter, Non Directional Sticky End, Tag

We still need to discuss Non-Coding RNA and Mature Transcript Region. I believe that Mature Transcript Region should be discarded, since the SO term covers specifically transcripts, and not sequences coding for functional transcripts (which is handled neatly by ncRNA). I would then propose we move forward with a vote to decide which of the two ncRNA glyphs should be retained.

What do people think of this proposal?

cjmyers commented 7 years ago

ncRNA in SO (http://www.sequenceontology.org/browser/current_svn/term/SO:0000655) seems to describe an RNA and not a region of DNA that represents a non-coding RNA region. This is why we used mature transcript region (http://www.sequenceontology.org/browser/current_svn/term/SO:0000834). This is the parent term that includes mRNA, which includes CDS, ribosome entry site, etc. Granted mature transcript region is too general, but if you don't know what type of non-coding RNA it codes for then there is no good parent term to use. The solution here might be that we need to contact SO folks.

jakebeal commented 7 years ago

@cjmyers That is why the current proposal does not use that term. SO:0000834, in fact, has the same problem.

Instead, the current proposal uses Non-Coding RNA Gene (http://www.sequenceontology.org/browser/current_svn/term/SO:0001263), which is for a region of DNA that represents a non-coding RNA region.

cjmyers commented 7 years ago

Ah, did not notice that. That is a better definition, but it makes it less parallel with CDS. I would have expected these terms to be cousins.

jakebeal commented 7 years ago

I take back what I said about SO:0000834, which is fine. I was getting it confused with http://www.sequenceontology.org/browser/current_svn/term/SO:0000233 --- the parallel terminology in very differently structured parts of the ontology is frustrating to me sometimes.

I would be comfortable to have both SO:0001263 and SO:0000834 be legitimate SO terms for this. We are, in fact, allowed multiple terms, and have used it before (e.g., Ribosome Entry Site is given both SO:0000139 and SO:0000204).
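
To make the "multiple SO terms per glyph" idea concrete, here is a small Python sketch. The table layout and the `glyphs_by_term` helper are illustrative assumptions, not part of any SBOL Visual tooling; the only data used are the glyph and term pairs mentioned in this thread. It builds a reverse index from SO term to glyph and checks that no term has been assigned to more than one glyph.

```python
from collections import defaultdict

# Glyph name -> associated SO terms (pairs taken from this discussion only).
GLYPH_SO_TERMS = {
    "Non-Coding RNA Gene": ["SO:0001263", "SO:0000834"],
    "Ribosome Entry Site": ["SO:0000139", "SO:0000204"],
}


def glyphs_by_term(table):
    """Invert the table: map each SO term to the glyphs that claim it."""
    index = defaultdict(list)
    for glyph, terms in table.items():
        for term in terms:
            index[term].append(glyph)
    return dict(index)


index = glyphs_by_term(GLYPH_SO_TERMS)

# A term claimed by more than one glyph would make glyph lookup ambiguous.
ambiguous = {term: glyphs for term, glyphs in index.items() if len(glyphs) > 1}

print(index["SO:0000834"])
print(ambiguous)
```

Giving a glyph several terms is harmless as long as the reverse lookup stays one-to-one, which is what the `ambiguous` check is meant to flag.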

cjmyers commented 7 years ago

I think we might want to see if we can get the SO folks to add a parent to the RNA coding regions of the DNA that are not mRNAs. In other words a child of 834 that is a parent to all but 836.

jakebeal commented 7 years ago

That would still be problematic because 836 contains things like "riboswitch" and "RNA thermometer" that are functional RNA rather than protein coding RNA.

cjmyers commented 7 years ago

Aren't these still part of the mRNA, though, there to regulate its translation into a protein? I thought the distinction we are wanting is a glyph that indicates whether the region codes for protein or does not code for protein. My understanding is that mRNA codes for protein, and the other RNA options here do not, even if parts of the mRNA region are there only for regulation of translation.

jakebeal commented 7 years ago

I believe that things like riboswitches might be used to regulate other functional RNA as well, by modulating its stability. I am not certain, but certainly wouldn't count on artificial systems remaining isolated thus.

jakebeal commented 7 years ago

For now, I have merged the two, giving both SO terms.

jakebeal commented 7 years ago

Returning to the question of protein domains: I would like to remove the "rectangle" option. My reasons are:

  1. It is redundant with the use of rectangle for Unspecified. Thus, if we want "rectangle", we should just not assign a glyph.
  2. At the same time, it will conflict with the use of rectangle in Composite and (likely) No Glyph Assigned

@swapnilb @graik @cjmyers Would you be OK with this?

jakebeal commented 7 years ago

I'd also like to pick just one of the three ncRNA options, if we can. Would people be OK with the recommended vote being: "pick one" rather than "do you like all three as alternatives"? And can we eliminate any before voting?

cjmyers commented 7 years ago

I agree.

cjmyers commented 7 years ago

Ok with pick one. Also okay to eliminate one. I’m not partial to any particular one.

jakebeal commented 7 years ago

On ncRNA, a bit of searching around the literature online finds that the main ways of diagramming ncRNA at present are "single strand wiggles", unspecified rectangles, and complex shape diagrams.

I thus propose to remove the "peeling teeth" ncRNA from consideration, then put "wiggle" vs. "box-wiggle" to a vote, since I know that I strongly prefer "wiggle" and @swapnilb has stated that he prefers "box-wiggle".

cjmyers commented 7 years ago

Sounds good to me.

swapnilb commented 7 years ago

I strongly prefer "box-wiggle." To reiterate: this is the DNA encoding some ncRNA. So I like that it is attached to the backbone and NOT hovering.

I think we should punt on protein domains. It requires more careful design. So I am OK with removing the rectangle option at the very least.

For the protein domain chevron, I am still NOT OK with it. It is known not to be a good glyph, and by design it is not one. Therefore, I propose that we have another symbol such that none of its X-directional overlapping translations forms an unintended glyph. Any glyph that smoothly varies in the X/-X direction would do the job, and would improve the visualization of overlapping domains, which is likely a common use case.