Closed jakebeal closed 5 years ago
My take: I like:
I dislike:
I am neutral on:
Recombinase sites have been in wide use in literature, so I would object to changing that glyph. I don't know what is meant by an inverter -- is this a device-level glyph? My impression is that our first pass (should) cover(s) biological primitive parts.
I'm not going to defend inverter --- it was made as a proposal, and we need to give it an up or down vote at some point. I will be voting "no" on inverter.
OriT was informally proposed by Angel Goni / @jakebeal / others in an SBOL Visual Thread (9/2014) to be the following:
I propose we consider it as a candidate glyph.
Thank you, Swapnil. I've added it, and like it very much as an option.
The "Tag" is somewhat confounding. The example cited is of a PEST tag, which appears to be a functional component (acts as a signal peptide for degradation). Yet, the SO term is specific to oligo tags used in identifying DNA; indeed, its parent is an oligo.
I suggest we separate out which meaning is intended, and I also suggest we address "functional" tags (degradation, routing, fluorescence, etc) as well, if not before, identification tags, because the former have been used frequently in synthetic biology literature (e.g. Friedland et al. 2009).
Separately, my other objection to the Tag glyph is that it is not naturally scalable as may be needed to depict a sequence of varying length (whether functional, or oligo tag). It is also a simply rotated variant of another glyph, which makes us introduce two unrelated glyphs that are unnecessarily close.
Here is the current source for the Tag glyph proposal:
- The iGEM registry has many parts with "Tag" as their class, e.g. BBa_K1616026 http://parts.igem.org/Part:BBa_K1616026. You can see the little "gift tag" icon on the page.
- If you search in SynBioHub for parts with a Tag role https://synbiohub.org/search/role%3D%3Chttp%3A%2F%2Fwiki.synbiohub.org%2Fwiki%2FTerms%2Figem%23partType%2FTag%3E%26, you can readily find all of these, and see the "cleaned up" version of the glyph that is here.
- Tags like these are commonly used, and it would be useful to be able to have glyphs for them, but to the best of my knowledge no other graphical convention currently exists.
You're right about the SO term being a problem, though --- if we can figure out what the right term is, that can be updated or fixed.
The SO branch that deals with a peptide sequence binding site: protein_protein_contact http://www.sequenceontology.org/browser/current_svn/term/SO:0001093 Maybe, a new sibling to PIP box http://www.sequenceontology.org/miso/current_svn/term/SO:0001810 - note it is also a polypeptide_region
Other tags may belong in, Experimentally added features will be separate... not obviously consistent with the first... http://www.sequenceontology.org/miso/current_svn/term/SO:0001697 -mike
Is anyone more familiar with http://obofoundry.org/ontology/pr.html? It has terms closer in meaning to some of the "functional" tags. (Some of the links don't appear to work.)
Here are my views:
Regarding the Protein Domain glyph: I am a little confused.
where the red glyph (assume it is one of the 5 proposed above) is the engineered domain. This, at best, can be a bit confusing because of the extra arrowhead. How would a CDS encoding a fusion protein or a multi-CDS construct be denoted, without confounding it with Protein Domains?
FWIW, pigeon has allowed for fusion CDSes to be denoted as follows, and it has seen quite a bit of use (judging from several folks who signed up to ask for it, out of the blue):
@swapnilb I agree that most of the proposed glyphs for Protein Domain don't compose nicely. The two that do are:
This is why I like chevron best of the options. We could further clarify the composition as a note, saying something like "when a Protein Domain is indicated within a CDS, 'internal' boundaries use the chevron, while boundaries at the extremes of the CDS follow the convention of the CDS"
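As a rough illustration of the composition note suggested above, here is a minimal sketch. It assumes a CDS is drawn as a left-to-right run of contiguous segments; the function name `boundary_styles` and the style labels `"cds"` and `"chevron"` are hypothetical and not part of any SBOL Visual tooling.

```python
from typing import List

def boundary_styles(n_segments: int) -> List[str]:
    """Return a style label for each of the n_segments + 1 boundaries of a CDS.

    Outer boundaries follow the CDS convention; every internal boundary
    between adjacent segments uses the chevron (labels are hypothetical).
    """
    if n_segments < 1:
        raise ValueError("a CDS needs at least one segment")
    internal = ["chevron"] * (n_segments - 1)
    return ["cds"] + internal + ["cds"]

# A CDS split into three parts, e.g. domain / linker / domain
print(boundary_styles(3))  # ['cds', 'chevron', 'chevron', 'cds']
```

With a single segment there are no internal boundaries, so the drawing reduces to the plain CDS glyph.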
@jakebeal Agree with almost everything in your latest comment, except one point:
One problem with the chevron is that if you have overlapping domains, not an entirely rare use case, then if you overlap two chevrons, you can get an unintended number of chevrons. This is because of the identical geometry of the pair of lines at the start and end of the chevron. If we could fix this, then a chevron glyph for PD could work well.
I believe we've got lots of good options for distinguishing overlapping domains. One is simply to use fills that indicate the overlap:
One can also do it by way of vertical separation, textual annotations, etc. The preference will likely depend on the circumstance.
It is not clear: my use case is that in a drawing such as yours, domain red and domain blue overlap in the grey CDS. That is, domain blue would be codons b_i to b_j and red would be r_i to r_j, where b_i < r_i < b_j < r_j. In this case, we would get an unintended purple chevron, as you have shown. It is not immediately clear from your picture that the purple chevron is NOT a third PD. In fact, it is very nicely suggestive of the purple chevron being a third PD. This is undesirable in designing a glyph.
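To make the overlap case concrete, here is a minimal sketch that treats each domain as an inclusive codon interval and reports the shared region. The `overlap` helper and the example coordinates are hypothetical, chosen only to satisfy b_i < r_i < b_j < r_j; nothing here comes from the SEP itself.

```python
from typing import Optional, Tuple

# (start_codon, end_codon), both inclusive; a hypothetical representation
Interval = Tuple[int, int]

def overlap(a: Interval, b: Interval) -> Optional[Interval]:
    """Return the codon range shared by two domains, or None if disjoint."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None

# Hypothetical coordinates with b_i < r_i < b_j < r_j
blue = (10, 60)
red = (40, 90)
print(overlap(blue, red))  # (40, 60): the region drawn as the "purple" chevron
```

The returned range is exactly the stretch that a reader could mistake for a third domain when two chevrons are overlaid.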
So spread out the hatching a bit so it's less ambiguous, or use outlines instead, or text... my point is that I find this to be solvable with the tools already at hand.
I have updated the SEP based on the current state of discussion. Here is what I am currently seeing:
Symbols with apparent consensus in favor of a specific glyph:
Symbols with apparent consensus, but multiple glyph options yet to be resolved:
Symbols without a clear consensus:
Symbols with apparent consensus against:
@chofski With respect to ncRNA, are you thinking something like this?
Yes, precisely. Or slight variations on it, e.g., where the "teeth" are perpendicular to the backbone and not all parallel.
@chofski Would you be willing to say what you prefer about the "teeth" version over the "no teeth" version? Personally, I prefer the "no teeth" version because it is simpler to draw and matches the RNA symbols that I have seen in a number of scientific papers (e.g., http://www.nature.com/nbt/journal/v33/n8/abs/nbt.3301.html)
Yes, but it is a minor preference. I'd be happy with either to be adopted.
Thank you for the clarifications. I've added the "teeth" version to the options under consideration, and we'll see how the conversation continues to develop.
The issue with the protein domain suggestions so far is that they are very unlike anything that is actually informally being used in the field today. The most common symbols being used currently are:
I would further suggest that protein features are recommended to be placed above (or below) the encompassing CDS symbol, possibly with an extra baseline that is symbolizing the protein.
@chofski @jakebeal I highly prefer the non-teeth version for RNA. Teeth has way too high stroke complexity.
Also, if I understand correctly, we should be careful to call it "DNA encoding ncRNA" -- in that this glyph cannot be used to depict the RNA transcribed. (Or if it can, then that should be mentioned in the spec.) I also prefer the boxed variant for ncRNA since it clearly shows it as being part of the DNA.
As to chevron -- I understand that the problem can be solved, but I don't support adding a known confusion into the spec. The overlap problem needs to be dealt with at some point, and it is best if we don't make it worse.
@graik I concur that we need to develop a standard way to describe overlapping features. I agree, the PD would be better illustrated if it could be aligned beside the CDS.
@jakebeal please note this too:
Tag: I object to the current glyph because:
Also, if I understand correctly, we should be careful to call it "DNA encoding ncRNA" -- in that this glyph cannot be used to depict the RNA transcribed.
@swapnilb I believe the associated SO term is clear: "SO:0001263: Non-Coding RNA Gene", so I've changed the glyph name in the SEP to be exactly that.
@graik I'm a bit confused by your proposal: it sounds like you're thinking this glyph is describing part of an actual protein, like in our ACS SynBio paper? That is not the case here: per the SO term, this glyph is intended to describe a portion of a CDS.
OK, that wasn't clear to me. I thought the same symbol would end up being used for protein sequence annotation as well. If we restrict ourselves to DNA only, then perhaps the basic problem is how to have both the CDS symbol and overlapping RNA or protein symbols coexist. For example, I like the "Tag" symbol, but how is it going to overlay on a CDS? Inside of it? Above it? Or should there be no CDS symbol if details are shown for the ORF?
Pragmatically, I think @swapnilb 's 2-way fusion CDS shown above is the most straightforward and intuitive way to demarcate sub-elements in an ORF, including domains.
@graik I agree that composition is a critical requirement of CDS "sub-components." We do not have a formal composition model, nor is this SEP the place to develop one (though I would encourage development of that in a new SEP).
I thus think that for CDS-related elements, we remain at the same place: showing composition by the "fusion" method used by @swapnilb, either with straight boundaries ("User Defined") or with angled boundaries (chevron).
Furthermore, given the serious problems that Tag has with composition and SO term, and the fact that these do not seem to have any promising paths to resolution, I have moved Tag to the "consensus against" set --- we would thus recommend that protein tags be indicated with Protein Domain glyphs, which I think is reasonable. I have also moved the ncRNA gene "squiggle with teeth" down since it had only mild support but strong opposition.
Would anybody like to speak about Homology Region, polyA site, or Non Directional Sticky End?
If I don't hear more significant backing for these, I am going to move them to "these don't seem important enough to anybody to add at this time."
I’m not locked into a particular glyph for it, but I believe we need one for polyA, since this is common in the iGEM dataset. Three As is fine by me, but I’m open to alternative suggestions.
Updated again based on the current state of discussion. I think we may be nearly ready to move forward for a vote. Here is what I am currently seeing:
Symbols (maybe) ready for voting:
Symbols without sufficient backing, and thus not being voted on (they may be revisited in the future): Codon, Homology Region, Inverter, Non Directional Sticky End, Tag
We still need to discuss Non-Coding RNA and Mature Transcript Region. I believe that Mature Transcript Region should be discarded, since the SO term covers specifically transcripts, and not sequences coding for functional transcripts (which is handled neatly by ncRNA). I would then propose we move forward with a vote to decide which of the two ncRNA glyphs should be retained.
What do people think of this proposal?
ncRNA in SO (http://www.sequenceontology.org/browser/current_svn/term/SO:0000655) seems to describe an RNA and not a region of DNA that represents a non-coding RNA region. This is why we used mature transcript region (http://www.sequenceontology.org/browser/current_svn/term/SO:0000834). This is the parent term that includes mRNA, which includes CDS, ribosome entry site, etc. Granted mature transcript region is too general, but if you don't know what type of non-coding RNA it codes for then there is no good parent term to use. The solution here might be that we need to contact SO folks.
@cjmyers That is why the current proposal does not use that term. SO:0000834, in fact, has the same problem.
Instead, the current proposal uses Non-Coding RNA Gene (http://www.sequenceontology.org/browser/current_svn/term/SO:0001263), which is for a region of DNA that represents a non-coding RNA region.
Ah, did not notice that. That is a better definition, but it makes it less parallel with CDS. I would have expected these terms to be cousins.
I take back what I said about SO:0000834, which is fine. I was getting it confused with http://www.sequenceontology.org/browser/current_svn/term/SO:0000233 --- the parallel terminology in very differently structured parts of the ontology is frustrating to me sometimes.
I would be comfortable to have both SO:0001263 and SO:0000834 be legitimate SO terms for this. We are, in fact, allowed multiple terms, and have used it before (e.g., Ribosome Entry Site is given both SO:0000139 and SO:0000204).
I think we might want to see if we can get the SO folks to add a parent to the RNA coding regions of the DNA that are not mRNAs. In other words a child of 834 that is a parent to all but 836.
That would still be problematic because 836 contains things like "riboswitch" and "RNA thermometer" that are functional RNA rather than protein coding RNA.
Aren't these still part of the mRNA, though, regulating its translation into a protein? I thought the distinction we want is a glyph that indicates whether the region codes for protein or not. My understanding is that mRNA codes for protein, and the other RNA options here do not, even if parts of the mRNA region are there only for regulation of translation.
I believe that things like riboswitches might be used to regulate other functional RNA as well, by modulating its stability. I am not certain, but certainly wouldn't count on artificial systems remaining isolated thus.
For now, I have merged the two, giving both SO terms.
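As a sketch of what "giving both SO terms" could look like in data form, the table below maps each glyph to a list of SO terms, so a glyph may carry more than one. The SO identifiers are the ones listed in the SEP text further down; the dictionary layout and the `glyphs_for_term` helper are assumptions made for illustration only.

```python
from typing import Dict, List

# SO terms as listed in the SEP specification; the structure is hypothetical
GLYPH_SO_TERMS: Dict[str, List[str]] = {
    "Aptamer": ["SO:0000031"],
    "Non-Coding RNA Gene": ["SO:0001263", "SO:0000834"],  # merged: both terms
    "ORI-T": ["SO:0000724"],
    "polyA Site": ["SO:0000553"],
    "Specific Recombination Site": ["SO:0000299"],
}

def glyphs_for_term(term: str) -> List[str]:
    """Return the names of all glyphs associated with a given SO term."""
    return sorted(name for name, terms in GLYPH_SO_TERMS.items() if term in terms)

print(glyphs_for_term("SO:0000834"))  # ['Non-Coding RNA Gene']
```

Keeping a list per glyph means either SO term resolves to the same glyph, which matches the existing practice of assigning multiple terms to Ribosome Entry Site.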
Returning to the question of protein domains: I would like to remove the "rectangle" option. My reasons are:
- It is redundant with the use of rectangle for Unspecified. Thus, if we want "rectangle", we should just not assign a glyph.
- At the same time, it will conflict with the use of rectangle in Composite and (likely) No Glyph Assigned.
@swapnilb @graik @cjmyers Would you be OK with this?
I'd also like to pick just one of the three ncRNA options, if we can. Would people be OK with the recommended vote being: "pick one" rather than "do you like all three as alternatives"? And can we eliminate any before voting?
I agree.
Ok with pick one. Also okay to eliminate one. I’m not partial to any particular one.
On ncRNA, a bit of searching around the literature online finds that the main ways of diagramming ncRNA at present are:
- "single strand wiggles"
- unspecified rectangles
- complex shape diagrams
I thus propose to remove the "peeling teeth" ncRNA from consideration, then put "wiggle" vs. "box-wiggle" to a vote, since I know that I strongly prefer "wiggle" and @swapnilb has stated that he prefers "box-wiggle".
Sounds good to me.
I strongly prefer "box-wiggle." To reiterate: this is the DNA encoding some ncRNA. So I like that it is attached to the backbone and NOT hovering.
I think we should punt on protein domains. It requires more careful design. So I am OK with removing the rectangle option at the very least.
For the protein domain chevron, I am still NOT OK with it. It has a known flaw that is inherent in its design, so it is not a good glyph. Therefore, I propose that we have another symbol such that none of its X-directional overlapping translations forms an unintended glyph. Any glyph that smoothly varies in the X/-X direction would do the job, and would improve the visualization of overlapping domains, which is likely a common use case.
SEP V004: New Glyph Collection
Abstract
A number of new glyphs have been proposed over the past few years, and we need to put them to an up-or-down vote.
There are twelve proposals currently pending: Aptamer, Codon, Homology Region, Inverter, Non-Coding RNA, ORI-T, polyA Site, Protein Domain, Specific Recombination Site, Non Directional Sticky End, Tag, Transcript Region
Table of Contents
1. Rationale
Each glyph detailed below in its specification has been provided with an individual rationale for that glyph. Examples are also embedded within each proposal.
2. Specification
Aptamer
Associated SO term(s)
SO:0000031: Aptamer
Recommended Glyph and Alternates
The proposed aptamer glyph is a cartoon diagram of nucleic acid secondary structure like that found in aptamers:
Prototypical Example
theophylline aptamer
Non-Coding RNA Gene
Associated SO term(s)
SO:0001263: Non-Coding RNA Gene SO:0000834: Mature Transcript Region
Recommended Glyph and Alternates
Two of the proposed non-coding RNA glyphs are both single-stranded RNA "wiggles," one on top of a box:
another hovering above the backbone:
One or the other of these should be chosen, but not both.
Prototypical Example
gRNA
ORI-T
Associated SO term(s)
SO:0000724: Origin of Transfer
Recommended Glyph and Alternates
The origin of transfer glyph is circular like Origin of Replication, but also includes an outbound arrow:
Prototypical Example
oriT
Notes
The recommended backbone location of Origin of Replication is not yet fixed; the backbone location of this glyph is intended to match Origin of Replication, so if that is recommended to be below the glyph, this backbone location will shift as well.
polyA site
Associated SO term(s)
SO:0000553: polyA Site
Recommended Glyph and Alternates
The polyA site glyph is a sequence of As sitting atop the backbone:
Prototypical Example
polyA tail on mammalian coding sequence
Specific Recombination Site
Associated SO term(s)
SO:0000299: Specific Recombination Site
Recommended Glyph and Alternates
The specific recombination site glyph is a triangle, centered on the backbone, as has appeared in a number of recombinase circuit papers:
Prototypical Example
flippase recognition target (FRT) site
Notes
Potential conflict with proposed Inverter glyph.
3. Examples
See examples in individual glyph proposals.
4. Backwards Compatibility
All proposals are for new glyphs that do not conflict with existing glyphs. Note that two proposals (Inverter and Specific Recombination Site) do conflict with one another.
5. Discussion
The following proposed options have been considered, but do not have strong support and are thus being removed from consideration unless they pick up significant advocacy. They may be revisited in the future.
Aptamer
Codon
Associated SO term(s)
SO:0000360: Codon
SO:0000318: Start Codon
SO:0000319: Stop Codon
Recommended Glyph and Alternates
The proposed aptamer glyphs are two versions of a cartoon diagram of nucleic acid secondary structure like that found in aptamers:
Nucleotides can be indicated with colors or letters in the boxes:
Proteins can be indicated by a letter above:
Stop and start codons might be indicated by special symbols:
Edits can be indicated by changes:
Prototypical Example
UGA stop codon
Notes
If accepted, there will need to be additional work done to elaborate the full specification.
Homology Region
Associated SO term(s)
SO:0000853
Recommended Glyph and Alternates
The homology region glyph is a stretched hexagon hovering above the backbone:
Prototypical Example
Needs a good example
Inverter
Associated SO term(s)
No SO term currently exists
Recommended Glyph and Alternates
The inverter glyph is a triangle, echoing the buffer glyph from electronics. It might be either above or on the backbone.
Prototypical Example
Needs a good example
Notes
Potential conflict with proposed Specific Recombination Site glyph.
Non-Coding RNA
Squiggle with teeth:
Peeling comb suggesting an RNA sequence partially attached to the backbone:
Non Directional Sticky End
Associated SO term(s)
SO:0001692 (unspecified direction)
Recommended Glyph and Alternates
A sticky restriction site of unspecified direction is an angled set of cuts:
Prototypical Example
EcoRI restriction site.
ORI-T
Spirals outward toward a new destination rather than being a closed circle. Two slightly different variants of spiral are proposed for consideration:
Protein Domain
Associated SO term(s)
SO:0000417 Polypeptide Domain
Recommended Glyph and Alternates
A number of proposals have been made for Protein Domain glyphs. These are:
Prototypical Example
VP64 activation domain
Notes
Protein domain should have the same recommended vertical position as CDS, but CDS does not have a recommended vertical position yet, so these proposals do not either.
Tag
Associated SO term(s)
SO:0000324: Tag
Recommended Glyph and Alternates
The tag glyph is a diagonal rectangle with clipped corners, reminiscent of a stereotypical paper gift tag:
Prototypical Example
PEST tag
Copyright
To the extent possible under law, SBOL developers have waived all copyright and related or neighboring rights to SEP V004. This work is published from: United States.