fxbuson opened 8 months ago
I see the need for a new intron glyph that can be used outside the CDS. I liked the spike alternative, as you show in 4A. To show more detail, I would use 4A with an inset of the composite, in preference to 4B or 4C.
I guess I am confused here. I thought that an intron had to be between exons. Can you give an example of an intron that isn't between exon sequences in a CDS?
@jakebeal an intron does need to be between two exons, but those don't need to be coding regions. Some of the parts in the collections I'm working with are 5'UTRs with introns in them, before any translation start site.
Hmm, I looked up the definition in the Sequence Ontology: an intron is defined as a sequence between two exons and, as @fxbuson mentioned, these don't need to be part of the CDS. I have no experience with eukaryotes in synbio, but the iGEM Mammalian tech example uses an intron outside the CDS to depict RNA maturation, and I found one paper where introns are described in UTR regions.
My biology may be weak here... I thought that an exon was by definition part of the CDS?
Here is another example where an intron is not in the CDS: the plasmid for a part in the OpenPlant toolkit. This part has only a promoter and a 5'UTR, so there is no coding region. Still, the UTR contains an intron (highlighted).
By definition, exons are regions that become part of the mature mRNA, but they are not necessarily coding regions.
Thank you for the example. It also prompted me to look in the Sequence Ontology, where I found that SO:exon is indeed the more general notion, while SO:coding_exon is what I was thinking of.
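The exon / coding_exon distinction, and the rule that an intron sits between two exons of any kind, can be sketched as a tiny type check. This is a minimal illustration under stated assumptions, not the Sequence Ontology's actual data or any SBOL library: the `IS_A` table is hand-written and covers only the terms discussed here, and `Feature` and `is_valid_intron` are hypothetical names.

```python
from dataclasses import dataclass

# Hand-written, partial is_a table (assumption: only the terms discussed in
# this thread; the real Sequence Ontology has many more terms and relations).
IS_A = {
    "coding_exon": "exon",
    "exon": None,
    "intron": None,
}


def is_a(term: str, ancestor: str) -> bool:
    """Return True if `term` equals `ancestor` or descends from it via IS_A."""
    current = term
    while current is not None:
        if current == ancestor:
            return True
        current = IS_A.get(current)
    return False


@dataclass
class Feature:
    """Hypothetical feature with half-open coordinates [start, end)."""
    so_term: str
    start: int
    end: int


def is_valid_intron(features: list[Feature], index: int) -> bool:
    """Hypothetical check: the feature at `index` is an intron whose immediate
    neighbours in a position-sorted list are exons (coding or not) that abut it."""
    intron = features[index]
    if intron.so_term != "intron":
        return False
    if index == 0 or index == len(features) - 1:
        return False
    left, right = features[index - 1], features[index + 1]
    flanked_by_exons = is_a(left.so_term, "exon") and is_a(right.so_term, "exon")
    adjacent = left.end == intron.start and intron.end == right.start
    return flanked_by_exons and adjacent
```

For a 5'UTR like the OpenPlant part, `[Feature("exon", 0, 50), Feature("intron", 50, 200), Feature("exon", 200, 260)]` passes the check even though neither exon is coding, because `is_a` accepts any descendant of `exon`, not only `coding_exon`.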
So, the current implementation only allows coding exons and has no way to represent a general exon. I liked Asimov's solution to this problem. To represent composites, I would go with a picture-in-picture solution. Representing composites is a shared issue, also raised for proteins in #167; maybe we should come up with a standardized way to show two hierarchical levels in one picture, solving both problems instead of creating a new representation for each composite part.
I'd agree to restrict the scope of this issue to the spike glyph solution. What is the process to turn this into an SEP?
@fxbuson: to create an SEP, you do four things:
From there, we see if the SEP can reach a state of consensus, and then proceed to a vote!
Currently, the specification represents introns by showing a "torn" CDS, with whatever lies between the torn edges being the intron's composition (1A). While this works for introns within CDSs, introns in untranslated regions can't have torn edges and have no specific visual representation assigned to them. A standard-compliant solution would be to use the more generic non-coding RNA (1B) or engineered region glyph. If we don't want to change or add any glyphs, I would suggest changing the specification to cover both cases and not associating Intron (SO:0000188) directly with the "torn edges" glyph.
If we do want a new way of representing introns, I would like to start the discussion from the iGEM Mammalian Genetic Design page, where an intron is represented as a "spike" (2). This alternative would be sufficient to represent introns outside of CDSs, and we could keep the torn edges for CDSs that are interrupted (3).
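The split proposed above (torn edges only when an intron interrupts a CDS, spike otherwise) amounts to a simple selection rule. The sketch below is hypothetical: `IntronGlyph` and `choose_intron_glyph` are illustrative names, not identifiers from the SBOL Visual specification, and it assumes each CDS is annotated as one half-open span covering its coding exons and the introns between them.

```python
from enum import Enum


class IntronGlyph(Enum):
    # Labels taken from this discussion; not identifiers from the SBOL Visual spec.
    TORN_EDGES = "torn edges"  # intron interrupting a CDS (1A / 3)
    SPIKE = "spike"            # intron outside any CDS, e.g. in a 5'UTR (2)


def choose_intron_glyph(intron: tuple[int, int],
                        cds_ranges: list[tuple[int, int]]) -> IntronGlyph:
    """Pick a glyph for an intron given half-open (start, end) coordinates.

    Hypothetical rule: keep torn edges only when the intron lies strictly
    inside some CDS span (so coding sequence exists on both sides);
    otherwise fall back to the spike.
    """
    start, end = intron
    for cds_start, cds_end in cds_ranges:
        if cds_start < start and end < cds_end:
            return IntronGlyph.TORN_EDGES
    return IntronGlyph.SPIKE
```

With a CDS annotated at `(200, 900)`, an intron at `(10, 120)` in the 5'UTR gets the spike, while an intron at `(300, 450)` keeps the torn edges. A part with no CDS at all, like the promoter + 5'UTR example, always gets the spike.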
Also, if an intron encompasses a region with functional elements (promoters, RNA elements, etc.), we have to decide whether it should be represented simply as a "composite" intron (4A) or whether there is a single-diagram solution, such as having the intron form its own "intron strand" (4B, 4C).