SynBioDex / SBOL-visual

The reference implementation of the SBOL Visual standard

Representation of introns in non-CDS regions #168

Open fxbuson opened 8 months ago

fxbuson commented 8 months ago

Currently the specification represents introns by showing a "torn out" CDS, with whatever is between those edges being the intron composition (1A). While this works for introns within CDSs, introns in untranslated regions can't have the torn-out edges and have no specific visual representation assigned to them. A standard-compliant solution would be to use the more generic non-coding RNA (1B) or engineered region glyphs. If we don't want to change or add any glyphs, I would suggest changing the specification to cover both cases and not directly associating Intron (SO:0000188) with the "torn edges" glyph.

image

If we do want to have a new way of representing introns, I would like to start the discussion with the iGEM Mammalian Genetic Design page, where an intron is represented as a "spike" (2). This alternative would be sufficient to represent introns outside of CDSs, and we could keep the torn edges for when CDSs are interrupted (3):

image

image

Also, if an intron encompasses a region that has functional elements (promoters, RNA elements, etc.), we have to decide whether that should be represented simply as a "composite" intron (4A) or whether there is a single-diagram solution, such as having the intron make its own 'intron strand' (4B, C).

image
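The glyph choice proposed above (spike for introns outside a CDS, torn edges when a CDS is interrupted) could be sketched roughly as follows. This is only an illustration under assumptions: the `Feature` class, the `intron_glyph` function, the glyph labels `"spike"` and `"torn-edge-cds"`, and the coordinates are all hypothetical and are not part of the SBOL Visual specification or any existing library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical sequence feature; not an SBOL or Sequence Ontology API."""
    role: str   # Sequence Ontology term name, e.g. "CDS" or "intron"
    start: int  # inclusive start coordinate
    end: int    # exclusive end coordinate


def intron_glyph(intron: Feature, features: list[Feature]) -> str:
    """Pick a (hypothetical) glyph label for an intron.

    An intron fully contained in a CDS keeps the existing torn-edge
    drawing; any other intron (e.g. one in a 5'UTR) gets the spike.
    """
    for f in features:
        if f.role == "CDS" and f.start <= intron.start and intron.end <= f.end:
            return "torn-edge-cds"
    return "spike"


# Made-up design: promoter, 5'UTR, then a CDS.
features = [
    Feature("promoter", 0, 100),
    Feature("five_prime_UTR", 100, 300),
    Feature("CDS", 300, 900),
]
utr_intron = Feature("intron", 150, 250)  # lies inside the 5'UTR
cds_intron = Feature("intron", 400, 500)  # interrupts the CDS

print(intron_glyph(utr_intron, features))
print(intron_glyph(cds_intron, features))
```

With these made-up coordinates, the UTR intron maps to the spike and the CDS intron keeps the torn edges. Composite introns (4A to 4C) are deliberately left out of the sketch.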

Gonza10V commented 8 months ago

I see the need for a new glyph to represent introns outside the CDS. I like the spike alternative, as you show in 4A. To show more detail, I would prefer 4A with an inset of the composite over 4B or 4C.

jakebeal commented 8 months ago

I guess I am confused here. I thought that an intron had to be between exons. Can you give an example of an intron that isn't between exon sequences in a CDS?

fxbuson commented 8 months ago

@jakebeal an intron does need to be between two exons, but those don't need to be coding regions. Some of the parts in the collections I'm working with are 5'UTRs with introns in them, before any translation start site.

Gonza10V commented 8 months ago

Hmm, I looked at the definition in the Sequence Ontology: an intron is defined as a sequence between two exons and, as @fxbuson mentioned, these don't need to be in the CDS. I have no experience in synbio with eukaryotes, but I see in the example from iGEM tech mammalian the use of an intron outside of the CDS depicting RNA maturation, and one paper where introns are described in UTR regions.

jakebeal commented 8 months ago

My biology may be weak here... I thought that an exon was by definition part of the CDS?

fxbuson commented 8 months ago

Here is another example where an intron is not in the CDS. This is the plasmid for a part in the OpenPlant toolkit. The part only has a promoter and a 5'UTR, so no coding region. Still, the UTR has an intron (highlighted).

image

Exons by definition are regions that get to be part of the mature mRNA, but are not necessarily coding regions.

jakebeal commented 8 months ago

Thank you for the example. That also caused me to look up in SequenceOntology and find that SO:exon is indeed the more general notion, while SO:coding_exon is what I was thinking about.

Gonza10V commented 8 months ago

So, the current implementation only allows coding exons and has no way to represent the general exon. I liked the solution from Asimov for this problem. To represent composites, I would go with a picture-in-picture solution. The representation of composites is a shared issue, also mentioned for proteins in #167, and maybe we should come up with a standardized way to represent two hierarchical levels in a picture, solving both problems instead of creating a new representation for each composite part.

fxbuson commented 7 months ago

I'd agree to restrict the scope of this issue to the spike glyph solution. What is the process to turn this into an SEP?

jakebeal commented 7 months ago

@fxbuson : the process to create an SEP is that you do four things:

  1. Create a branch/fork with the proposed specification change. If the SEP is approved, this will then become a pull request and be merged into the spec. Example of this for SEP V022
  2. Add an SEP into the SEPs directory. I find it works most smoothly when this explicitly includes the language from the specification diff, which is why I recommend making the SEP based on the change rather than the other way around. Example of this for SEP V022
  3. Create an SEP discussion issue and link to the SEP from there. Example of this for SEP V022
  4. Announce the SEP and discussion issue to the mailing list and the Slack. Example of this for SEP V022

From there, we see if the SEP can reach a state of consensus, and then proceed to a vote!