mikebissell closed this issue 5 years ago
While we're at it, Jake has proposed allowing one SA to point to multiple Sequences, assuming they're all the same length.
Chris suggested this was a way to show multiple codon optimizations of the same sequence.
Though I believe others debated whether or not this should simply be two ComponentDefinitions, since different optimizations may have quite different behavior.
Associating different DNA sequences with one ComponentDefinition just because they are assumed to be more or less the same is a bad idea. There is at least one (extreme) example where a single synonymous codon change causes a protein to adopt a completely unrelated fold/structure even though the amino acid sequence is identical. There are many examples of "silent mutations" having important effects on protein function. Example: http://m.sciencemag.org/content/315/5811/525.short
AA and DNA should not be mixed in one CD either. This increases multiplicity even further. We have a protein component class for exactly that. If SBOL makes it unnecessarily difficult to state "this DNA encodes that protein sequence", then this needs to be addressed in a separate issue.
No objection to the original SA -> sequence issue.
My hunch: asserting that two sequences are supposedly equivalent codon optimizations of each other is a functional assertion, and it does not belong in the structural layer.
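To make the point about synonymous changes concrete, here is a minimal sketch showing two different DNA strings that translate to the same amino acid sequence. The two variants and the helper `translate` are invented for illustration; the codon table is only the subset of the standard genetic code needed here. The sketch shows only that the translations agree, which is exactly why equal translations say nothing about equivalent behavior.

```python
# Subset of the standard genetic code, limited to the codons used below.
CODON_TABLE = {
    "ATG": "M",
    "CTG": "L", "TTA": "L",
    "GGC": "G", "GGT": "G",
    "AAA": "K", "AAG": "K",
    "TAA": "*",
}


def translate(dna: str) -> str:
    """Translate DNA codon by codon (hypothetical helper, partial table)."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# Two made-up "codon optimizations" of the same coding sequence.
variant_a = "ATGCTGGGCAAATAA"
variant_b = "ATGTTAGGTAAGTAA"

same_protein = translate(variant_a) == translate(variant_b)
same_dna = variant_a == variant_b
print(same_protein, same_dna)
```

The structural layer can record that the DNA differs; whether the variants are functionally interchangeable is a separate claim.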
This one should be easy, if a tiny bit tricky to justify. Chris, Goksel, and Nick had something to say on this topic... who else? Matthew, of course, will want to review it.
If there are multiple codon optimizations of the same sequence, then you have a pipeline like this:
originalsequence --(codon optimization)-> optimised{1 .. n}
You should create n sequences, one for each of the optimized versions. You should really also create a sequence for the original. Then you can link them using our proposed provenance extension, asserting that a codon optimization agent used the source material.
We could make this one of the worked examples for the provenance extension if you would like.
Matthew
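The pipeline described in this comment could be sketched roughly as follows. This is not the provenance extension itself: the `Sequence` class, the `derived_from` and `agent` fields, and the toy codon-swapping "optimizers" are all hypothetical stand-ins, used only to show one original sequence, n derived sequences, and a link from each derived sequence back to its source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Sequence:
    identity: str
    elements: str
    derived_from: Optional[str] = None  # identity of the source Sequence (hypothetical field)
    agent: Optional[str] = None         # name of the tool that produced it (hypothetical field)


def swap_codons(mapping: Dict[str, str]) -> Callable[[str], str]:
    """Build a toy 'optimizer' that replaces whole in-frame codons."""
    def optimize(dna: str) -> str:
        codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
        return "".join(mapping.get(codon, codon) for codon in codons)
    return optimize


def derive_variants(
    original: Sequence,
    optimizers: List[Tuple[str, Callable[[str], str]]],
) -> List[Sequence]:
    """Create one new Sequence per optimizer, each linked to the original."""
    return [
        Sequence(
            identity=f"{original.identity}_opt{n}",
            elements=optimize(original.elements),
            derived_from=original.identity,
            agent=agent,
        )
        for n, (agent, optimize) in enumerate(optimizers, start=1)
    ]


original = Sequence("cds", "ATGCTGGGCAAATAA")
variants = derive_variants(original, [
    ("optimizer_A", swap_codons({"CTG": "TTA"})),
    ("optimizer_B", swap_codons({"GGC": "GGT", "AAA": "AAG"})),
])
for v in variants:
    print(v.identity, v.elements, "<-", v.derived_from, "via", v.agent)
```

Each variant is its own sequence object, so nothing forces them onto a single ComponentDefinition; the relationship between them lives in the derivation link.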
I propose adding a best practice that a CD should have no more than one sequence, to be enforced in SBOL3.
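As a rough illustration, a best-practice check of this kind could look like the sketch below. The `ComponentDefinition` class here is a simplified, hypothetical stand-in (an identity plus a list of sequence identities), not the data model of any SBOL library, and `check_single_sequence` is an invented name.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentDefinition:
    identity: str
    sequences: List[str] = field(default_factory=list)  # identities of referenced Sequences


def check_single_sequence(cds: List[ComponentDefinition]) -> List[str]:
    """Return one warning per ComponentDefinition that refers to more than one sequence."""
    return [
        f"{cd.identity}: {len(cd.sequences)} sequences, expected at most 1"
        for cd in cds
        if len(cd.sequences) > 1
    ]


warnings = check_single_sequence([
    ComponentDefinition("cd_ok", ["seq_dna"]),
    ComponentDefinition("cd_empty"),
    ComponentDefinition("cd_mixed", ["seq_dna", "seq_aa"]),
])
print(warnings)
```

Reporting warnings rather than raising keeps this at the best-practice level; a stricter version could reject the document outright.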
Please create an SEP for this proposal.
Addressed by https://github.com/SynBioDex/SEPs/issues/59
SBOL2-structural (section 7.7) permits us to associate one ComponentDefinition with multiple sequences (e.g., DNA and AA; see issue #25).
The various Location subclasses (section 7.7.5) specify offsets within a sequence in the parent ComponentDefinition. However, there is no Sequence pointer in any of the Location records, and so it is impossible to tell which Sequence in a ComponentDefinition a Location pertains to, in the event that a ComponentDefinition contains multiple Sequences.
We really should have caught this before ratifying SBOL2, because this appears to be a bug in the spec, not simply an opportunity for improvement. Apologies.
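The ambiguity can be reproduced with a small sketch. Everything here is a simplified, hypothetical model: `Range` assumes 1-based inclusive coordinates, the optional `sequence` field is one possible shape for the missing pointer rather than anything in the spec, and the DNA and AA strings are made up.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Range:
    start: int                      # assumed 1-based, inclusive
    end: int                        # assumed 1-based, inclusive
    sequence: Optional[str] = None  # hypothetical pointer to a specific Sequence


def resolve(cd_sequences: Dict[str, str], loc: Range) -> Dict[str, str]:
    """Return every sequence the range could refer to, with the matching slice."""
    if loc.sequence is not None:
        candidates = {loc.sequence: cd_sequences[loc.sequence]}
    else:
        candidates = cd_sequences
    return {
        seq_id: elements[loc.start - 1:loc.end]
        for seq_id, elements in candidates.items()
        if loc.end <= len(elements)
    }


# One ComponentDefinition carrying both a DNA and an AA sequence.
cd_sequences = {"dna": "ATGCTGGGCAAATAA", "aa": "MLGK"}

ambiguous = resolve(cd_sequences, Range(1, 3))
pinned = resolve(cd_sequences, Range(1, 3, sequence="dna"))
print(ambiguous)
print(pinned)
```

Without the pointer, the range 1..3 fits both sequences and nothing in the record says which one was meant; with it, there is a single reading.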