SynBioDex / SBOL-specification

The Synthetic Biology Open Language (SBOL)
http://sbolstandard.org

Locations fail to unambiguously specify their related Sequence (if multiple Sequences in a ComponentDefinition). #32

Closed. mikebissell closed this issue 5 years ago.

mikebissell commented 8 years ago

SBOL2-structural (section 7.7) permits us to associate one ComponentDefinition with multiple Sequences (e.g., DNA and AA; see issue #25).

The various Location subclasses (section 7.7.5) specify offsets within a Sequence of the parent ComponentDefinition. However, none of the Location records has a Sequence pointer, so when a ComponentDefinition contains multiple Sequences it is impossible to tell which of them a given Location pertains to.
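To make the ambiguity concrete, here is a minimal Python sketch of a simplified data model (plain dataclasses, not libSBOL or any official SBOL API). It assumes 1-based inclusive Range offsets and adds a hypothetical optional `sequence` field to the Location. Without that field, a Location can only be resolved when its ComponentDefinition has exactly one Sequence.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Sequence:
    uri: str
    elements: str
    encoding: str  # simplified: "DNA" or "protein" instead of an encoding URI


@dataclass
class Range:
    # 1-based inclusive offsets (assumed, as in an SBOL 2 Range)
    start: int
    end: int
    # Hypothetical field: URI of the Sequence these offsets refer to
    sequence: Optional[str] = None


@dataclass
class ComponentDefinition:
    uri: str
    sequences: List[Sequence] = field(default_factory=list)


def resolve_sequence(cd: ComponentDefinition, loc: Range) -> Sequence:
    """Return the Sequence a Location refers to, or raise if it cannot be determined."""
    if loc.sequence is not None:
        for seq in cd.sequences:
            if seq.uri == loc.sequence:
                return seq
        raise ValueError(f"{loc.sequence} is not a Sequence of {cd.uri}")
    if len(cd.sequences) == 1:
        # Only one candidate, so the Location is unambiguous
        return cd.sequences[0]
    raise ValueError(
        f"{cd.uri} has {len(cd.sequences)} Sequences; the Location is ambiguous"
    )


def slice_location(cd: ComponentDefinition, loc: Range) -> str:
    """Return the sub-sequence covered by a Range (1-based, inclusive)."""
    seq = resolve_sequence(cd, loc)
    return seq.elements[loc.start - 1 : loc.end]
```

With a DNA and an AA Sequence on the same ComponentDefinition, `Range(1, 3)` raises `ValueError`, while `Range(1, 3, sequence="ex:dna")` resolves to the DNA Sequence.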

We really should have caught this before ratifying SBOL2, because this appears to be a bug in the spec, not simply an opportunity for improvement. Apologies.

mikebissell commented 8 years ago (Oct 14, 2015)

While we're at it, Jake has proposed allowing one SequenceAnnotation (SA) to point to multiple Sequences, assuming they're all the same length.

Chris suggested this was a way to show multiple codon optimizations of the same sequence.
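Jake's proposal can be sketched as a validation rule. This is a hypothetical illustration, not part of any SBOL version: Sequences are plain strings keyed by made-up URIs, and a single 1-based inclusive range is applied to every referenced Sequence, which only makes sense if they all share a length.

```python
from typing import Dict, List


def check_same_length(sequences: Dict[str, str], refs: List[str]) -> int:
    """Hypothetical rule: an annotation may reference several Sequences only
    if they all have the same length. Returns that shared length."""
    lengths = {len(sequences[r]) for r in refs}
    if len(lengths) != 1:
        raise ValueError(f"referenced sequences differ in length: {sorted(lengths)}")
    return lengths.pop()


def annotate(sequences: Dict[str, str], refs: List[str], start: int, end: int) -> Dict[str, str]:
    """Apply one 1-based inclusive range to every referenced Sequence."""
    length = check_same_length(sequences, refs)
    if not 1 <= start <= end <= length:
        raise ValueError("range lies outside the shared sequence length")
    return {r: sequences[r][start - 1 : end] for r in refs}
```

For two codon variants of the same two-residue peptide, `"atgaaa"` and `"atgaag"`, the range 4..6 yields `"aaa"` and `"aag"`. The equal-length check only guarantees that the offsets line up; it says nothing about whether the variants behave the same.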

cjmyers commented 8 years ago

Though I believe others debated whether or not this should simply be two ComponentDefinitions, since different optimizations may have quite different behavior.


graik commented 8 years ago

Associating different DNA sequences with one ComponentDefinition just because they are assumed to be more or less the same is a bad idea. There is at least one (extreme) example where a single synonymous codon change causes a protein to adopt a completely unrelated fold/structure even though the amino acid sequence is identical. There are many examples of "silent mutations" having important effects on protein function. Example: http://m.sciencemag.org/content/315/5811/525.short

AA and DNA should not be mixed in one CD either; that increases multiplicity even further. We have a protein component class for exactly that. If SBOL makes it unnecessarily difficult to state "this DNA encodes that protein sequence", then this needs to be addressed in a separate issue.

No objection to the original SA -> sequence issue.

mikebissell commented 8 years ago

My hunch: asserting that two sequences are supposedly equivalent codon optimizations of each other is a functional assertion, and it does not belong in the structural layer.

mikebissell commented 8 years ago (March 16, 2016)

This one should be easy, if a tiny bit tricky to justify. Chris, Goksel, and Nick had something to say on this topic... who else? Matthew, of course, will want to review it.

drdozer commented 8 years ago

If there are multiple codon optimizations of the same sequence, then you have a pipeline like this:

originalsequence --(codon optimization)--> optimised{1 .. n}

You should create n sequences, one for each of the optimized versions. You should really also create a sequence for the original. Then you can link them using our proposed provenance extension, asserting that a codon optimization agent used the source material and produced each of the resulting optimized sequences.
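The pipeline above can be sketched with PROV-style relations (`used`, `wasGeneratedBy`, `wasAssociatedWith`). This is a simplified illustration using plain Python dataclasses and made-up URIs; it is not the data model of the proposed SBOL provenance extension.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Sequence:
    uri: str
    elements: str
    # URI of the activity that generated this Sequence (PROV-style wasGeneratedBy)
    was_generated_by: str = ""


@dataclass
class Activity:
    uri: str
    agent: str  # PROV-style wasAssociatedWith: who or what ran the activity
    used: List[str] = field(default_factory=list)  # PROV-style used: input URIs


def record_optimization(
    original: Sequence, variants: List[str], agent: str
) -> Tuple[Activity, List[Sequence]]:
    """Create one Sequence per optimized variant, each generated by a single
    codon-optimization activity that used the original Sequence."""
    activity = Activity(
        uri=original.uri + "/codon_optimization",  # hypothetical URI scheme
        agent=agent,
        used=[original.uri],
    )
    optimized = [
        Sequence(
            uri=f"{original.uri}/optimised_{i}",
            elements=v,
            was_generated_by=activity.uri,
        )
        for i, v in enumerate(variants, start=1)
    ]
    return activity, optimized
```

Each optimized variant gets its own Sequence, and the shared origin is recorded once on the activity rather than by attaching several Sequences to one annotation.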

We could make this one of the worked examples for the provenance extension if you would like.

Matthew


Dr Matthew Pocock Turing ate my hamster LTD mailto: turingatemyhamster@gmail.com

Integrative Bioinformatics Group, School of Computing Science, Newcastle University mailto: matthew.pocock@ncl.ac.uk


jamesamcl commented 6 years ago (June 19, 2018)

I propose adding a best practice that a ComponentDefinition (CD) should have no more than one Sequence, to be enforced in SBOL3.
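If this best practice were adopted, a validator could report violations as warnings. A minimal sketch, assuming each ComponentDefinition is supplied as a dict entry mapping its URI to the list of Sequence URIs it references (a hypothetical input format, not the API of any real SBOL validator):

```python
from typing import Dict, List


def check_single_sequence(cds: Dict[str, List[str]]) -> List[str]:
    """Return one warning per ComponentDefinition that references more than
    one Sequence. CDs with zero or one Sequence pass."""
    warnings = []
    for cd_uri, seq_uris in sorted(cds.items()):
        if len(seq_uris) > 1:
            warnings.append(
                f"{cd_uri}: {len(seq_uris)} sequences (expected at most 1)"
            )
    return warnings
```

Under this rule, every Location in a passing CD resolves to the CD's single Sequence, so the ambiguity described at the top of the issue cannot arise.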

cjmyers commented 6 years ago

Please create an SEP for this proposal.


palchicz commented 5 years ago

addressed by https://github.com/SynBioDex/SEPs/issues/59