Promoter, terminator, non-coding parts.

TimothyStiles commented 2 years ago

Is your feature request related to a problem? Please describe. Isaac said we should try this. More details to come when I'm not tired. -Tim

TimothyStiles commented 2 years ago

Designing ORIs with fixes for weird repetitions.

isaacguerreir commented 2 years ago

One of the limiting factors in DNA sequence design is synthesis technology. Current limitations make some sequences impossible to synthesize from scratch. Poly has a feature to improve CDS content: it swaps some codons for synonymous codons so that the sequence avoids problems during synthesis, assembly, and cloning.
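To make the synonymous-codon idea concrete, here is a minimal sketch in Python. It is not Poly's API (Poly is a Go library); the helper names `translate` and `swap_codon` and the tiny codon table are made up for illustration and cover only the codons used in the example.

```python
# Illustrative excerpt of the standard genetic code (only the codons used here).
CODON_TABLE = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "GGT": "G", "GGC": "G", "GGA": "G", "GGG": "G",
    "TAA": "*",
}


def translate(cds):
    """Translate a CDS codon by codon using the excerpt table above."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def swap_codon(cds, codon_index, new_codon):
    """Replace one codon, refusing any change that alters the amino acid."""
    start = codon_index * 3
    old_codon = cds[start:start + 3]
    if CODON_TABLE[old_codon] != CODON_TABLE[new_codon]:
        raise ValueError(f"{old_codon} -> {new_codon} is not synonymous")
    return cds[:start] + new_codon + cds[start + 3:]


# A CDS with homopolymer runs (AAAAAA, GGGGGG) that may be hard to synthesize.
original = "ATGAAAAAAGGGGGGTAA"

# Break up both runs with synonymous codons.
recoded = swap_codon(original, 2, "AAG")  # Lys: AAA -> AAG
recoded = swap_codon(recoded, 3, "GGT")   # Gly: GGG -> GGT

print(original, translate(original))
print(recoded, translate(recoded))
```

Both sequences translate to the same protein, which is exactly why this trick is limited to coding regions.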

This is only possible because synonymous codons are interchangeable without changing function (different CDSs, same protein sequence). The same strategy is not possible outside the CDS space. Promoters, RBSs, and terminators are all basic parts that could carry huge drawbacks and risks if even one base pair is changed. At the same time, it is not uncommon for a cassette to contain a highly repetitive subsequence that makes the sequence unsynthesizable.
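As a rough illustration of how such repetition could be flagged, the sketch below lists every k-mer that occurs more than once in a sequence. The function name `find_repeated_kmers` and the choice of `k` are assumptions for the example, not anything Poly provides, and it only looks at direct repeats on one strand. Real synthesis vendors apply their own, more involved rules.

```python
from collections import defaultdict


def find_repeated_kmers(sequence, k):
    """Return {kmer: [start positions]} for every k-mer seen more than once."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Toy cassette in which the 4-mer ACGT appears twice.
cassette = "ACGTACGTTTGCA"
repeats = find_repeated_kmers(cassette, 4)
print(repeats)
```

The reported positions are the kind of "problematic loci" that a fixer would then have to resolve without mutating the part.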

Synthetic biologists have some strategies that could resolve this problem.

One of them is simply not synthesizing the part, and instead amplifying it from another plasmid, organism, or part. This can be easy or really difficult, depending on how readily the user can access the 'source material', custom primers, and resources for the amplification process.

Another strategy is to break a sequence into two or more parts, exactly in the middle of the problematic part. The new sub-sequences should each be synthesizable, with a clear strategy for assembling them back together. @eyesmo also has the idea of using 'synthetic introns' that could be inserted into a problematic part, so you don't need to divide the sequence.
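The splitting strategy can be sketched in a few lines of Python. The function `split_at_problem` is hypothetical, and it deliberately ignores the hard part (designing overhangs or restriction sites so the fragments can be reassembled); it only shows the cut landing in the middle of the problematic region so that each fragment carries half of it.

```python
def split_at_problem(sequence, start, end):
    """Cut `sequence` at the midpoint of the problematic region [start, end)."""
    if not 0 <= start < end <= len(sequence):
        raise ValueError("problem region must lie inside the sequence")
    cut = (start + end) // 2
    return sequence[:cut], sequence[cut:]


# Toy sequence: an 8 bp GC repeat (positions 4..11) flanked by simple arms.
sequence = "AAAA" + "GCGCGCGC" + "TTTT"
left, right = split_at_problem(sequence, 4, 12)
print(left, right)
```

Each fragment now contains only half of the repeat, and concatenating them restores the original sequence.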

I think this has value for synthetic biology, which still relies on manual design to resolve most of these problems. Manual design is often not optimal, and it leads to failed experiments (not because of the design itself, but because of the attempt to work around the sequence's synthesizability problem).

I will not pretend to speak for the community. So I would like to ask:

Is this valuable and worth pursuing?
Should Poly have functions to resolve the problem?
Which other strategies could be used to resolve this problem?

eyesmo commented 2 years ago

Some functions that would be nice to have in this area:

The ability to specify sets of features within a designed sequence, including the start and end locations of the features and their position on either the top or bottom strand, along with optional sequence constraints for each feature (ranging from 'protein coding sequence' to 'NNNNNGYARTTTCANNANCNN' to '.....[..[]].....[[[[]]]]..' to 'No Mutations Allowed' or something). Poly already has this for synonymous recoding of CDSs, but being able to add specific constraints for non-coding sequences would be great.
The ability to incorporate the different sequence constraints for each feature into the sequence cleaning/fixing process.
If a sequence cleaning attempt fails to satisfy predicted synthesizability and feature sequence constraints after a specified number of steps, the option to try two fallbacks: inserting synthetic cloning introns (SynCloTrons) into the loci that are causing predicted synthesizability problems, and splitting up the sequence at the problematic loci, to be assembled via Golden Gate after synthesis.
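One possible shape for the per-feature constraints described above, sketched in Python. Everything here is an assumption made for illustration: the `Feature` dataclass, its field names, and the `satisfies` helper are not part of Poly. Only the IUPAC-pattern style of constraint is handled; the dot-bracket (secondary structure) and 'No Mutations Allowed' styles are left out.

```python
from dataclasses import dataclass

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


@dataclass
class Feature:
    name: str
    start: int    # 0-based, inclusive
    end: int      # 0-based, exclusive
    strand: str   # "+" for top strand, "-" for bottom strand
    pattern: str  # IUPAC constraint, read 5'->3' on the feature's strand


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def matches_iupac(seq, pattern):
    """True if every base of `seq` is allowed by the code at that position."""
    return len(seq) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(seq, pattern)
    )


def satisfies(sequence, feature):
    """Check one feature's constraint against the designed sequence."""
    region = sequence[feature.start:feature.end]
    if feature.strand == "-":
        region = reverse_complement(region)
    return matches_iupac(region, feature.pattern)


site = Feature("binding_site", 2, 23, "+", "NNNNNGYARTTTCANNANCNN")
design = "TT" + "ACGTAGCAATTTCAGGATCAA" + "GG"
mutated = "TT" + "ACGTATCAATTTCAGGATCAA" + "GG"  # G -> T at a fixed position

print(satisfies(design, site), satisfies(mutated, site))
```

A cleaning loop could call a check like `satisfies` after every proposed edit and reject changes that break any feature's constraint, which is how the constraints would feed into the fixing process.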

If all of the above were incorporated, I believe it would make Poly even better than DnaChisel at optimizing sequences given constraints.

rkrishnasanka commented 2 years ago

Well, poly should be synthesis technology agnostic. This kind of goes to the PART level API I was talking about back in the stone age for poly. The feasibility of these parts is going to be a whole different thing.

isaacguerreir commented 2 years ago

Could you elaborate more on that, @rkrishnasanka?

rkrishnasanka commented 2 years ago

The premise was that we have different levels of API:

Level 1 - Sequence Level: this would include all operations for cutting, copying, pasting, etc. for sequences.

Level 2 - Part Level: This would include instantiating, deleting, and modifying "Parts" that could include promoters, codons, repressors, terminators, coding regions, etc. Basically supports the entire SBOL nomenclature.

Level 3 - Circuit Level: This would have an API for putting together and merging circuits, allowing one to explore combinatorial spaces, etc. with their own algorithms. This would also mean we think of assembly as hooks that transform parts to generate circuits.

Level 4 - Pathway Level: This would be very much relevant for metabolic pathway construction, etc. I'm not sure what the ideal way is to balance the abstractions between circuits and pathways. In principle it's really just a massive circuit where we might or might not know what all the interactions are.

Level 5 - Genome Level: I'm still not sure what this API would consist of for CRUD-like operations. I'm guessing the point of this would be to build a sort of sparse interaction model that we can keep updating.

@eyesmo @TimothyStiles @isaacguerreir I'm curious to hear what your thoughts are. Obviously, my knowledge of synbio is limited.

Koeng101 commented 2 years ago

We don't support SBOL right now, and I think for pretty good reasons.

We have support where it makes sense on the part level (ie, codon optimization and fixing). Promoters, terminators, repressors, etc can't be modeled well in the kind of software that poly currently supports.

IMO circuits don't really work in synthetic biology. Barry Canton+Endy made some great strides there, but there is a reason why although parts are commonly re-used now when building biological devices, circuits (other than very simple circuits) aren't reused. They have too much dependency on underlying implementation, ie, the lumped-element model (which is necessary for building upon circuits) doesn't work in biology, unlike how it does work in electrical engineering. If you perhaps consider circuits as defined outcomes rather than sequences, it can work, however (that is the idea behind allbase).

Pathway level is allbase. Though it doesn't work on integrating massive circuits because, again, circuits don't really work in biology very well.

Nobody can really build genomes right now. Until we can, I don't think it is very useful to have a model for them.

rkrishnasanka commented 2 years ago

@Koeng101 fair enough:

My reference to SBOL is to support the different types of entities that you might come across in SBOL (https://sbolstandard.org/visual-glyphs/) and not the I/O to SBOL. I only brought it up because you all are talking about working with those parts.

From the sound of it, poly will have Levels 1 and 2, and allbase will have Level 3 and above. It would honestly be better if the parts were agnostic to the assembly methods. That way, we can extend the base model with different assembly strategies.

I need to check on allbase first to make sure I understand what you mean, but I think we might just be talking past each other when we talk about circuits.

bebop / poly

Promoter, terminator, non-coding parts. #229