bebop / poly

A Go package for engineering organisms.
https://pkg.go.dev/github.com/bebop/poly
MIT License

Add FragmentWithOverhangs #387

Closed Koeng101 closed 9 months ago

Koeng101 commented 10 months ago

Changes in this PR

I'm porting over some changes I have in my private fragmenter. They constrain the possible overhangs in the fragmenter to only those supplied by the user.
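A minimal self-contained sketch of the idea, assuming a simplified interface (the name follows this PR's title, but the parameters and the greedy split-point search here are illustrative, not poly's actual implementation): split points are only accepted where the 4bp junction overhang is in a user-supplied set.

```go
package main

import "fmt"

// fragmentWithOverhangs is an illustrative sketch (not poly's actual
// implementation): split seq into chunks of at most maxSize such that
// the 4bp overhang at every junction comes from the allowed set.
// Adjacent fragments share the 4bp overhang so they can be ligated back.
func fragmentWithOverhangs(seq string, minSize, maxSize int, allowed []string) ([]string, error) {
	allowedSet := make(map[string]bool)
	for _, o := range allowed {
		allowedSet[o] = true
	}
	var fragments []string
	start := 0
	for start < len(seq) {
		if len(seq)-start <= maxSize {
			fragments = append(fragments, seq[start:])
			break
		}
		// Search backwards from the largest possible cut for an allowed overhang.
		cut := -1
		for end := start + maxSize; end >= start+minSize; end-- {
			if allowedSet[seq[end-4:end]] {
				cut = end
				break
			}
		}
		if cut == -1 {
			return nil, fmt.Errorf("no allowed overhang between %d and %d", start+minSize, start+maxSize)
		}
		fragments = append(fragments, seq[start:cut])
		start = cut - 4 // next fragment repeats the 4bp overhang
	}
	return fragments, nil
}

func main() {
	frags, err := fragmentWithOverhangs("ATGCAAAAGGAGTTTTCCCCAAAATGCA", 8, 16, []string{"GGAG", "CCCC"})
	if err != nil {
		panic(err)
	}
	fmt.Println(frags) // every junction is GGAG or CCCC
}
```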

Why are you making these changes?

They are useful in more advanced cloning methods.

Are any changes breaking? (IMPORTANT)

No breaking changes

Pre-merge checklist

All of these must be satisfied before this PR is considered ready for merging. Mergeable PRs will be prioritized for review.

cachemoi commented 9 months ago

> They are useful in more advanced cloning methods.

out of curiosity which ones?

Koeng101 commented 9 months ago

> > They are useful in more advanced cloning methods.
>
> out of curiosity which ones?

Here is a quoted part of my technology memo for my company, Nanala, on what I'm planning on doing with the DNA. The basic idea is that you can just slice everything into 250bp bricks, clone 'em real quick, and then combine the ones that get reused most often - but this is only possible if you have the correct linker sequences (kinda like these, but self-referential). Making 256 of them is impractical (for me), so I only want to build 96 of them for each side, resulting in 192 total linkers available for use - or perhaps I'll do 48 on each side (96 total), still thinking about it.

With those linkers, you can choose at clone-time (using only different DNA fragments) how many blocks to combine at once, rather than having to choose beforehand. This makes caching a lot easier. For example, if you start with a bunch of 250mers and find that block 394 and block 293 are very commonly used together, you can then cache that combination (394+293). This cache keeps growing, until you start caching things like whole vectors (which are reused very often). New proteins you still gotta do de novo, but if you're just changing regulation, you can just swap the promoters or such.
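The caching idea above can be sketched with a toy data structure (everything here - the cache type, the key format, the greedy longest-prefix matching - is my own illustration, not an existing Nanala or poly API): cached combinations of block IDs are reused, and only uncached blocks need fresh cloning.

```go
package main

import (
	"fmt"
	"strings"
)

// assemblyCache is an illustrative sketch of the DNA cache described
// above: keys are ordered block IDs (e.g. "394+293"), and a true value
// means that combination has already been cloned and can be reused.
type assemblyCache map[string]bool

// key joins block IDs into a cache key like "394+293".
func key(blocks []int) string {
	parts := make([]string, len(blocks))
	for i, b := range blocks {
		parts[i] = fmt.Sprint(b)
	}
	return strings.Join(parts, "+")
}

// plan returns the cached sub-assemblies to reuse plus the blocks that
// still need fresh cloning, greedily matching the longest cached run.
func (c assemblyCache) plan(blocks []int) (reused []string, fresh []int) {
	i := 0
	for i < len(blocks) {
		matched := 0
		for j := len(blocks); j > i; j-- {
			if c[key(blocks[i:j])] {
				reused = append(reused, key(blocks[i:j]))
				matched = j - i
				break
			}
		}
		if matched == 0 {
			fresh = append(fresh, blocks[i])
			i++
		} else {
			i += matched
		}
	}
	return reused, fresh
}

func main() {
	cache := assemblyCache{"394+293": true}
	reused, fresh := cache.plan([]int{101, 394, 293, 55})
	fmt.Println(reused, fresh) // the 394+293 combination is reused
}
```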

Of course, I'm still pretty bullish on using genetic parts instead, but eh, if someone just wants one specific DNA sequence without using parts, who am I to judge? It still uses the same system I use for the genetic parts anyway.

Reusable synthesis products

When synthetic biologists think of reusing genetic parts, they typically think of a component-first approach, with a focus on the genetic function of each component (a promoter, a CDS, a terminator, etc).

When synthesis companies, like Twist or Genscript, think about the production of genetic sequences, they think in terms of plasmid backbones and inserts. From a production standpoint, this makes sense, until one wants to start making minor modifications to those backbones or variants of the inserts. The ability to make combinatorial variants is essential enough that many companies run internal GoldenGate assembly pipelines.

My approach to generic reusable synthesis products is not to make well-defined genetic components (though I do plan on separately making genetic toolkits, more on that below) - nor is it to get plasmid backbones from customers (as Twist and Genscript do), since that takes significant user investment and forces me to validate outside products. It is to take advantage of the inherent historical structure of plasmid sequences.

When cloning began in the early 1970s, the only way to engineer sequences was restriction enzyme cloning, and so naturally plasmid “lineages” arose from commonly used plasmid backbones. Once those backbones were working, people stopped building the backbones and instead focused on their unique genetic inserts.

The cheap synthesis process described above works by chunking ~250bp fragments from oligo pools, clonally verifying them, and then using them in further assemblies. Many vector backbones will share ~250bp fragment chunks because of their lineage. Many assemblies of these ~250bp fragments will also be shared. All these shared fragments can be reused to dramatically lower the costs to build full plasmids from scratch, as most backbones won’t even require fresh synthesis.
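The reuse claim above can be illustrated with a toy sketch (the fixed-size chunking and the example sequences are my own simplification; the real pipeline works on ~250bp fragments with overhangs): count how many chunks of a new construct already exist in the chunk set of a previously built one.

```go
package main

import "fmt"

// chunk splits a sequence into consecutive fixed-size pieces.
// (Illustrative: the real process uses ~250bp fragments with overhangs.)
func chunk(seq string, size int) []string {
	var chunks []string
	for i := 0; i+size <= len(seq); i += size {
		chunks = append(chunks, seq[i:i+size])
	}
	return chunks
}

// sharedChunks counts how many chunks of plasmid b are already present
// in plasmid a's chunk set, modelling reuse across a plasmid lineage.
func sharedChunks(a, b string, size int) int {
	seen := make(map[string]bool)
	for _, c := range chunk(a, size) {
		seen[c] = true
	}
	shared := 0
	for _, c := range chunk(b, size) {
		if seen[c] {
			shared++
		}
	}
	return shared
}

func main() {
	backbone := "AAAATTTTCCCCGGGG"
	variant := "AAAATTTTCCCCACGT" // differs only in the last chunk
	// 3 of the variant's 4 chunks need no fresh synthesis.
	fmt.Println(sharedChunks(backbone, variant, 4))
}
```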

Nanala clones can be reused quickly without resynthesis, acting like a DNA cache. Every order adds a little bit to my DNA cache, which will become more and more valuable over time.

Reusable synthesis products: clone-time assembly optimization

There is one final technology I am taking advantage of: methylation-based hierarchical cloning. Basically, there are methods where you only need to use a single restriction enzyme, BsaI, to clone in a hierarchical way (i.e., 36 fragments -> 6 fragments -> 1 fragment). There are 256 different possible overhangs for BsaI (4bp of N, so 4^4), and I plan to clone every single one for both edges between the cloning plasmid / genome integration sites and my customer’s gene of interest.
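The 4^4 = 256 count can be checked by simple enumeration:

```go
package main

import "fmt"

// allOverhangs enumerates every possible 4bp overhang (4 bases per
// position, 4 positions, so 4^4 = 256 total).
func allOverhangs() []string {
	bases := []string{"A", "C", "G", "T"}
	overhangs := []string{""}
	for i := 0; i < 4; i++ {
		var next []string
		for _, o := range overhangs {
			for _, b := range bases {
				next = append(next, o+b)
			}
		}
		overhangs = next
	}
	return overhangs
}

func main() {
	fmt.Println(len(allOverhangs())) // 256
}
```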

This allows us to do hierarchical assembly of a variable number of inserts, with zero limitations on how fragments can be connected together. I can assemble 2 oligos, 5 oligos, or 10 oligos into fragments at once. I can then take those fragments and assemble 2 fragments, 5 fragments, or 10 fragments at once. Importantly, I change nothing about the sequences themselves - decisions around how many fragments to assemble together are decided at clone-time.
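The clone-time choice of fan-in described above can be sketched as simple round arithmetic (illustrative only; the actual wet-lab assembly is of course not a one-line ceiling division):

```go
package main

import "fmt"

// assemblyRounds returns the number of fragments remaining after each
// round of hierarchical assembly when fanIn fragments are joined at a
// time, with fanIn chosen at clone-time (e.g. 36 -> 6 -> 1 at fanIn 6).
func assemblyRounds(fragments, fanIn int) []int {
	var rounds []int
	for fragments > 1 {
		fragments = (fragments + fanIn - 1) / fanIn // ceiling division
		rounds = append(rounds, fragments)
	}
	return rounds
}

func main() {
	fmt.Println(assemblyRounds(36, 6)) // [6 1]
	fmt.Println(assemblyRounds(36, 3)) // [12 4 2 1]
}
```

Nothing about the fragments themselves changes between the two runs; only the clone-time fan-in does.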

In the beginning our cloning system may be inefficient and only able to add 3 fragments together, but as I improve the system I can add more fragments at once without changing the underlying DNA or method. Not only does this give us a clear path to optimizing and improving the clone system, but I can also optimize the above assemblies on the basis of what creates the most reusable DNA fragments.

In essence, my fragment chunks act like small, modular, scarless bricks to build more complicated plasmids.

cachemoi commented 9 months ago

Thanks for the nice read! V. interesting. I'll give you a ✅ when you've expanded on your comment here and done the rename mentioned here