friendzymes / community

👋 The Friendzymes back-office. This is where the meta lives: admin, steering, and assorted org work.
MIT License

AllClo discussion #15

Open isaacguerreir opened 2 years ago

isaacguerreir commented 2 years ago

AllClo

Unification, harmonization, and validation of high-fidelity MoClo genetic assembly standard expansions

Our objective is to create a standard expansion that unifies all the previous ones (or at least harmonizes them as much as possible). List of standards:

Note: We need to validate each expansion of the assembly standard, and also BtgZI/BsaI-based part type switching (and/or methylation-based part type switching!) to go from a simplified assembly to a more complex assembly.

isaacguerreir commented 2 years ago

Having a description of each standard will be useful for defining the unifying components.

jcahill commented 2 years ago

First line of discussion: Without some background reading, it is unclear to me that the theoretical grounding to do this exists yet.

Second line of discussion: I believe generalization of tooling for formal proofs regarding various properties of assembly standards is in order, given the explosion of assembly standard variants and modular cloning flavors over the past few years. This would be a far more substantial undertaking than what I currently believe the AllClo plan to be, but I think it's worth pursuing.

jcahill commented 2 years ago

Isaac L on feasibility:

Yes! I think it can be done. A couple of things enable it:

  1. High fidelity Golden Gate overhang sets that enable assembly of large numbers of parts (opening the door to more part types than fit in the current assembly standards) https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0238592
  2. The ability to perform part type switching, either with BsaI/BtgZI paired overhangs or ideally with Keoni’s conditional methylation and linker strategy. This helps solve the proliferation problem of needing to define multiple versions of the same gene with different pairs of overhangs depending on how it is being used in a given assembly. You can run the same assembly reaction with a pair of linker parts from a standardized set, and the result is a change to the part type definition (and a small scar sequence). I know I haven’t described or diagrammed this method out for team members in a way that is easy to understand; it took me a while to see what Keoni was getting at with it. One of my main to-dos for the winter break is to make an explainer for this, and to start design on the part switching linker sets that would be most immediately useful for Friendzymes’ work.
  3. Access to free or almost free large-scale gene synthesis, through some combination of FreeGenes, iGEM, and Keoni’s oligo arrays; and the fact that those genes will be freely available to others. There are three factors that I think are responsible for the proliferation of assembly standards: (1) the need to publish novelty in academia; (2) the difficulty of finding a standard that is at once idempotent (uses the same reaction with the same enzyme set for building everything), flexible/universal (anything you reasonably want to build can be built), and fast/multi-part (multiple parts can be assembled in one reaction, reducing the need for hierarchical many-step assemblies); and (3) the time and cost of converting a pre-existing wetware library to a new assembly standard. There’s not much to do about (1), other than design and proselytize the obviously best assembly standard and make it easy to learn to use it; AllClo should hopefully address (2); FreeGenes, the OpenMTA and better wetware design workflows like Poly can help to address (3).
isaacguerreir commented 2 years ago

Hey people, what do you think about this strategy of work:

  1. Take one standard from the MoClo standards list
  2. Create a description with the restriction enzymes used
  3. Create a description with all the overhangs

Once we have this list for each standard, we could look for intersections and inconsistencies between the standards' overhangs using the SplitSet website, and use our reverse-engineered SplitSet tool when we need to test many different overhang sets (so that we can do this programmatically).

With this information in hand, it will be easier to develop a design for AllClo.
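
A minimal sketch of the comparison step described above, written in plain Python rather than with SplitSet (whose interface is not described in this thread). It reports overhangs that two standards share, pairs where one standard uses the reverse complement of an overhang in the other (those two ends would describe the same junction and could cross-ligate if the sets were merged), and palindromic overhangs that can self-ligate. The standard names and the overhang lists are placeholders for illustration, not the real definitions of any published standard.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(overhang: str) -> str:
    """Reverse complement of a DNA string (uppercase A/C/G/T only)."""
    return overhang.translate(_COMPLEMENT)[::-1]


def palindromes(overhangs: set[str]) -> set[str]:
    """Overhangs equal to their own reverse complement (can self-ligate)."""
    return {oh for oh in overhangs if revcomp(oh) == oh}


def compare_standards(standards: dict[str, set[str]]) -> dict[tuple[str, str], dict]:
    """Pairwise report of shared overhangs and reverse-complement clashes."""
    report = {}
    for (name_a, set_a), (name_b, set_b) in combinations(standards.items(), 2):
        report[(name_a, name_b)] = {
            "shared": set_a & set_b,
            "revcomp_clashes": {
                (a, revcomp(a))
                for a in set_a
                if revcomp(a) in set_b and revcomp(a) != a
            },
        }
    return report


# Placeholder data: NOT the real overhang sets of any standard.
standards = {
    "standard_a": {"GGAG", "AATG", "GCTT"},
    "standard_b": {"AATG", "CTCC", "ACGT"},
}
report = compare_standards(standards)
print(report[("standard_a", "standard_b")])
print(palindromes(standards["standard_b"]))
```

This only checks exact matches. It says nothing about ligation fidelity between near-matching overhangs, which is what the high-fidelity overhang data linked earlier in the thread addresses.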

eyesmo commented 2 years ago

My notes on AllClo:

  1. AllClo should combine BacClo/VecClo, ProClo, FiveClo and ThreeClo into a single unified assembly standard, using a single high fidelity 4bp BsaI overhang set.

---Note up front: it will be rare for a genetic device to use all of these part types at once. Rather than requiring every assembly reaction to include a bunch of neutral 'spanner' parts as placeholders for unused part types, we should handle this with Part Type Switching (PTS) linkers. For example, by default the core promoter part will have the same overhangs as in uLoop. To build a more complex 5' UTR, an assembly reaction will be performed between the promoter and two PTS linkers, that can convert the promoter's overhang definition to a pair of FiveClo overhangs. In parallel, the set of FiveClo parts that one wishes to assemble and use will also undergo a PTS reaction with linkers to convert their overhangs to close any unused part slots. This way, while each promoter starts out with FiveClo-incompatible overhangs and each FiveClo part listed below starts with its own unique combination of two overhangs, a single round of PTS reactions later (followed by either transformation/cloning or PCR amplification of PTSed parts), all parts have the overhangs required for the desired assembly; and moreover these overhangs are compatible with any combination of VecClo, ProClo, and ThreeClo parts as well, up to and including one-pot full vector assembly reactions. This PTS design strategy can be applied to VecClo, FiveClo, ProClo, and ThreeClo, ensuring simple and backwards-compatible uLoop assembly by default, while enabling access to the full array of complex/composite/high fidelity AllClo genetic design and assembly after a single round of PTS reactions.

-- VecClo parts:
--- 3' Homology Arm (3HA). Used along with 5' homology arm for genomic integration.
--- E. coli Selection Marker (EcoSel). Used to select for and maintain the vector in E. coli. Note: AllClo should offer the option to split the E. coli selection marker into two individually nonfunctional parts, EcoSel1 and EcoSel2. Keoni says doing this significantly reduces the false positive rate during colony picking.
--- E. coli origin of replication (EcoOri). Required for propagating the vector in E. coli.
--- 'Packaging' slot (Pkg). Used for conjugative transfer of the vector from E. coli to another chassis cell type, as well as for packaging phagemids or other viral vectors.
--- 5' Homology Arm/Target Cell Origin of Replication (5HA). If a homology arm, used along with the 3' homology arm for targeted genomic integration into a non-E. coli chassis cell. If an origin of replication, used to replicate the vector in a non-E. coli chassis cell.
--- Target Selection Marker (TSel). Used to select for the vector's propagation/integration in a non-E. coli chassis cell.

-- FiveClo parts:
--- 5' linker for multi-TU assembly (5L). Used along with the 3' linker to do secondary assembly of multiple transcription units. Variants of this part type could also be combined with recombinase recognition sites for deleting transcription unit(s).
--- 5' recombinase binding site (5rec). Combine with 3' recombinase recognition (3rec) parts to delete one or more transcription units, or to integrate them into a pre-existing recombinase landing pad in a chromosome or other genetic device.
--- Distal promoter element (dPromE). Important in some eukaryotic genetic devices for controlling and/or increasing gene expression.
--- Core promoter (Prom). The core promoter element, controlling transcription. Different overhangs than in MoClo/uLoop (requires part type switching).
--- Operator (Op). Binding site for a conditional/inducible transcription factor. One of the main mechanisms of transcriptional control in genetic circuits.
--- 5' recombinase recognition site, flipping variant (5recF). Combines with 3' recombinase recognition (3rec) parts to flip RBS+CDS pairs and turn expression on or off digitally.
--- Ribozyme (Rbz). RNA element for controlling or regulating gene expression. Examples include ribozyme insulators that cleave off the untranslated region of the mRNA 5' to the insulator, increasing the reliability/modularity of promoter and CDS parts by reducing the risk of changes to device behavior due to changes in 5'UTR folding; toehold switches that bind to and block the RBS until displaced by binding another complementary nucleic acid molecule, regulating expression at the level of translation; STARs that terminate transcription prior to gene expression, unless bound and blocked by a small RNA in trans, regulating expression at the level of transcription; and riboswitches that change conformation/cleavage activity (and therefore gene expression) in response to a small molecule, nucleic acid or protein binding event.
--- Ribosome binding site (RBS). Required for prokaryotic translation initiation. The RBS (along with mRNA folding structure around the start codon) determines the rate of translation initiation. Given the sensitivity of this part to sequence modifications, it should ideally maintain the same overhangs as in uLoop, and not require any part type switching.
--- Designate at least three additional 'free' FiveClo overhangs (along with the associated part type switching linkers), to enable people to add more part types to the 5' UTR (e.g. additional distal promoter elements, ribozyme elements, or operator elements).

-- ProClo 2.0 parts:
--- Ribosome binding site (RBS). Required for prokaryotic translation initiation. For use in ProClo (but not FiveClo), the RBS probably requires a different 3' overhang than is used for uLoop (AATG), since the core CDS (5' overhang AATG) is now located 3-4 parts away. Basically, special sets of RBSs should be used to pair with localization tags. Note: ProClo 1.0 solved this problem by combining the RBS and Loc part types into a single RBS/Loc part (e.g. for the B. subtilis secretion tag library plasmids). I think they should be separated in ProClo 2.0, but I'm open to being persuaded to keep this arrangement.
--- RBS/Localization tag (Loc). An example use case is a secretion tag (or, as in ProClo 1.0, an RBS/secTag pair).
--- N1 tag (N1). The first of 3 N-terminal CDS tags/fusions (not including the Loc tag). Example use cases include purification tags, and trans-splicing inteins for building multi-domain protein fusions post-translationally.
--- N2 tag (N2). The second of 3 N-terminal CDS tags/fusions (not including the Loc tag). Example use cases include reporter tags (e.g. fuGFP or mCherry), or cleavage tags (e.g. inteins or protease recognition sites).
--- N3 tag (N3). The third of 3 N-terminal CDS tags/fusions (not including the Loc tag). Example use cases include cleavage tags (e.g. inteins or protease recognition sites) and reporters (e.g. fuGFP or mCherry).
--- Protein Coding Sequence (CDS). The core CDS, with the same overhangs as a CDS in a regular uLoop transcription unit (5'-AATG--AGGT-3').
--- C1 tag (C1). The first of 3 C-terminal CDS tags/fusions. Example use cases include cleavage tags (e.g. inteins or protease recognition sites) and reporters (e.g. fuGFP or mCherry).
--- C2 tag (C2). The second of 3 C-terminal CDS tags/fusions. Example use cases include cleavage tags (e.g. inteins or protease recognition sites) and reporters (e.g. fuGFP or mCherry).
--- C3 tag (C3). The third of 3 C-terminal CDS tags/fusions. Example use cases include purification tags, degradation tags, and trans-splicing inteins for building multi-domain protein fusions post-translationally.
--- It might be useful later on to develop further expanded AllClo-compatible ProClo 2.0 overhang sets that generate application-specific amino acid dyads in their 4-bp scar sequences. Examples where this could be useful include the construction of novel TALE proteins, and the construction of novel fibrous, multidomain proteins (e.g. silk fibroins).

-- ThreeClo parts:
--- Terminator (Term). Terminates transcription of mRNA, by causing RNA polymerase to dissociate from the DNA and the growing mRNA.
--- 3' recombinase recognition site (3rec). Used in combination with 5rec or 5recF parts, can delete transcription unit(s), integrate transcription unit(s) at recombinase landing pads, or flip coding sequences on and off.
--- 3' linker for multi-TU assembly (3L). Used along with the 5' linker to do secondary assembly of multiple transcription units. Variants of this part type could also be combined with recombinase recognition sites for deleting transcription unit(s).
--- Designate at least three additional 'free' ThreeClo overhangs (along with the associated part type switching linkers), to enable people to add more part types to the 3' UTR (e.g. poly-A tails, microRNA binding/cleavage sites, and additional terminators on the top and bottom strand to insulate the device even when the RBS+CDS are flipped and the terminators are next to the promoter and 5' recombination sites).
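
To make the Part Type Switching idea from the note up front more concrete, here is a toy model in Python. It tracks only the pair of overhangs that defines a part type and ignores sequences, scars, methylation, and the cloning or PCR step between rounds, so it is a bookkeeping sketch and not a simulation of the chemistry. The class names, the `OH_a`/`OH_b`/`OH_c` overhang labels, and the linker names are hypothetical; the GGAG/AATG pair is borrowed from the promoter-slot example later in these notes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str
    left: str   # 5' overhang that defines the part type
    right: str  # 3' overhang that defines the part type


@dataclass(frozen=True)
class PTSLinker:
    name: str
    inner: str  # overhang that ligates to the part in the PTS reaction
    outer: str  # overhang the switched part will expose afterwards


def part_type_switch(part: Part, five_linker: PTSLinker, three_linker: PTSLinker) -> Part:
    """Return the part with its overhang definition replaced by the linkers' outer overhangs."""
    if five_linker.inner != part.left or three_linker.inner != part.right:
        raise ValueError("linker overhangs do not match the part being switched")
    return Part(f"{part.name}[PTS]", five_linker.outer, three_linker.outer)


def can_assemble(parts: list[Part]) -> bool:
    """True if each part's 3' overhang matches the next part's 5' overhang."""
    return all(a.right == b.left for a, b in zip(parts, parts[1:]))


# Hypothetical example: switch a default promoter so a ribozyme part can follow it.
promoter = Part("Prom", "GGAG", "AATG")
ribozyme = Part("Rbz", "OH_b", "OH_c")
switched = part_type_switch(
    promoter,
    PTSLinker("PTS-5prime", inner="GGAG", outer="OH_a"),
    PTSLinker("PTS-3prime", inner="AATG", outer="OH_b"),
)
print(can_assemble([promoter, ribozyme]))  # default promoter does not chain to Rbz
print(can_assemble([switched, ribozyme]))  # switched promoter does
```

The same bookkeeping would apply to closing unused slots: a linker whose outer overhang skips ahead to a later slot lets two non-adjacent part types chain directly.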

  1. There must be a balance between what one can do in one step (flatness), and simplicity.
     -- Where transcription units (TUs) can replace new part types, let them do so. They can always be part type switched later.
     -- Where part types are very diverse (have lots of variants that go in that slot), try to keep the overhangs the same as in MoClo (or uLoop, depending on which way iGEM is going for the new Distribution and Registry). Importantly, this was not done for ProClo v1.0 CDS parts (EDIT: actually I think the ProClo/Protein Expression Toolkit CDSs are uLoop compatible!).
  2. At the moment, the only new part types I think we definitely need, part types that could go into expanded FiveClo and ThreeClo assemblies (on top of the MoClo, ProClo and BacClo/VecClo standard sets), are ribozyme insulator (Ins) parts and recombinase binding site (Rec) parts.
  3. It is possible to place Rec sites on assembly connectors. It's not clear that this is preferable to defining a new part type, as new linker sets would need to be designed for each orthogonal Rec site. Might be worth designing a subset of linkers with Rec sites, in addition to defining the new part types.
  4. In BacClo/VecClo (hereafter VecClo), the target selection marker (TSel) is 5' to the TU. Should Rec sites and counterselection markers (CTsels) be inserted there too?
     -- At minimum, this should be an option; potentially with a dropout cassette.
     -- Another option is to add a dummy spacer in the TSel slot, and insert the TSel downstream of the TU along with the Rec sites and CTsel.
        --- Another variant on this approach would be to part type switch the 5' homology arm (5HA) and 5' linker (5L) parts to remove the TSel slot.
  5. Keoni says Poly can do methylation-aware Golden Gate assembly simulation, using some regular expression tricks. He's willing to do a wetware design session/tutorial to show how this works.
  6. The part type switch (PTS) linkers should have a pair of commonly used primer binding sites on them.
     -- This is so that users could have the option to PCR amplify out the PTSed genetic part, instead of transforming into an HpaII methyltransferase-negative strain, colony picking and cloning.
     -- The effect is basically the same as for transformation and cloning: the methylation is removed, and the mBsaI sites are made available for cleavage.
     -- The advantage of PCR amplification is that it is much faster than transformation and cloning: PTS could be completed in a couple of hours, rather than a couple of days.
     -- The disadvantage of PCR amplification is that it has a higher mutation rate than Golden Gate and cloning, especially when using Taq polymerase (vs Pfu-Sso7D). So sequence verification of the PTSed part might be required.
     -- Can't use M13F and M13R primers, because those are already present on the pOpen backbone. T7F and T7R primers are an option, though the T7 promoter is a pretty commonly used part in E. coli work. Should think more about this.
  7. It would be nice to build ccdB dropout cassettes for the various slots in the AllClo assembly standard.
     -- Would also be good to enable PTS on these dropout cassettes.
     -- PTS can be achieved with one ccdB part, and all the various PTS linkers.
     -- However, it will also be useful to have linkers with mBsaI sites that generate the exact same overhangs as the part type to be inserted after the dropout, with no additional scarring.
        --- So for instance, if you wanted to test a bunch of different promoters in a new plasmid context, it would be very useful to be able to assemble the plasmid using the following part in the promoter slot:
            -->BsaI>GGAG<mBsaI<---ccdB dropout cassette--->mBsaI>AATG<BsaI<--
            So that after BsaI digestion/Golden Gate assembly, you get the following:
            --rest of the vector---GGAG<mBsaI<---ccdB dropout cassette--->mBsaI>AATG---rest of the vector--
            And after transformation into an HpaII methyltransferase-negative strain, BsaI digestion and Golden Gate gives you this:
            --rest of the vector---GGAG --any promoter part you want-- AATG---rest of the vector--
  8. The BacClo and ProClo 1.0 assembly standards have incompatible overhangs and don't use the optimal overhang sets calculated by Keoni. There are three potential ways to address this:
     -- One, we could re-synthesize these collections with the new overhangs.
        --- It's not clear that FreeGenes would go for this, but it might be feasible with Keoni's/Trilobio's polymerase-cycling-assembly-free, multibarcoded oligo pool assembly method.
     -- Two, we could just design PTS linker sets to switch all the existing parts over to the AllClo assembly standard.
        --- The advantage of this approach is that it requires much less synthesis.
        --- The disadvantage of this approach is that it will leave scar sequences: at least 4 bp for non-protein-coding parts, and at least 6 bp for protein coding parts (to maintain the proper reading frame).
     -- Three, we could design primer pairs to amplify and perform the PTS on these parts collections.
        --- The advantage of this approach is that we can switch the overhangs of the existing parts without leaving any scar sequences.
        --- The disadvantages of this approach are the potential to introduce mutations during PCR (necessitating sequence-verification of several colonies of all PTSed parts); and the need for a unique primer pair for each part, which will make PTSing all of the parts in these collections expensive (though likely less expensive than re-synthesis).
  9. In order to maintain the options of more complex, composite assembly alongside the option for a more simple/traditional MoClo assembly, PTS linkers are required to switch parts from MoClo to FiveClo, ThreeClo, and updated ProClo. In particular:
     -- PTS linkers are required for converting any ProClo part type to any other ProClo part type, while maintaining the same reading frame and introducing only small, hydrophilic amino acids into the linkers.
     -- PTS linkers are required for converting promoters from traditional MoClo to FiveClo.
        --- Converting to FiveClo should introduce overhangs that enable (at minimum) insertion of a Rec site 5' to the promoter, and a ribozyme insulator 3' to the promoter.
     -- PTS linkers are required for converting terminators from traditional MoClo to ProClo, and to ThreeClo.
        --- Converting to ThreeClo should introduce overhangs that enable the insertion of a Rec site 3' to the terminator.
        --- Converting to ProClo ideally should entail a change to the terminator's 5' overhang, but no change to the core CDS's overhangs. This will require changing how CDSs are defined in ProClo 1.0.
        --- For updated ProClo, it might be nice to place the stop codon(s) in the terminator part, rather than on any of the protein coding parts. The extra couple of amino acids that get added due to the last CDS-terminator overhang should be fine, because (1) if you're using ProClo, you're already adding a bunch of tags with short flexible peptide linkers between them; and (2) if you don't want any additional C-terminal residues because they might impact your protein's function, you'll already have added a stop codon(s) into your CDS and more stop codons in the terminators won't matter.
        --- For updated ProClo, it might be nice to have an explicitly noncoding linker, for proteins that don't use any C-terminal tags or modifications.
  10. I believe we will need a holding vector with a different selection marker for PTSed parts.
     -- Since the standard parts and all the linkers will be held in Ampicillin-resistant pOpen_v3, we need a different selection marker for the vectors that will hold the parts after PTS reactions.
     -- Kanamycin? Chloramphenicol? Open to suggestions here.
  11. The linker set Keoni has already designed works well as an AllClo-compatible replacement for the multi-transcription-unit assembly linkers Scott designed for the Open Yeast Collection (the parts named numbered variations on AConL and AConR). Keoni's linkers use mBsaI to generate the 2nd-level overhangs, while the OYC linkers use BbsI sites.
     -- Keoni will clone and send us these linkers for free if we genomically integrate constitutively expressed HpaII methyltransferase into an unambiguously public domain cloning strain of E. coli.
  12. Here's a question: are there academic labs we should reach out to for collaboration in building/validating the parts for AllClo? Three ideas here:
     -- Fernan Federici, who has already helped us and whose lab designed uLoop assembly (already reached out, and Isaac Núñez is now on the AllClo design team!).
     -- The original designers of MetClo, upon which AllClo's PTS system is based: Da Lin, and Professor Christopher O'Callaghan at Oxford University.
     -- Drew Endy, since he's still leading FreeGenes and since he may have good ideas/connections for getting some funding and researcher-hours behind building and validating AllClo.
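
As a rough illustration of what a regex-based, methylation-aware site scan could look like, here is a short Python sketch. It is not Poly's implementation, which is not shown in this thread. It relies on the standard BsaI geometry (recognition site GGTCTC, cutting 1 nt downstream on that strand and 5 nt on the other, leaving a 4-nt 5' overhang). It also makes one loud assumption: that an "mBsaI" site is a BsaI site overlapping an HpaII site (CCGG), i.e. `CCGGTCTC` on the top strand or `GAGACCGG` for the reverse orientation, and that such a site is uncuttable while the DNA is methylated. The notes do not spell out the exact mBsaI sequence, so treat that pattern as a placeholder.

```python
import re
from typing import NamedTuple, Optional


class Site(NamedTuple):
    start: int               # index of the recognition site on the top strand
    strand: str              # "+" for GGTCTC, "-" for GAGACC
    overhang: Optional[str]  # 4-nt overhang on the top strand, None if off the end
    blocked: bool            # True if assumed to be protected by methylation


# Lookahead so that overlapping sites are all reported.
_SITE = re.compile(r"(?=(GGTCTC|GAGACC))")


def bsai_sites(seq: str, methylated: bool) -> list[Site]:
    """Find BsaI sites and flag those assumed to be blocked by HpaII methylation."""
    seq = seq.upper()
    found = []
    for match in _SITE.finditer(seq):
        i = match.start()
        if match.group(1) == "GGTCTC":
            strand = "+"
            overhang = seq[i + 7:i + 11]
            overlaps_hpaii = i >= 2 and seq[i - 2:i] == "CC"  # CCGGTCTC (assumed)
        else:
            strand = "-"
            overhang = seq[i - 5:i - 1] if i >= 5 else ""
            overlaps_hpaii = seq[i + 6:i + 8] == "GG"         # GAGACCGG (assumed)
        found.append(
            Site(i, strand, overhang if len(overhang) == 4 else None,
                 methylated and overlaps_hpaii)
        )
    return found


# Left edge of the dropout example: >BsaI>GGAG<mBsaI< (spacer bases are arbitrary).
left_edge = "GGTCTC" + "A" + "GGAG" + "T" + "GAGACC" + "GG" + "TTTT"
for site in bsai_sites(left_edge, methylated=True):
    print(site)
```

In the methylated state only the outer site is reported as cuttable; calling the function with `methylated=False` (after PCR or passage through a methyltransferase-negative strain) reports both sites, each exposing the same GGAG overhang.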

eyesmo commented 2 years ago

A summary of the action steps from the notes above, in no particular order:

  1. Identify and obtain unambiguously public domain cloning strains of E. coli.
  2. Decide which overhangs will go where for the AllClo versions of:
     -- VecClo (8 overhangs)
     -- FiveClo (4 overhangs)
     -- ProClo (7 overhangs)
     -- and ThreeClo (2 overhangs).
     -- 21 overhangs used, 3 overhangs to spare for extra part types to be added later (maybe to increase construction options for FiveClo and ThreeClo?).
     -- You can actually add arbitrarily more overhangs, but for the moment we're just considering the optimized 24-overhang set calculated by Keoni.
  3. Decide which well-validated primer pair should go onto the linkers. T7F and T7R, or something else?
  4. Design a modified pOpen cloning vector for holding PTSed parts. Will likely need a different selection marker. Decide which selection marker. Kanamycin? Chloramphenicol?
  5. Re-design the AllClo-compatible linker set Keoni designed for construction of multigene cassettes, with the following modifications:
     -- Add the well-validated primer pair to the linkers, outside the mBsaI sites (so the primer sequences get cut off by mBsaI cleavage).
     -- Add commonly used recombination sites to the linkers, inside the mBsaI sites (so the Rec sites get left in the assembled construct after mBsaI cleavage).
  6. Design a PTS linker set for converting promoters from MoClo to FiveClo.
  7. Design a PTS linker set for converting terminators from MoClo to ThreeClo and/or to updated ProClo (including adding stop codons before the terminator).
  8. Design a PTS linker set for converting any updated ProClo part to any other updated ProClo part.
  9. Design a PTS linker set for converting all the Open Yeast Collection (and Open Bacillus Collection) parts from BacClo 1.0 assembly standard to the VecClo subcomponent of the AllClo assembly standard.
  10. Design a PTS linker set for converting all the ProClo v1.0 parts (and probably also the tag/CDS parts from the Open Bacillus Collection) to the updated ProClo subcomponent of the AllClo assembly standard.
     -- Maybe also design primer pairs to enable conversion of these parts without introducing additional scar sequences.
        --- Perhaps only do this for a subset of parts (e.g. the cleavage tags) for which adding scar sequences might significantly decrease the usefulness of the part.
  11. Design a ccdB dropout cassette part, and a linker set for placing that dropout cassette in any slot on the AllClo assembly standard.
     -- Maybe also do this with a couple of other E. coli counterselection markers (e.g. sacB), just to add some flexibility to the system.
     -- Design a two-layer linker set that enables the insertion of dropout cassettes/spacer parts into any location in AllClo, and then the conversion of that dropout cassette/spacer to generate any pair of AllClo overhangs, like so:
        -->BsaI>OH1--OH2<mBsaI<---linker1---OH3<BsaI< + >BsaI>OH3---linker2---OH4<BsaI< + >BsaI>OH4---dropout cassette---OH5<BsaI< + >BsaI>OH5---linker3---OH6<BsaI< + >BsaI>OH6---linker4--->mBsaI>OH7--OH8<BsaI<
        Which, after BsaI Golden Gate, becomes:
        ---rest of vector---OH1--OH2<mBsaI<---linker1---OH3---linker2---OH4---dropout cassette---OH5---linker3---OH6---linker4--->mBsaI>OH7--OH8---rest of vector---
        Which, after PCR or cloning into HpaII-null strain and another Golden Gate, becomes:
        ---rest of vector---OH1--OH2 --whatever parts you want here, as long as the first part starts with OH2 and the last part ends with OH7-- OH7--OH8---rest of vector---
  12. Reach out to Drew Endy and Chris O'Callaghan/Da Lin about collaborating on building/validating AllClo.
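
Two of the action steps above lend themselves to a quick sanity check: the overhang budget in step 2 and the first-round chaining of the two-layer linker set in step 11. The sketch below, in Python, treats each part as a (5' overhang, 3' overhang) pair and walks the chain from the vector's left overhang to its right one. The function name and the dictionary layout are illustrative choices; the OH1 to OH8 labels are the symbolic ones used in step 11, not real sequences.

```python
# Step 2: overhang budget against the 24-overhang set.
slots = {"VecClo": 8, "FiveClo": 4, "ProClo": 7, "ThreeClo": 2}
used = sum(slots.values())
spare = 24 - used
print(used, spare)


def assembly_order(parts: dict[str, tuple[str, str]], start: str, end: str) -> list[str]:
    """Order parts into a single chain from `start` to `end`, or raise if they do not chain."""
    by_left = {}
    for name, (left, right) in parts.items():
        if left in by_left:
            raise ValueError(f"overhang {left} starts more than one part")
        by_left[left] = (name, right)
    order, current = [], start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no part starts with {current}")
        name, current = by_left.pop(current)
        order.append(name)
    if by_left:
        unused = sorted(name for name, _ in by_left.values())
        raise ValueError(f"parts left out of the chain: {unused}")
    return order


# Step 11, first round: only the unmethylated outer BsaI sites are cut,
# so the inner OH2 and OH7 overhangs stay hidden behind the mBsaI sites.
round_one = {
    "linker1": ("OH1", "OH3"),
    "linker2": ("OH3", "OH4"),
    "dropout cassette": ("OH4", "OH5"),
    "linker3": ("OH5", "OH6"),
    "linker4": ("OH6", "OH8"),
}
print(assembly_order(round_one, start="OH1", end="OH8"))
```

The second round would be checked the same way, with the user's parts chained from OH2 to OH7.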
Koeng101 commented 2 years ago

For posterity, here is an email exchange between Isaac and me discussing ProClo:

Isaac -> Scott,Isaac,Keoni

Hi Scott, Isaac and Keoni,

I’ve worked up a draft of a ProClo 2.0 assembly standard, that uses the optimized MoClo-compatible 24-overhang set Keoni calculated, that has exclusively small/hydrophilic amino acid dyads at the in-frame overhangs, and that preserves the MoClo part definition of CDSs (AATG-GCTT). Would love to get your thoughts and feedback on it.

All the best, Isaac
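
The "small/hydrophilic amino acid dyads" criterion in the email above can be checked mechanically. A minimal Python sketch: translate a 6-nt in-frame scar (the 4-nt overhang plus the 2 flanking bases that complete two codons) with the standard genetic code and test both residues against an allowed set. The allowed set here (G, S, T, N, D) is an assumption chosen for illustration, since the thread does not list which residues count as acceptable, and the example scars are made up rather than taken from the ProClo 2.0 draft.

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

# ASSUMPTION: illustrative set of "small, hydrophilic" residues.
ASSUMED_ALLOWED = frozenset("GSTND")


def scar_dyad(scar: str) -> str:
    """Translate a 6-nt in-frame scar into its two-residue dyad ('*' marks a stop)."""
    scar = scar.upper()
    if len(scar) != 6:
        raise ValueError("an in-frame scar must be exactly 6 nt (two codons)")
    return CODON_TABLE[scar[:3]] + CODON_TABLE[scar[3:]]


def scar_is_acceptable(scar: str, allowed=ASSUMED_ALLOWED) -> bool:
    """True if both residues of the dyad are in the allowed set (stops never are)."""
    return all(residue in allowed for residue in scar_dyad(scar))


for scar in ("GGTTCT", "TGGTAA"):  # made-up scars
    print(scar, scar_dyad(scar), scar_is_acceptable(scar))
```

Which 2 flanking bases complete the codons depends on where the overhang sits relative to the reading frame, so the same 4-nt overhang can give different dyads in different frames.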

Keoni -> Isaac^2,Scott

I'm mainly thinking that recursive BsaI + SapI is a better solution. ProClo looks like it requires C-terminal modification, which can affect the usability of proteins (I think T7 RNAp had that problem, if I remember correct). If we imagine that we want to eventually do genome-scale engineering, that requirement could be quite detrimental, or at least cause a shift in the usage. SapI should solve that with seamless fusions.

Recursive BsaI could also help solve the issue of N terminal tagging. If there is a need for single-reaction, investing in the goldengate->pcr->goldengate technology could be a good idea, since that will generally be applicable (anyone can use). Also doesn't require changing the RBS definition of TACT-AATG.

I am, however, considering changing a few definitions and rebuilding my set. For more on that, check out - https://github.com/trilobio/recursive_bsai . Basically, you should be able to overlap TCTC,TGAG to get a methylated BsaI site that can then cut an internally defined second site. I plan on using it for seamless cloning of 500mer+500mer(s), but it might be interesting to append TCTC,TGAG to the base set for seamless redefinition. Haven't thought it fully through, but would require a change of GGAG.

In general, I think that the issues of protein tagging and such have technical solutions that just need to be tested. I'm starting all those (that are relevant to our cloning system) in earnest now. To me honestly, I think that the sets are less interesting than the overall system. DNA synthesis is cheap itself so long as cloning is implemented right, so I'm focusing a bit on that (https://keonigandall.com/posts/affordable_dna_2.html under "Innovation in synthesis is not far away")

Cheers,

Keoni

Isaac -> Keoni,Isaac,Scott

> I'm mainly thinking that recursive BsaI + SapI is a better solution.

A couple of things: (1) Is SapI off-patent, or does it have an off-patent isoschizomer? If not, it’s sub-optimal for designing a high quality, unambiguously public domain assembly standard. (2) Even if SapI is now or will soon be in the public domain, adding additional enzymes to the assembly standard kind of impairs one of the things that makes recursive BsaI/AllClo so nice—the fact that you can do basically everything with one restriction enzyme. If we add more, the question arises of why we don’t just use an existing multi-enzyme assembly standard, like GoldenBraid or uLoop.

> ProClo looks like it requires C-terminal modification, which can affect the usability of proteins (I think T7 RNAp had that problem, if I remember correct).

A couple of things: (1) the primary use case for ProClo is specifically in contexts where you want to tag proteins or build fusions; if you want to do genome-scale engineering or do anything working just with native, un-modified protein sequences, you can just add a stop codon before the GCTT overhang (as is already standard practice for MoClo) and define your terminators with the standard MoClo 5’ overhang. (2) T7 RNA polymerase notwithstanding, the set of proteins that don’t tolerate any C-terminal tags is tiny compared to the set that do tolerate them, so it seems sub-optimal to focus design of the assembly standard on the intolerant subset. If you don’t want C-terminal mods for a particular CDS, you can just add a stop codon before the GCTT overhang.

> Recursive BsaI could also help solve the issue of N terminal tagging. If there is a need for single-reaction, investing in the goldengate->pcr->goldengate technology could be a good idea, since that will generally be applicable (anyone can use). Also doesn't require changing the RBS definition of TACT-AATG.

I agree that allowing the RBS to keep its MoClo part type definition would be nice. I can see the argument for making ProClo 2.0 a recursive ‘level 0’ assembly standard that’s fully MoClo-compatible, even if that breaks the flatness/one-pot-ness of AllClo. Interested to hear others’ thoughts on this.

> I am, however, considering changing a few definitions and rebuilding my set. For more on that, check out - https://github.com/trilobio/recursive_bsai .…Haven't thought it fully through, but would require a change of GGAG.

Interested to see how this experiment goes. Four questions about it: (1) if this strategy requires you to change/remove GGAG, doesn’t that mean it’s not backwards-compatible with MoClo? (Unless you’re planning to use these overhangs only in level 0 or level 2+ recursive assemblies, with no MoClo promoter parts). (2) how well do the TCTC,TGAG overhangs play with the 24 overhangs in your original set? (3) It looks from your description in that repo like you’re testing how methylation at different locations impacts T4 ligase activity/fidelity. Are there papers that suggest this might be an issue? If so, that could be a pretty big problem for the recursive BsaI strategy. And (4) the repo doc mentioned you’re still getting some cutting on a methylated restriction site. What exactly is the sequence that was being methylated and cut in that experiment? Was it an HpaII-BsaI site?

> DNA synthesis is cheap itself so long as cloning is implemented right, so I'm focusing a bit on that (https://keonigandall.com/posts/affordable_dna_2.html under "Innovation in synthesis is not far away")

Making synthesis cheaper is super important and I’m rooting for you/TriloBio to achieve this; but in the meantime, high-quality libraries of free, public domain DNA parts running on a powerful assembly standard will still be very useful to people all around the world!

Finally, do we want to share this discussion in this issue thread? It could be useful to others to make it publicly viewable. https://github.com/friendzymes/community/issues/15

All the best, Isaac

Koeng101 commented 2 years ago

Continuing conversation here:

Is SapI off-patent?

https://patents.google.com/patent/EP0818537A2/en Yes

If we add more, the question arises of why we don’t just use an existing multi-enzyme assembly standard, like GoldenBraid or uLoop.

Because those standards don't allow for recursion in complex assembly. For all intents except C-terminal fusions, SapI does not add any complexity to the current method, only 3 base pairs.

or do anything working just with native, un-modified protein sequences, you can just add a stop codon

Or you could use SapI, which has the stop codon built in. Then you don't need divergent protein coding sequences for both assembly types.

the primary use case for ProClo is specifically in contexts where you want to tag proteins or build fusions; if you want to do genome-scale engineering or do anything working just with native, un-modified protein sequences, you can just add a stop codon before the GCTT overhang

As an alternative, you could simply use SapI and get the best of both worlds. Complete compatibility with un-modified protein sequences and compatibility with larger tagged constructs.

The resistance seems to stem from the fact that SapI adds a restriction enzyme to the mix in the particular case of C-terminal fusions. But there are clear advantages as well - no set linker sequences and complete compatibility with all other types of cloning. What if you wish to build a genome with an entire pathway being his-tagged? I can agree that it may be superior for single protein fusions, but I think SapI is superior when you want to do many fusions or mix between different kinds of fusions and non-fusions.

For the questions on the new sets:

(1) does it break compatibility with MoClo?

Yes, it does. I don't really care though since we're resynthesizing everything from scratch anyway.

(2) how well do the TCTC,TGAG overhangs play with the 24 overhangs in your original set?

Will probably generate a new set.

It looks from your description in that repo like you’re testing how methylation at different locations impacts T4 ligase activity/fidelity. Are there papers that suggest this might be an issue?

Not testing different spots, I linked a paper that already did that. We're using objectively the worst spot available because it is less risky and has methylation on the right strand for doing complete re-shuffle (If only BtgZI was more available...)

(4) the repo doc mentioned you’re still getting some cutting on a methylated restriction site. What exactly is the sequence that was being methylated and cut in that experiment? Was it an HpaII-BsaI site?

HpaII uses B2, which is also still kind of crap at blocking BsaI activity. We're using B1 so that the sequence doesn't interact with ligase. T1 and T2 are much better, but are on the wrong strand. Much sad.

Making synthesis cheaper is super important and I’m rooting for you/TriloBio to achieve this; but in the meantime, high-quality libraries of free, public domain DNA parts running on a powerful assembly standard will still be very useful to people all around the world!

I think you may have missed the point of that essay :) DNA synthesis is cheap and none of our tech is aimed at making it cheaper. Cloning is not cheap, so all of our tech focus is on making cloning cheap, which is relevant to a powerful assembly standard (and our parts will be free/public domain anyway). Essentially what we're doing is focusing on process optimizations that make the assembly system easier, more powerful, and cheaper.

For this reason, I'm not concerned about parts themselves, but I am concerned about the simplicity of the system because that directly translates to easier process optimizations. ProClo makes things more complex by adding in a whole new standard to keep track of. SapI keeps things simple, because although you can use it, it for the most part stays completely out of the user and executor's way. Recursive BsaI is also similar - it flattens the vector and insert landscape, simplifying overall use. Both can be augmented with complex parts, but the underlying system is relatively simple, allowing for those additions on top.

Prosimio commented 2 years ago

Isaac N -> Keoni, Isaac, Scott

Hi! I totally agree with Isaac on the "ProClo looks like it requires C-terminal modification" topic. You just have to add a stop codon in cases where it is not possible to include tags or any extra codons.

About SapI: as far as I know, its patent has expired. Overall, although it is a bit slower than BsaI, it works really nicely (it's stored at -20°C, is stable, cheap, etc.). For me, the main advantage of using one enzyme is the reduced effort of domesticating the components (i.e. eliminating the SapI recognition sites).
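To make the domestication point concrete, here is a minimal sketch that counts internal recognition sites on both strands of a part. It assumes the standard recognition sequences (BsaI: GGTCTC, SapI: GCTCTTC); the function names and the toy sequence are hypothetical and not taken from any FreeGenes tooling.

```python
# Recognition sequences, written 5'->3' on the top strand.
SITES = {"BsaI": "GGTCTC", "SapI": "GCTCTTC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """Return (0-based position, strand) for each occurrence of `site`.

    "+" means the site reads 5'->3' on the given strand, "-" means it
    reads 5'->3' on the opposite strand.
    """
    seq = seq.upper()
    hits = []
    for strand, pattern in (("+", site), ("-", reverse_complement(site))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


def domestication_report(seq: str, enzymes: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Map each enzyme name to the sites that would need to be removed."""
    return {name: find_sites(seq, SITES[name]) for name in enzymes}


# Hypothetical toy sequence, not a real part.
toy_part = "ATGGGTCTCAAAGAAGAGCTTTGAGACCTAA"
report = domestication_report(toy_part, ["BsaI", "SapI"])
```

Every entry in `report` is a site that has to be mutated away before the part can be used with that enzyme, so adding a second enzyme to the standard can only lengthen the list.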

Another thing: I don't recommend using the MoClo part definition of CDSs (AATG-GCTT); instead, I think it is more valuable to use the CIDAR/uLoop/ProClo CDS definition (AATG-AGGT). It allows you to include tags, fusions, etc., or just use a triple stop codon element to finish the protein (remember that CDSs don't include the stop codon). This component sits between AGGT and GCTT, which also makes it backward compatible with the rest of the MoClo components.
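The backward-compatibility argument can be checked mechanically. The sketch below treats each part as a pair of 4 bp overhangs and confirms that a CDS (AATG-AGGT) followed by an AGGT-GCTT element fills the same slot as a MoClo CDS (AATG-GCTT). The `Part` type and `fuse` helper are illustrative names, and the tag is a hypothetical placeholder.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    five_prime: str   # 4 bp overhang at the 5' end
    three_prime: str  # 4 bp overhang at the 3' end


def fuse(parts: list[Part]) -> tuple[str, str]:
    """Check that adjacent overhangs match; return the fused (5', 3') pair."""
    for left, right in zip(parts, parts[1:]):
        if left.three_prime != right.five_prime:
            raise ValueError(
                f"{left.name} ({left.three_prime}) cannot join "
                f"{right.name} ({right.five_prime})"
            )
    return parts[0].five_prime, parts[-1].three_prime


cds = Part("CDS without stop codon", "AATG", "AGGT")
stop = Part("triple stop codon element", "AGGT", "GCTT")
tag = Part("hypothetical C-terminal tag with stop", "AGGT", "GCTT")

MOCLO_CDS_SLOT = ("AATG", "GCTT")

untagged = fuse([cds, stop])  # native protein, no extra residues beyond the scar
tagged = fuse([cds, tag])     # same CDS part, reused for a fusion
```

Both products expose the MoClo CDS overhangs, so the same CDS part serves tagged and untagged designs without a second version of the gene.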

Best!

Scott -> Keoni, Isaac**2

Hi all,

Yes, SapI and its methylases are IP free and indeed their CDSs are included in the Open Enzyme Collection! You can make your own.

I agree with Isaac Núñez in that we should conform to existing standards as best as possible. I noticed in the ProClo draft that quite a few overhangs differ from the “Freegenes” standard used in the E. coli Protein Expression Toolkit that we all contributed to creating. Any changes should be justified with a strong rationale.

I made a 2nd tab in the g-sheet that compares the two.

Cheers, Scott

eyesmo commented 2 years ago

Questions and responses for @Prosimio:

  1. The uLoop paper says the odd and even vectors use chloramphenicol and spectinomycin as selection markers, but the protocols.io protocol for uLoop says they use kanamycin and spectinomycin. Is this because uLoop and Loop assembly use different antibiotic pairs? And what's the design rationale for using chloramphenicol and spectinomycin as opposed to other antibiotics?
  2. Do you know if iGEM is planning to use the uLoop definition (AATG-AGGT) for CDS parts in their re-synthesized Distribution? If so, I'm more than happy to use that definition for CDS parts in ProClo 2.0.

Responses for Scott:

  1. The overhang changes in the ProClo 2.0 v1 draft use the super high fidelity overhang set @Koeng101 calculated here, which is predicted to have 97% fidelity in a 24-part assembly with BsaI-HFv2 at 37°C for 16 hours/overnight. My thinking here has been that we're going to need to re-formulate at least some of the BacClo (Open Yeast/Bacillus backbone) and ProClo 1.0 overhangs, since the two sets are mutually incompatible with high fidelity assembly (as explained here). And since we're going to be redefining these assembly standards anyway, it's better to use the highest fidelity calculated overhang set. For the parts we've already built with the old BacClo and ProClo 1.0 overhangs, we can design and synthesize part type switching linkers to change their part definitions to the improved and unified AllClo standard, which includes ProClo 2.0, VecClo (the successor to BacClo), FiveClo and ThreeClo, all defined with different subsets of the same super high fidelity 24-overhang set.
  2. All of that being said, I do recognize the advantages of trying to keep to the existing BacClo/ProClo 1.0 overhangs where possible, in terms of minimizing the effort required to convert parts from the older standard to the new standard. My thinking here has been that there would be an opportunity to re-synthesize the OYC and the ProClo 1.0 tags from the Protein Expression Toolkit, possibly through the iGEM Registry 2.0 or through @Koeng101's/TriloBio's high-throughput cloning infrastructure; and that we should take advantage of that re-synthesis opportunity to transition to a more optimal overhang set. It's also been part of my plan to design a part-switching linker set that converts OYC/BacClo and ProClo 1.0 parts to the new, unified assembly standard. However, since linker parts are relatively small and simple to synthesize, we could also design a high fidelity overhang set that uses as many of the BacClo/ProClo 1.0 overhangs as possible (I already have replacement overhangs for the incompatible ProClo overhangs, shown in cells Y10-AC22 on the 7th tab of this google sheet: CCAT-->GACC, GTCA-->CACG, TTCG-->CCAG); and we could design a linker set for AllClo-style part type switching with that overhang set as well. Comparing the performance of the two overhang sets (and more generally, comparing the performance of different MoClo expansion assembly standards for different numbers and lengths of parts) is the type of experiment I think we could propose as an iGEM interlab study, now that we have multiple Friendzymes members on the iGEM Engineering Committee.
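Before any data-driven fidelity prediction, an overhang set has to pass a basic sanity check: no palindromes (which let a fragment end ligate to a copy of itself), no duplicates, and no overhang that is the reverse complement of another (which allows inverted joins). The sketch below runs only that first-pass check on the replacement overhangs listed above; it is not the ligase-fidelity calculation used to derive the 24-overhang set, and the function names are illustrative.

```python
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def overhang_conflicts(overhangs: list[str]) -> list[tuple[str, str, str]]:
    """Flag palindromes, duplicates and reverse-complement pairs.

    This is a necessary condition only; it says nothing about
    mismatch ligation between near-identical overhangs.
    """
    problems = []
    for oh in overhangs:
        if oh == reverse_complement(oh):
            problems.append(("palindrome", oh, oh))
    for a, b in combinations(overhangs, 2):
        if a == b:
            problems.append(("duplicate", a, b))
        elif a == reverse_complement(b):
            problems.append(("reverse complement", a, b))
    return problems


# Replacement overhangs quoted in the thread (old -> new).
replacements = {"CCAT": "GACC", "GTCA": "CACG", "TTCG": "CCAG"}

# Other overhangs mentioned in this discussion, checked alongside them.
mentioned = ["GGAG", "AATG", "AGGT", "GCTT"]

conflicts = overhang_conflicts(list(replacements.values()) + mentioned)
```

An empty `conflicts` list only means the set is not obviously broken; ranking two candidate sets against each other still needs measured ligation data or the interlab-style comparison proposed above.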

Responses/questions for @Koeng101:

  1. I don't think we're going to reach a consensus on C-terminal tagging with SapI vs BsaI in this thread. It seems to come down to which advantages, disadvantages, and use cases each of us emphasizes. The two approaches are, as far as I can tell, incompatible: there's no way to interconvert parts from one approach to the other without PCR mutagenesis involving at least one part-sequence-specific primer (correct?). And my understanding is there's a pre-existing FreeGenes wetware base that already uses each approach--the Protein Expression Toolkit for BsaI, and the organismal gene sets for SapI (correct?). I know you/TriloBio have your own development plans, and no need to stick to our standard if we decide to go a different direction; but I think it might be useful to raise this question with the iGEM Engineering Committee. People there might have a perspective (or be able to reach a consensus) that persuades us to all stick with one approach or the other. For myself, if iGEM decides to adopt either of these strategies for design of their re-synthesized Registry, I'll be happy to follow that convention as well.
  2. On being fine with breaking MoClo compatibility because all the parts are being re-synthesized and won't require the type of domestication @Prosimio discussed: this might be reasonable if you were willing to re-synthesize every GCTT-defined MoClo toolkit part ever developed, thereby minimizing the time and cost of conversion for pre-existing toolsets. Which, given your planned progress on cheap/easy cloning and iGEM's plans for Registry re-synthesis over the next couple of years, might not be unreasonable! But I do think that given the scale of genetic toolkits you're planning to make, this type of change might benefit from broader community discussion. I know I keep bringing up the iGEM Engineering Committee, but it's only because we now have the ability to share our ideas with a decent-sized cross-section of the synbio/bioengineering community, which is also planning on large-scale free/OpenMTA-distributed genetic toolkit synthesis over the next year or two. I feel like we should compare notes with them and see if we can all get on the same page for these types of decisions, as long as doing so doesn't negatively impact your development timeline.
  3. Also, I don't yet grasp the design advantages of the (TCTC, TGAG) overhang pair for recursive assembly; could you elaborate on that?
  4. From the paper you referenced, which performed the experiment to figure out which cytosines within the BsaI recognition site (top strand: GGT C T C; bottom strand: GAGA C C) most efficiently block digestion:

    The restriction digests clearly reveal that methylation of the bottom strand only partially protects the DNA from digestion, while methylation of either cytosine in the top strand effectively protects the DNA from digestion by BsaI (Figure S2).


This suggests that recursive mBsaI linkers that use enzymatic methylation with HpaII or MspI, both of which methylate the bottom strand, are probably going to suffer from relatively high rates of undesired digestion of methylated mBsaI sites during assembly, correct? Though this data seems to contradict the results from the Great Lakes Biotech researcher you reference, who appears to have achieved very low rates of cleavage of bottom-strand-methylated BsaI sites. I'm guessing there's no way to enzymatically methylate the top strand cytosines (GGT C T C, T1 and T2 in this paper's nomenclature) only in mBsaI sites, but not in regular BsaI sites; and this is why you're looking into chemically synthesized oligos (correct?).

  5. From a Friendzymes perspective (trying to make the means of genetic assembly as cheaply replicable/distributable and globally accessible as possible), a recursive assembly method that relies on relatively expensive chemically modified, phosphoramidite-synthesized oligonucleotides to work is less than ideal; even if a single tube of oligos is enough for many reactions, it still creates a dependence on expensive oligo-synthesizing machines, reagents and productive capacity that's concentrated in countries that already have well-developed biotech industries.

  6. Continuing on the point about inefficient digestion protection from methylating the bottom strand cytosines (GG C C AGAG in a rightward-facing >mBsaI> site, B1 and B2 in the paper's nomenclature): have you considered ordering and testing an oligo with both B1 and B2 methylated? This would recapitulate the potential effect of co-expressing HpaII and MspI (assuming methylation by one enzyme doesn't inhibit methylation by the other, which may or may not be true); and might yield greater inhibition of digestion than methylating either B1 or B2 alone. And if double-methylating the bottom strand works well, that could be an approach that is still compatible with Friendzymes' objectives.

  7. You mentioned that B1 (GG C CAGAGNNNNN) is the only methylation site that wouldn't interact with T4 ligase's binding site. Do you have any references on the size of T4 ligase's binding 'footprint,' and any references that suggest cytosine methylation interferes with T4 ligase binding and fidelity?


One takeaway I'm getting from this conversation is that there's a bit more technological uncertainty/risk around at least the enzymatic/oligo-free version of recursive mBsaI assembly than I'd been aware of. Given that, I think it might be worthwhile for Friendzymes to also design linker sets that perform part type switching with a second, orthogonal Type IIS restriction enzyme, as the assembly linkers/connectors Scott designed for the Open Yeast Collection do with BbsI. Do you all agree?

Koeng101 commented 2 years ago

And my understanding is there's a pre-existing FreeGenes wetware base that already uses each approach--the Protein Expression Toolkit for BsaI, and the organismal gene sets for SapI (correct?)

Yep. It'll also depend on the results of an experiment I'll be running with SapI. If it turns out that it is impractical in reality, I'd be happy to switch over. But I think it is something that is hard to get consensus on until there are real experiments run.

as long as doing so doesn't negatively impact your development timeline

I agree, we should talk it over.

Also, I don't yet grasp the design advantages of the (TCTC, TGAG) overhang pair for recursive assembly; could you elaborate on that?

A little diagram from the recursive bsai repo might help:

# Forward (B1)
...G GTCTC
...CmC
# Reverse (B1)
TGAGACCm...
    TGG ...

Basically, that particular set (and essentially only that set) allows for efficient part switching while keeping the methylation site away from the ligation site.

I'm guessing there's no way to enzymatically methylate the top strand cytosines (GGT C T C, T1 and T2 in this paper's nomenclature) only in mBsaI sites, but not in regular BsaI sites; and this is why you're looking into chemically synthesized oligos (correct?).

It is only able to do those sequences, but the real reason I'm looking into chemical synthesis is purely expediency. Gotta get it working ASAP. B1 is conservative for when you need seamless overhangs (like doing DNA assembly), while T1 or T2 is what we'd probably use in linkers. Really not a thing about efficiency though, it's pretty much all expediency.

creates a dependence on expensive oligo-synthesizing machines, reagents and productive capacity that's concentrated in countries that already have well-developed biotech industries

You just become dependent on other things, like the ability to manufacture chemicals for minipreps and such. Still, I agree with you, because I think that supply chain independence is fun!

have you considered ordering and testing an oligo with both B1 and B2 methylated?

I have considered, but these experiments are rather expensive (would be about $500 to test that). Overall, you're making a basic assumption that might not be true - that the inefficiency practically matters. Our evidence right now from our friends at Great Lakes would point towards it not mattering.

Do you have any references on the size of T4 ligase's binding 'footprint,'

To be honest, I didn't have the right keywords (ligase binding footprint) to find that info beforehand. Looks like it is 11 base pairs roughly. So if methylation breaks things, it breaks things. I suspect it'll be fine cause T4 is known to methylate itself, but still might break things.

jcahill commented 2 years ago

A work session seems like it would help here.

eyesmo commented 2 years ago

Wrote this up above in an edit to my AllClo notes, reposting here for thoughts/feedback. Here's a proposed architecture/library of part types for AllClo:

AllClo should combine the following into a single unified assembly standard, using a single high fidelity 4bp BsaI overhang set:

Important note: It will be rare for a bioengineer to use all these part types at once to construct a single genetic device in a 1-pot reaction. In fact, many of the part types within AllClo are specific to particular genetic contexts/use cases, and would therefore not be needed in many other genetic contexts/use cases. However, does that mean we should restrict the use of these parts only to bioengineers who can afford and have easy access to custom gene and oligo synthesis? I say no: people should have the option of using these more exotic/context-specific parts, and besides receiving standard FreeGenes libraries (and maybe a small number of standard primers/DNA oligonucleotides), they shouldn't require any DNA synthesis or many-step hierarchical assembly reactions to do so. However, we don't want the addition of these optional/context-specific parts into the assembly standard to increase the complexity of the assembly reactions users must run for building more run-of-the-mill devices like standard transcription units. For example, we probably want to avoid requiring every transcription unit assembly reaction to include a bunch of neutral 'spanner' parts as placeholders for infrequently used part types.

I propose that we handle this balance between optional complexity and default simplicity with Part Type Switching (PTS) linkers. For example, by default the core promoter part will have the same overhangs as in uLoop. To build a more complex 5' UTR, an assembly reaction will be performed between the promoter and two PTS linkers, that can expose a new pair of 4 bp overhangs, changing the promoter's overhang definition to a pair of FiveClo overhangs (BsaI/BbsI uLoop promoter-->FiveClo promoter PTS linker example here; BsaI/mBsaI uLoop promoter-->FiveClo promoter PTS linker example here). In parallel, the set of FiveClo parts that one wishes to assemble and use will also undergo PTS reactions with linkers to convert their overhangs to close any gaps in the assembly caused by unused part slots. This way, while each promoter starts out with FiveClo-incompatible overhangs and each FiveClo part listed below starts with its own unique combination of two overhangs, a single round of PTS reactions later (followed by either transformation/cloning or PCR amplification of PTSed parts), all parts have the overhangs required for the desired assembly; and moreover these overhangs are compatible with any combination of VecClo, ProClo, and ThreeClo parts as well, up to and including one-pot full vector assembly reactions. This PTS design strategy can be applied to VecClo, FiveClo, ProClo, and ThreeClo, ensuring simple and backwards-compatible uLoop assembly by default, while enabling access to the full array of complex/composite AllClo genetic design and high fidelity/one-pot assembly after a single round of PTS reactions. End of important note
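The bookkeeping behind a PTS reaction can be sketched as a relabelling of a part's two overhangs. In the sketch below, each linker joins one existing overhang and exposes a new one after the second digestion. Since the AllClo overhang sequences are still TBD, the overhangs are symbolic labels, and every name here (`Linker`, `switch_part_type`, `uLoop_prom_5`, `FiveClo_prom_5`, and so on) is a hypothetical placeholder rather than part of any agreed standard.

```python
from typing import NamedTuple


class Part(NamedTuple):
    name: str
    five_prime: str   # label of the 5' overhang
    three_prime: str  # label of the 3' overhang


class Linker(NamedTuple):
    joins: str    # overhang on the part that this linker ligates to
    exposes: str  # overhang exposed once the linker's outer site is cut


def switch_part_type(part: Part, five_linker: Linker, three_linker: Linker) -> Part:
    """Return the part as redefined by one round of PTS with two linkers."""
    if five_linker.joins != part.five_prime:
        raise ValueError(f"5' linker does not match {part.name}")
    if three_linker.joins != part.three_prime:
        raise ValueError(f"3' linker does not match {part.name}")
    return Part(part.name, five_linker.exposes, three_linker.exposes)


# A promoter with its default (uLoop-style) overhang labels.
promoter = Part("promoter", "uLoop_prom_5", "uLoop_prom_3")

# Switching linkers that move both ends to FiveClo labels.
to_fiveclo = switch_part_type(
    promoter,
    Linker(joins="uLoop_prom_5", exposes="FiveClo_prom_5"),
    Linker(joins="uLoop_prom_3", exposes="FiveClo_prom_3"),
)

# A linker that exposes the overhang it joined leaves that end unchanged.
three_end_only = switch_part_type(
    promoter,
    Linker(joins="uLoop_prom_5", exposes="uLoop_prom_5"),
    Linker(joins="uLoop_prom_3", exposes="FiveClo_prom_3"),
)
```

The default part definition is untouched, so simple uLoop-style assemblies never see the extra part types; only designs that opt in pay for the extra round of reactions.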

Proposed AllClo part types (overhang sequences TBD)

  1. FiveClo (8 part types, plus 3 extra 'free' overhangs):
     -- 5' linker for multi-TU assembly (5L). Used along with the 3' linker to do secondary assembly of multiple transcription units. Variants of this part type could also be combined with recombinase recognition sites for deleting transcription unit(s).
     -- 5' recombinase recognition site (5rec). Combine with 3' recombinase recognition (3rec) parts to delete one or more transcription units, or to integrate them into a pre-existing recombinase landing pad in a chromosome or other genetic device.
     -- Distal promoter element (dPromE). Important in some eukaryotic genetic devices for controlling and/or increasing gene expression.
     -- Core promoter (Prom). The core promoter element, controlling transcription. Different overhangs than in MoClo/uLoop (requires part type switching).
     -- Operator (Op). Binding site for a conditional/inducible transcription factor. One of the main mechanisms of transcriptional control in genetic circuits.
     -- 5' recombinase recognition site, flipping variant (5recF). Combines with 3' recombinase recognition (3rec) parts to flip RBS+CDS pairs and turn expression on or off digitally.
     -- Ribozyme (Rbz). RNA element for controlling or regulating gene expression. Examples include ribozyme insulators that cleave off the untranslated region of the mRNA 5' to the insulator, increasing the reliability/modularity of promoter and CDS parts by reducing the risk of changes to device behavior due to changes in 5'UTR folding; toehold switches that bind to and block the RBS until displaced by binding another complementary nucleic acid molecule, regulating expression at the level of translation; STARs that terminate transcription prior to gene expression, unless bound and blocked by a small RNA in trans, regulating expression at the level of transcription; and riboswitches that change conformation/cleavage activity (and therefore gene expression) in response to a small molecule, nucleic acid or protein binding event.
     -- Ribosome binding site (RBS). Required for prokaryotic translation initiation. The RBS (along with mRNA folding structure around the start codon) determines the rate of translation initiation. Given the sensitivity of this part to sequence modifications, it should ideally maintain the same overhangs as in uLoop, and not require any part type switching.
     -- Designate three additional 'free' FiveClo overhangs (along with the associated part type switching linkers), to enable people to add more part types to the 5' UTR (e.g. additional distal promoter elements, ribozyme elements, or operator elements).

  2. ProClo2.0 (7-8 part types, depending on whether ProClo RBSs are treated separately from localization tags and FiveClo RBSs):
     -- Ribosome binding site (RBS). Required for prokaryotic translation initiation. For use in ProClo (but not FiveClo), the RBS probably requires a different 3' overhang than is used for uLoop (AATG), since the core CDS (5' overhang AATG) is now located 3-4 parts away. Basically, special sets of RBSs should be used to pair with localization tags. Note: ProClo1.0 solved this problem by combining the RBS and Loc part types into a single RBS/Loc part (e.g. for the B. subtilis secretion tag library plasmids). I think they should be separated in ProClo2.0, but I'm open to being persuaded to keep this arrangement.
     -- RBS/Localization tag (Loc). An example use case is a secretion tag (or, as in ProClo 1.0, an RBS/secTag pair).
     -- N1 tag (N1). The first of 3 N-terminal CDS tags/fusions (not including the Loc tag). Example use cases include purification tags, and trans-splicing inteins for building multi-domain protein fusions post-translationally.
     -- N2 tag (N2). The second of 3 N-terminal CDS tags/fusions (not including the Loc tag). Example use cases include reporter tags (e.g. fuGFP or mCherry), or cleavage tags (e.g. inteins or protease recognition sites).
     -- N3 tag (N3). The third of 3 N-terminal CDS tags/fusions (not including the Loc tag). Example use cases include cleavage tags (e.g. inteins or protease recognition sites) and reporters (e.g. fuGFP or mCherry).
     -- Protein Coding Sequence (CDS). The core CDS, with the same overhangs as a CDS in a regular uLoop transcription unit (5'-AATG--AGGT-3').
     -- C1 tag (C1). The first of 3 C-terminal CDS tags/fusions. Example use cases include cleavage tags (e.g. inteins or protease recognition sites) and reporters (e.g. fuGFP or mCherry).
     -- C2 tag (C2). The second of 3 C-terminal CDS tags/fusions. Example use cases include cleavage tags (e.g. inteins or protease recognition sites) and reporters (e.g. fuGFP or mCherry).
     -- C3 tag (C3). The third of 3 C-terminal CDS tags/fusions. Example use cases include purification tags, degradation tags, and trans-splicing inteins for building multi-domain protein fusions post-translationally.
     -- It might be useful later on to develop further expanded AllClo-compatible ProClo 2.0 overhang sets that generate application-specific amino acid dyads in their 4-bp scar sequences. Example use cases include the construction of novel TALE proteins, and the construction of novel fibrous, multidomain proteins (e.g. silk fibroins).

  3. ThreeClo parts (3 part types, plus 3 extra 'free' overhangs):
     -- Terminator (Term). Terminates transcription of mRNA, by causing RNA polymerase to dissociate from the DNA and the growing mRNA.
     -- 3' recombinase recognition site (3rec). Used in combination with 5rec or 5recF parts, can delete transcription unit(s), integrate transcription unit(s) at recombinase landing pads, or flip coding sequences on and off.
     -- 3' linker for multi-TU assembly (3L). Used along with the 5' linker to do secondary assembly of multiple transcription units. Variants of this part type could also be combined with recombinase recognition sites for deleting transcription unit(s).
     -- Designate three additional 'free' ThreeClo overhangs (along with the associated part type switching linkers), to enable people to add more part types to the 3' UTR (e.g. poly-A tails, microRNA binding/cleavage sites, and additional terminators on the top and bottom strand to insulate the device even when the RBS+CDS are flipped and the terminators are next to the promoter and 5' recombination sites).

  4. DOclo parts (1 part type with DropOut/Part Type Switch linkers, plus two extra 'free' overhangs):
     -- Dropout part (DO). Often when constructing a genetic device, it is necessary to test the function of a library of a specific part type or combination of part types. In these cases, it is useful to replace these specific parts in the initial assembly with a dropout cassette, which contains a counter-selection marker and 'outward' facing IIS restriction sites that will cut out the counterselection marker and expose the overhangs required to insert the specific part type or set of part types. Rather than build variants of dropout (DO) cassette parts with every possible combination of AllClo overhangs (which would require impractical numbers of copies of dropout cassette parts, differing only in the overhangs they bind to and expose upon excision), it is simpler to define a specific DO part type with a unique, AllClo-compatible pair of overhangs, and then design Dropout linker (DO linker) parts that link between those overhangs and any other AllClo overhang, and that contain the outward IIS restriction sites required to cut out the dropout cassette (example here). Then, inserting the DO cassette into any location within an AllClo assembly simply requires adding the DO part, and the relevant pair of DO linkers, to the assembly reaction. Example design of
     -- Designate two additional 'free' DOclo overhangs, in case people want to insert two separate dropout cassettes into the same assembled vector. Example use cases include using dropout cassettes in place of the 5' and 3' homology arms on either side of a transcription unit, to enable rapid testing of device performance upon insertion into different genomic loci; and using dropout cassettes in place of 5recF and 3rec parts around a transcription unit, to enable rapid testing of how different recombinase recognition sequences impact expression of the flippable genetic device.
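A rough count shows why the DO linker design scales better. The sketch assumes the overhangs sit in a fixed linear order, and that a dropout cassette may stand in for any contiguous run of slots, so each cassette variant is identified by an ordered (5' overhang, 3' overhang) pair. Both assumptions and both function names are illustrative, and the overhang count of 36 is just the approximate figure quoted in this thread.

```python
from math import comb


def dropout_variants_without_linkers(n_overhangs: int) -> int:
    """Cassette variants needed if each one carries its own fixed overhang pair.

    Any two distinct overhangs, taken in assembly order, bound one
    contiguous run of slots, hence n choose 2.
    """
    return comb(n_overhangs, 2)


def dropout_linkers_needed(n_overhangs: int) -> int:
    """Upper bound on DO linkers: one 5' and one 3' linker per overhang."""
    return 2 * n_overhangs


n = 36  # approximate size of the full AllClo overhang set
direct = dropout_variants_without_linkers(n)  # 630 full-length cassette variants
linkers = dropout_linkers_needed(n)           # at most 72 short linkers + 1 DO part
```

The direct approach grows quadratically in full-length cassettes, while the linker approach grows linearly in short parts that are cheap to synthesize.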

eyesmo commented 2 years ago

^In total, the above defines 26-27 distinct part types, requiring a high fidelity, uLoop-compatible set of ~36 4bp overhangs.

Koeng101 commented 2 years ago

Generally my thoughts are:

  1. Too much at once
  2. Not enough doing (too much thinking)

But:

Standards are... cool?

From my first look over, I think AllClo may be trying to do too many things at once. Basically the classic XKCD "Standards" comic.

One thing I have been considering lately is the impact of automation. In fact, I think the field would be far further along if there was a switch to TK's BB-2 (oww has good diagram, basically allows fusions with biobricks type cloning) and the entire focus was instead on making it super easy to actually use those formats.

Ie, say JCVI-Syn3 is ~512 genes. Well, if you include inter-gene regions, that's about ~2000 cloning reactions, which is actually doable by a single person in a single week, given proper setup.

BioBricks and doing the thing

I recall asking Drew about how BioBricks started. Well, turns out biobricks started because Drew and Tom's labs just started using it. I think this is a key distinction - are the people who are designing AllClo the people actually using it? Or would it be thrust upon the iGEM community? (I'd ask the same question to SBOL as well)

For an easy example, will any of you, in this discussion, actually be using FiveClo or ThreeClo for a project of your own within the next year? If not, why define it now?

The DO definition is cool, but I do have to wonder - can you actually assert that it is "simpler to define a specific DO part type with a unique, AllClo-compatible pair of overhangs"? There is a very good case that people really wouldn't give a shit in most cases, and would just want to be able to switch out the CDS. In that case, it is simpler to have a single dropout part, for example.

Value of the assembly vs of the parts

Still, I think there is value in discussion of assembly, but I do think that it needs to be grounded, in absolute terms, in the parts themselves. What are people going to actually use?

jcahill commented 2 years ago

Wetware jam session for finalizing AllClo: at the latest, February 20.

eyesmo commented 2 years ago

AllClo may be trying to do too many things at once.

This is fair for the structure I outlined above, which is more like an attempt to list all the potential level-0 part types I could imagine and what a logical order for them would be.

Will any of you, in this discussion, actually be using FiveClo or ThreeClo for a project of your own within the next year? If not, why define it now?

This year, of the assemblies and part types outlined above, Friendzymes will be using all the VecClo part types for building B. subtilis and P. pastoris shuttle/expression vectors (in fact, these part types have already been built for yeast and most have been built for B. subtilis). Of FiveClo and ThreeClo, we will use the 5' and 3' homology arms, assembly linkers, and recombinase sites for building genomic integration cassettes that can be selected for with a selection marker, then deleted by expression of an inducible recombinase, and then selected against with a counterselection marker--basically, the core components required for strain engineering. We will also use the ribozyme part type to add insulators to our B. subtilis constructs. We will use the ProClo part types to test secretion tags, track expression with reporters, to try purifying proteins with different affinity tags, and to compare the behavior of the enzymes we make with and without the tags attached (these are already built with ProClo1.0 assembly standard; to reduce the re-design and re-formatting overhead, we're now leaning toward keeping most of these as-is, and only changing the overhangs that cause fidelity issues with the VecClo overhangs in the Open Yeast Collection). We will be using dropout cassettes, possibly with DOclo linkers, not just for CDSs but for secretion tags and possibly homology arms as well. The part types I don't foresee us using this year, which therefore may not need to be defined yet, include distal promoter elements, operators, polyA-tails, and microRNA binding/cleavage sites.

So, simplifying down, the core components of AllClo that we plan to use, and that therefore require definition, are:

Of these, the only ones that are really 'new' (i.e. not already present in FreeGenes toolkits like the Open Yeast Collection or Protein Expression Toolkit) are the 5' & 3' recombinase sites and the ribozymes. Abstracting a bit, the only things that need to be added are optional part types between the 5' linker and promoter, between the promoter and the RBS, and between the terminator and the 3' linker. Like so:

5' Linker---newPart1---promoter---newPart2--RBS---CDS/ProClo---terminator---newPart3--3' Linker

In this way, we can say that newPart1 will require overhangs that match the end of the 5' linker and the start of the Part-Type-Switched, FiveClo-defined promoter; newPart2 will require overhangs that match the end of the PTSed FiveClo promoter and the start of the standard RBS; and newPart3 will require overhangs that match the end of a PTSed ThreeClo terminator and the start of a PTSed ThreeClo 3' linker.

So, we only need to define 2 new overhangs for the FiveClo promoter, and 1 new overhang each for the ThreeClo terminator and 3' linker.

On the Part Type Switching front, we'll want PTS linkers for converting the 5' and 3' end of a promoter from MoClo/uLoop to FiveClo overhangs. I think we'll also want 'null' linkers that can participate in a PTS reaction (i.e. attach to the promoter and then be cloned/transformed or PCR amplified), but that don't change the overhang in the secondary Golden Gate reaction. These null PTS linkers are useful in cases where you only want to change one overhang on a part, for instance if you wanted to add a ribozyme insulator, but didn't want to add a 5' recombinase site, next to a promoter. Null linkers will actually be required for PTSing terminators and 3' linkers, since only one of their overhangs changes in the ThreeClo schema above.
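The overhang and linker counts above can be derived from the two layouts. The sketch below labels each junction symbolically (`o1`..`o5` for existing junctions, `n1`..`n4` for new ones, `v0`/`v1` for the vector-facing ends), then works out which existing parts need a switching linker and which need a null linker at each end. All labels and the `linker_plan` helper are hypothetical placeholders, since the actual overhang sequences are not yet defined.

```python
# (part name, 5' junction label, 3' junction label)
default_layout = [
    ("5' linker", "v0", "o1"),
    ("promoter", "o1", "o2"),
    ("RBS", "o2", "o3"),
    ("CDS/ProClo", "o3", "o4"),
    ("terminator", "o4", "o5"),
    ("3' linker", "o5", "v1"),
]

extended_layout = [
    ("5' linker", "v0", "o1"),
    ("newPart1", "o1", "n1"),
    ("promoter", "n1", "n2"),
    ("newPart2", "n2", "o2"),
    ("RBS", "o2", "o3"),
    ("CDS/ProClo", "o3", "o4"),
    ("terminator", "o4", "n3"),
    ("newPart3", "n3", "n4"),
    ("3' linker", "n4", "v1"),
]


def linker_plan(default, extended):
    """For each existing part whose overhangs change, say which end needs
    a switching linker and which end needs a null linker."""
    old = {name: (five, three) for name, five, three in default}
    plan = {}
    for name, five, three in extended:
        if name not in old:
            continue  # new part types are built with their final overhangs
        old_five, old_three = old[name]
        if (five, three) == (old_five, old_three):
            continue  # unchanged part, no PTS reaction needed
        plan[name] = {
            "5' end": "switching" if five != old_five else "null",
            "3' end": "switching" if three != old_three else "null",
        }
    return plan


def new_overhangs(default, extended):
    """Junction labels that appear only in the extended layout."""
    old_labels = {label for _, five, three in default for label in (five, three)}
    ext_labels = {label for _, five, three in extended for label in (five, three)}
    return sorted(ext_labels - old_labels)


plan = linker_plan(default_layout, extended_layout)
added = new_overhangs(default_layout, extended_layout)
```

Under these labels the promoter needs two switching linkers, while the terminator and 3' linker each need one switching linker plus one null linker, and the RBS and CDS/ProClo parts are never touched.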