This repository contains code and data related to CAMEOX (CAMEOs eXtended), a parallelized extension of CAMEOS (Constraining Adaptive Mutations using Engineered Overlapping Sequences) developed by LLNL (Lawrence Livermore National Laboratory). The original CAMEOS software was developed by Tom Blazejewski at Wang Lab (Columbia University). CAMEOX is the computational core of the GENTANGLE pipeline for automated design of gene entanglements.
The recommended installation method is as part of the GENTANGLE pipeline, either by cloning the GENTANGLE repository or, even better, by downloading the Singularity container, which greatly simplifies satisfying the many requirements of CAMEOX. The DATANGLE repository should also be obtained, as it provides data examples and templates. Please see this link for details on these approaches.
Alternatively, CAMEOX alone can be obtained by cloning this repository:
git clone https://github.com/BiosecSFA/cameox.git
The main improvements in CAMEOX relative to CAMEOS are:
CAMEOX improvements over CAMEOS have required some changes to the TSV input/parameter file, starting from column 7 relative to the CAMEOS format. Among its columns, each line in the file now includes one HMM file (.hmm) per gene and, as the last column, the PLL weights option, which takes one of the values equal, rand, close2mark or close2deg (see the subsection below for details).
Example of a single-line CAMEOX parameter file with Pseudomonas protegens Pf-5 (NCBI taxid: 220664) as host:
output/ aroB_pf5 infA_pf5 jlds/aroB_pf5.jld jlds/infA_pf5.jld hmms/aroB_pf5.hmm hmms/infA_pf5.hmm 20000 p1 0 220664
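To make the layout of such a line concrete, the following Python sketch splits it into fields. The field names (out_dir, gene_names, jld_files, and so on) are hypothetical labels inferred from the example above, not CAMEOX identifiers. Columns 8 to 10 (which include the frame placeholder p1) are kept verbatim, and since the example line ends with the host taxid, the sketch treats a trailing PLL weights field as optional.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# PLL weight options listed in this README.
PLL_OPTIONS = {"equal", "rand", "close2mark", "close2deg"}


@dataclass
class ParamLine:
    """Hypothetical, illustrative view of one CAMEOX parameter line."""
    out_dir: str
    gene_names: Tuple[str, str]
    jld_files: Tuple[str, str]
    hmm_files: Tuple[str, str]
    middle: Tuple[str, ...]     # columns 8-10 kept verbatim, e.g. ("20000", "p1", "0")
    host_taxid: int             # NCBI taxid of the host, e.g. 220664
    pll_weights: Optional[str]  # trailing PLL weights option, if present


def parse_param_line(line: str) -> ParamLine:
    fields = line.split()  # accepts tab- or space-separated input
    if len(fields) not in (11, 12):
        raise ValueError(f"expected 11 or 12 fields, got {len(fields)}")
    jlds, hmms = tuple(fields[3:5]), tuple(fields[5:7])
    if not all(f.endswith(".jld") for f in jlds):
        raise ValueError(f"columns 4-5 should be .jld files: {jlds}")
    if not all(f.endswith(".hmm") for f in hmms):
        raise ValueError(f"columns 6-7 should be .hmm files: {hmms}")
    pll = fields[11] if len(fields) == 12 else None
    if pll is not None and pll not in PLL_OPTIONS:
        raise ValueError(f"unknown PLL weights option: {pll!r}")
    return ParamLine(
        out_dir=fields[0],
        gene_names=(fields[1], fields[2]),
        jld_files=jlds,
        hmm_files=hmms,
        middle=tuple(fields[7:10]),
        host_taxid=int(fields[10]),
        pll_weights=pll,
    )
```

Validating the .jld and .hmm extensions early catches swapped or shifted columns before a long run is launched.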
As indicated above in the input format, the frame parameter in the parameter/input file is a placeholder, both in CAMEOS and CAMEOX. The effective way to select the entanglement frame is through the order of the genes in the input. In CAMEOS terminology, the "mark" gene is typically the shorter gene and the "deg" gene the longer one. Inverting that order changes the effective entanglement frame with respect to the longer gene. CAMEOX is aware of the working entanglement frame and reports it at the start of every run to make the actual entanglement frame explicit:
Processing entanglement [shorter_prot]⥂[longer_prot] in frame [real_frame]
where [real_frame] can be either 5'3'F2 or 5'3'F3.
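Since the frame is selected by gene order, switching to the other frame amounts to swapping the paired per-gene columns of the parameter line. The Python sketch below does that and also extracts the frame from the run message. It assumes, as in the example line, that columns 2-3, 4-5 and 6-7 hold the gene names, .jld files and .hmm files of the two genes, and that the message follows exactly the format shown; swap_gene_order and parse_frame_message are hypothetical helpers, not part of CAMEOX.

```python
import re
from typing import Optional, Tuple

# Assumed 0-based column pairs holding per-gene entries: names, .jld, .hmm.
GENE_PAIR_COLUMNS = ((1, 2), (3, 4), (5, 6))

# Matches e.g. "Processing entanglement infA⥂aroB in frame 5'3'F2".
FRAME_MSG = re.compile(
    r"Processing entanglement ([^\s⥂]+)⥂([^\s⥂]+) in frame (5'3'F[23])"
)


def swap_gene_order(line: str) -> str:
    """Return the parameter line with the two genes' columns swapped."""
    sep = "\t" if "\t" in line else " "
    fields = line.split()
    for a, b in GENE_PAIR_COLUMNS:
        fields[a], fields[b] = fields[b], fields[a]
    return sep.join(fields)


def parse_frame_message(text: str) -> Optional[Tuple[str, str, str]]:
    """Extract (shorter_prot, longer_prot, real_frame) from run output, if present."""
    match = FRAME_MSG.search(text)
    return match.groups() if match else None
```

Generating both orders of a gene pair this way and checking the reported frame of each run is one way to confirm which frame a given parameter line produces.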
As previously mentioned, CAMEOS codon optimization is hardwired for E. coli, while CAMEOX includes a generalized embedded codon optimization that reads from an external database. This database consists of one TSV file for each organism used as a host for the entanglements. Each filename follows the format CUT_{taxid}.tsv, where CUT stands for Codon Usage Table and taxid is the taxonomic identifier of the organism in the NCBI Taxonomy database. Each TSV file needs two columns: 'codon' for the codons and 'freq' for their frequencies. As an example, please see the Pseudomonas protegens Pf-5 (NCBI taxid: 220664) CUT file. The DATANGLE repository also contains the E. coli (NCBI taxid: 562) CUT file, directly usable by CAMEOX.
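A minimal Python sketch of loading one of these CUT files, assuming a tab-separated file whose header row names the 'codon' and 'freq' columns. The function names read_cut and load_cut are hypothetical, and the sketch does not interpret or rescale the frequency values; it returns whatever numbers the file contains.

```python
import csv
from pathlib import Path
from typing import Dict, TextIO, Union


def read_cut(handle: TextIO) -> Dict[str, float]:
    """Parse a tab-separated codon usage table with 'codon' and 'freq' columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = {"codon", "freq"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CUT file lacks column(s): {sorted(missing)}")
    table: Dict[str, float] = {}
    for row in reader:
        codon = row["codon"].strip().upper()
        if len(codon) != 3:
            raise ValueError(f"invalid codon: {row['codon']!r}")
        table[codon] = float(row["freq"])
    return table


def load_cut(data_dir: Union[str, Path], taxid: int) -> Dict[str, float]:
    """Open CUT_{taxid}.tsv in data_dir and parse it."""
    path = Path(data_dir) / f"CUT_{taxid}.tsv"
    with path.open(newline="") as handle:
        return read_cut(handle)
```

For example, load_cut("data", 220664) would read data/CUT_220664.tsv, where "data" stands in for the CAMEOX data directory.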
If additional hosts are targeted, a quick way to obtain the CUT is to query the CoCoPUTs online service, retrieve the CUT for the desired host with NCBI taxonomic identifier hostTaxId, and save it in the format described above as the file CUT_{hostTaxId}.tsv, which should be placed in the root of the CAMEOX data directory.
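Conversely, a table retrieved from CoCoPUTs can be saved in the expected format with a few lines of Python. This is a sketch under the assumption that the codon frequencies are already available as a mapping from codon to value; it writes them unchanged, upper-cases and sorts the codons, and write_cut is a hypothetical helper, not part of CAMEOX.

```python
import csv
from pathlib import Path
from typing import Mapping, Union


def write_cut(data_dir: Union[str, Path], host_taxid: int,
              usage: Mapping[str, float]) -> Path:
    """Write usage as CUT_{host_taxid}.tsv with 'codon' and 'freq' columns."""
    path = Path(data_dir) / f"CUT_{host_taxid}.tsv"
    rows = sorted((codon.strip().upper(), freq) for codon, freq in usage.items())
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerow(["codon", "freq"])  # header expected by the CUT format
        writer.writerows(rows)
    return path
```

Pointing data_dir at the root of the CAMEOX data directory places the file where the text above says it should be.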
As indicated above in the input format, the last parameter indicates the pseudolikelihood (PLL) weights for optimization. Before the MRF optimization (main optimization loop), each gene of each pair of HMM seeds is assigned a weight. Within a pair, the weights sum to 1.0 and indicate the relative importance of each gene's PLL (as calculated by the respective MRF models) for the total pair score. The options for this parameter are the following:
equal: The weight is always the same for both genes (0.5), so there is no optimization preference for one gene over the other regarding the PLL.
rand: For each pair of HMM seeds in the population of variants, the weight of one of the genes is drawn from a uniform probability distribution between 0 and 1, and the weight of the other gene is set so that both sum to 1.0. Since it is very difficult to know a priori the relative importance of both genes for a successful entanglement, this is the preferred choice when working with a large number of variants, as it allows a better exploration of the solution space and the generation of a workable Pareto front.
close2mark: The weight is always 1.0 for the mark gene and 0.0 for the deg gene, thus optimizing only for the mark gene. This may be useful in extreme entanglement cases where the relative importance of the mark gene is orders of magnitude above that of the deg gene.
close2deg: The weight is always 1.0 for the deg gene and 0.0 for the mark gene, thus optimizing only for the deg gene. This may be useful in extreme entanglement cases where the relative importance of the deg gene is orders of magnitude above that of the mark gene.

CAMEOX is part of, and released as part of, the GENTANGLE pipeline (LLNL-CODE-845475) and is distributed under the terms of the GNU Affero General Public License v3.0 (see LICENSE). CAMEOX builds upon CAMEOS, which was released under an MIT license (see LICENSE-CAMEOS).
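The four PLL weight options described earlier can be sketched in Python as follows. This is an illustration, not CAMEOX's implementation: for rand it assumes the random draw goes to the mark gene, it assumes the pair score is a linear weighted sum of the two per-gene PLLs, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Weights = Tuple[float, float]  # (mark gene weight, deg gene weight)


def pll_weights(option: str, rng: Optional[random.Random] = None) -> Weights:
    """Weights for one pair of HMM seeds under a PLL weights option."""
    if option == "equal":
        return 0.5, 0.5
    if option == "rand":
        rng = rng if rng is not None else random.Random()
        w_mark = rng.uniform(0.0, 1.0)  # assumption: the draw goes to the mark gene
        return w_mark, 1.0 - w_mark     # the other weight makes the pair sum to 1.0
    if option == "close2mark":
        return 1.0, 0.0
    if option == "close2deg":
        return 0.0, 1.0
    raise ValueError(f"unknown PLL weights option: {option!r}")


def assign_population_weights(option: str, n_pairs: int, seed: int = 0) -> List[Weights]:
    """Assign weights once per seed pair, before the main optimization loop."""
    rng = random.Random(seed)
    return [pll_weights(option, rng) for _ in range(n_pairs)]


def pair_score(pll_mark: float, pll_deg: float, weights: Weights) -> float:
    """Assumed linear combination of the two per-gene PLLs."""
    w_mark, w_deg = weights
    return w_mark * pll_mark + w_deg * pll_deg
```

With rand, each pair in the population scores the two genes with a different trade-off, which is what lets a large population spread out along a Pareto front.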
SPDX-License-Identifier: AGPL-3.0-or-later
This work is supported by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Lawrence Livermore National Laboratory Secure Biosystems Design SFA “From Sequence to Cell to Population: Secure and Robust Biosystems Design for Environmental Microorganisms”. Work at LLNL is performed under the auspices of the U.S. Department of Energy at Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
If you use CAMEOX in your research, please cite the following papers. Thanks!
GENTANGLE: integrated computational design of gene entanglements
Jose Manuel Martí, Chloe Hsu, Charlotte Rochereau, Tomasz Blazejewski, Hunter Nisonoff, Sean P. Leonard, Christina S. Kang-Yun, Jennifer Chlebek, Dante P. Ricci, Dan Park, Harris Wang, Jennifer Listgarten, Yongqin Jiao, Jonathan E. Allen
bioRxiv 2023.11.09.565696; doi: https://doi.org/10.1101/2023.11.09.565696
Blazejewski T, Ho HI, Wang HH. Synthetic sequence entanglement augments stability and containment of genetic information in cells. Science. 2019 Aug 9;365(6453):595-8. https://doi.org/10.1126/science.aav5477