Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

Solving "high repeat density" in codon optimized sequences #77

Closed by snayfach 5 months ago

snayfach commented 5 months ago

I'm trying to use DnaChisel to codon-optimize sequences so that Twist will accept them for DNA synthesis. However, Twist rejects the result with a "high repeat density" error, stating that "more than 15% of your sequence is composed of repeats 9bp or longer". Is there a workaround for this?

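from dnachisel import (
    AvoidHairpins,
    AvoidPattern,
    AvoidRareCodons,
    CodonOptimize,
    DnaOptimizationProblem,
    EnforceGCContent,
    EnforceTranslation,
    HomopolymerPattern,
    UniquifyAllKmers,
    reverse_translate,
)

def codon_optimize(my_protein):
    """
    Use dnachisel to perform constrained codon optimization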
    See: https://www.twistbioscience.com/faq/gene-synthesis/codon-optimization-what-steps-are-taken-maintain-wild-type-protein-expression
    """
    my_sequence = reverse_translate(my_protein, table='Standard')
    problem = DnaOptimizationProblem(
        sequence=my_sequence,
        constraints=[
            AvoidPattern("GGAGG"), # ribosome binding sites
            AvoidPattern("TAAGGAG"), # ribosome binding sites
            AvoidPattern("TTTTT"), # terminator
            AvoidPattern("AAAAA"), # terminator
            UniquifyAllKmers(12),
            AvoidPattern(HomopolymerPattern("A", 10)), # homopolymer runs of 10 or more bases
            AvoidPattern(HomopolymerPattern("C", 10)), # homopolymer runs of 10 or more bases
            AvoidPattern(HomopolymerPattern("G", 10)), # homopolymer runs of 10 or more bases
            AvoidPattern(HomopolymerPattern("T", 10)), # homopolymer runs of 10 or more bases
            AvoidHairpins(location=[0, 48]), # strong hairpins (∆G <-8) in the first 48 base pairs
            EnforceGCContent(mini=0.35, maxi=0.65, window=50), # local GC windows (50 bp) of less than 35% or more than 65%
            EnforceGCContent(mini=0.25, maxi=0.65, window=len(my_sequence)), # global GC% of less than 25% or more than 65% 
            EnforceTranslation(start_codon="keep"), # keep ATG start codon
            AvoidRareCodons(0.08, 'h_sapiens') # rare codons (those with a codon frequency of <8%)
        ],
        objectives=[
            CodonOptimize(species='h_sapiens', method='use_best_codon')
        ],
        logger=None
    )
    problem.resolve_constraints()
    problem.optimize()
    return problem.sequence
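
The 15% threshold quoted in the error can be estimated locally before resubmitting. The following is a minimal sketch, assuming that "repeats" means exact same-strand k-mers that occur more than once and that the percentage is the share of bases covered by such k-mers. How Twist actually computes the figure is not stated in this thread, and `repeat_fraction` is a hypothetical helper, not part of DnaChisel.

```python
from collections import defaultdict


def repeat_fraction(sequence: str, k: int = 9) -> float:
    """Return the fraction of bases covered by k-mers that occur more than once.

    Assumption: only exact, same-strand repeats are counted. The vendor's
    real metric may differ (e.g. it may include reverse complements).
    """
    sequence = sequence.upper()
    if not sequence:
        return 0.0

    # Record every start position of every k-mer.
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)

    # Mark each base that lies inside a k-mer seen at two or more positions.
    covered = [False] * len(sequence)
    for starts in positions.values():
        if len(starts) > 1:
            for start in starts:
                for j in range(start, start + k):
                    covered[j] = True

    return sum(covered) / len(sequence)
```

Under these assumptions, a result such as `repeat_fraction(codon_optimize(my_protein)) > 0.15` would suggest the sequence is likely to be flagged again.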

Zulko commented 5 months ago

UniquifyAllKmers(9) should remove all non-unique 9-mers and resolve "more than 15% of your sequence is composed of repeats 9bp or longer", but it's also a very harsh specification to have as a hard constraint. You could also keep UniquifyAllKmers(12) in your constraints, and add UniquifyAllKmers(9) as a soft constraint in objectives.

snayfach commented 5 months ago

Worked like a charm. Thank you!