Edinburgh-Genome-Foundry / DnaChisel

A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

Optimization changes encoded sequence: correct way to preserve it? #14

Closed harijay closed 4 years ago

harijay commented 4 years ago

I really like DnaChisel and use it routinely to design protein expression constructs for E. coli. I recently got tripped up on an optimization of a ~3 kb construct: I realized too late (after the synthetic gene was ordered) that the way I was running the optimization mutated the resulting sequence.

I am doing my optimization using this recipe:

problem = DnaOptimizationProblem(
    sequence=reverse_translate(mysequence),
    constraints=[
        AvoidPattern("SapI_site"),
        AvoidPattern("BbsI")
    ],
    objectives=[
        CodonOptimize(species="e_coli"),
        EnforceTranslation(),
        AvoidNonUniqueSegments(12),
        EnforceGCContent(0.25, 0.70, window=10),
        AvoidHairpins()
    ],
)

The translated sequence that results has 1-2 mutations in the ~1000 amino acid sequence. I have tried it with both the GitHub version and the pip version of DnaChisel, and they behave the same way. There does not seem to be a pattern in where the mutations arise.

I have since learned my lesson and now check that my designs translate to the input amino acid sequence. But I am hoping there is a way to keep the optimization from returning a mutated sequence and failing to converge.
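A post-optimization check of this kind can be sketched in plain Python. This is a minimal sketch rather than DnaChisel code: `translate` and `find_mutations` are hypothetical helpers written for illustration, and the codon table assumes the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the construct uses the standard table).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (length divisible by 3) into a protein string."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def find_mutations(expected_protein, optimized_dna):
    """Return (position, expected, observed) for each mismatched residue.

    Positions are 1-based. Hypothetical helper, not a DnaChisel function.
    """
    observed = translate(optimized_dna)
    if len(observed) != len(expected_protein):
        raise ValueError("Translated length differs from expected protein")
    return [
        (index + 1, expected, actual)
        for index, (expected, actual) in enumerate(zip(expected_protein, observed))
        if expected != actual
    ]


# An empty list means the optimized DNA still encodes the input protein.
print(find_mutations("MAK", "ATGGCTAAA"))
print(find_mutations("MAK", "ATGGCAAGA"))
```

Running such a check on the optimized DNA string before ordering a synthetic gene would flag any residue that no longer matches the input protein.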

Attaching a Colaboratory notebook link and file showing what I did.

Thanks in advance for your help.

Zulko commented 4 years ago

Very sorry for that! The reason for what you observe is that EnforceTranslation(), which is responsible for conserving the protein sequence, should be used as a constraint, not as an optimization objective.

Only specifications passed as constraints are guaranteed to be satisfied in the end. Everything passed as an objective is enforced only as far as the constraints and competing objectives allow.
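The difference can be illustrated with a toy optimizer. The sketch below is not DnaChisel's solver: `optimize`, `gc_score`, and the weight of 1.5 are assumptions made up for illustration, with "maximize GC bases" standing in for any competing objective. As a constraint, translation filters out every non-synonymous codon before scoring. As an objective, it is just one more term in the score and can be outweighed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string (length divisible by 3) into a protein string."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def gc_score(codon):
    """Toy competing objective: number of G/C bases in the codon."""
    return sum(base in "GC" for base in codon)


def optimize(dna, translation_as_constraint, translation_weight=1.5):
    """Pick the best-scoring codon at each position (toy model only)."""
    optimized = []
    for i in range(0, len(dna), 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if translation_as_constraint:
            # Hard constraint: only synonymous codons are ever considered.
            candidates = [c for c in CODON_TABLE if CODON_TABLE[c] == amino_acid]
            score = gc_score
        else:
            # Soft objective: keeping the amino acid only earns a bonus,
            # which a competing objective can outweigh.
            candidates = list(CODON_TABLE)

            def score(codon, amino_acid=amino_acid):
                bonus = translation_weight if CODON_TABLE[codon] == amino_acid else 0
                return gc_score(codon) + bonus

        optimized.append(max(candidates, key=score))
    return "".join(optimized)


original = "ATGGCTAAA"  # encodes M, A, K
as_constraint = optimize(original, translation_as_constraint=True)
as_objective = optimize(original, translation_as_constraint=False)
print(translate(original), translate(as_constraint), translate(as_objective))
```

In this toy model the constrained run always returns a sequence encoding the original protein, while the objective-only run changes residues wherever the GC term outweighs the translation bonus.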

So your script should look like this:

problem = DnaOptimizationProblem(
    sequence=reverse_translate(mysequence),
    constraints=[
        AvoidPattern("SapI_site"),
        AvoidPattern("BbsI"),
        EnforceTranslation()
    ],
    objectives=[
        CodonOptimize(species="e_coli"),
        AvoidNonUniqueSegments(12),
        EnforceGCContent(0.25, 0.70, window=10),
        AvoidHairpins()
    ]
)

There is already a CodonOptimize example, but since you are not the first person to run into this problem, I will make sure this is clear throughout the library (maybe I'll also raise a warning at solve time if CodonOptimize is used without an EnforceTranslation constraint).
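Until such a warning exists in the library, a user-side check along these lines is one option. This is only a sketch: `check_translation_is_constrained` is a hypothetical helper, not part of DnaChisel, and matching specifications by class name is an assumption (the objects created by a given DnaChisel version may carry different class names). The stand-in classes exist only so the example runs without DnaChisel installed.

```python
import warnings


def check_translation_is_constrained(constraints, objectives):
    """Warn if codon optimization is requested without a translation constraint.

    Returns True if a warning was issued, False otherwise.
    Hypothetical helper; specifications are matched by class name.
    """
    def names(specs):
        return {type(spec).__name__ for spec in specs}

    if ("CodonOptimize" in names(objectives)
            and "EnforceTranslation" not in names(constraints)):
        warnings.warn(
            "CodonOptimize is used without an EnforceTranslation constraint; "
            "the protein sequence may change during optimization.",
            stacklevel=2,
        )
        return True
    return False


# Stand-in classes so the sketch runs without DnaChisel installed.
class CodonOptimize:
    pass


class EnforceTranslation:
    pass


# EnforceTranslation passed as an objective only: a warning is issued.
print(check_translation_is_constrained([], [CodonOptimize(), EnforceTranslation()]))
# EnforceTranslation passed as a constraint: no warning.
print(check_translation_is_constrained([EnforceTranslation()], [CodonOptimize()]))
```

With a real problem, the two lists would presumably be the problem's constraints and objectives, checked just before solving.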

Let me know if that clarifies it.

harijay commented 4 years ago

Thank you so much for the clarification of the key difference between constraints and optimization objectives. I incorrectly assumed that they were almost interchangeable because EnforceTranslation "worked" as an objective as well. Also, I had missed the very clear usage example in the CodonOptimize example you pointed me to.

Your proposal to raise a warning at solve time when a CodonOptimize objective is used without an EnforceTranslation constraint makes great sense.

Thanks a ton for the quick response and correction--I will use EnforceTranslation as a constraint from now on.

Zulko commented 4 years ago

No problem, please reach out if you run into other problems!