Closed harijay closed 4 years ago
Very sorry for that! The reason for what you observe is that EnforceTranslation(), which is responsible for conserving the protein sequence, should be used as a constraint, not as an optimization objective.
Only specifications passed as constraints are guaranteed to hold in the final sequence. Anything passed as an objective is enforced only as far as the constraints and the competing objectives allow.
So your script should look like this:
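from dnachisel import *  # brings in the specification classes used below

# mysequence is your protein (amino acid) sequence, defined earlier in your script
problem = DnaOptimizationProblem(
    sequence=reverse_translate(mysequence),
    # Constraints are guaranteed to hold in the final sequence.
    constraints=[
        AvoidPattern("SapI_site"),
        AvoidPattern("BbsI_site"),  # enzyme sites are written "<enzyme>_site"
        EnforceTranslation()
    ],
    # Objectives are optimized only as far as the constraints allow.
    objectives=[
        CodonOptimize(species="e_coli"),
        AvoidNonUniqueSegments(12),
        EnforceGCContent(0.25, 0.70, window=10),
        AvoidHairpins()
    ]
)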
There is already a CodonOptimize example, but as you are not the first person to run into this problem, I will make sure that this is clear throughout the library (maybe I'll also raise a warning at solve time if CodonOptimize is used without an EnforceTranslation constraint).
Let me know if that clarifies it.
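In the meantime, a check along these lines could be run by hand before solving. This is only a rough sketch of the warning idea, not something that exists in the library: the helper name `warn_if_translation_unprotected` is hypothetical, and it assumes the specifications can be recognized by their class names.

```python
import warnings


def warn_if_translation_unprotected(constraints, objectives):
    """Warn if CodonOptimize is an objective but EnforceTranslation is not a constraint.

    Hypothetical helper: specifications are matched by class name only.
    Returns True when the warning was raised, False otherwise.
    """
    constraint_names = {type(spec).__name__ for spec in constraints}
    objective_names = {type(spec).__name__ for spec in objectives}
    unprotected = (
        "CodonOptimize" in objective_names
        and "EnforceTranslation" not in constraint_names
    )
    if unprotected:
        warnings.warn(
            "CodonOptimize is used without an EnforceTranslation constraint; "
            "the optimized sequence may not encode the original protein.",
            UserWarning,
            stacklevel=2,
        )
    return unprotected
```

It would be called with the same two lists that are passed to the problem, for example `warn_if_translation_unprotected(my_constraints, my_objectives)`, before running the optimization.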
Thank you so much for the clarification of the key difference between constraints and optimization objectives. I incorrectly assumed that they were almost interchangeable because EnforceTranslation "worked" as an objective as well. Also, I had missed the very clear usage example in the CodonOptimize example you pointed me to.
Your proposal to raise a warning at solve time for absence of an EnforceTranslation constraint if a CodonOptimize objective is used makes great sense.
Thanks a ton for the quick response and correction--I will use EnforceTranslation as a constraint from now on.
No problem, please reach out if you see other troubles!
I really like DnaChisel and use it routinely to design protein expression constructs for E. coli. I recently got tripped up on an optimization of a ~3kb construct, where I realized too late (after the synthetic gene was ordered) that the way I was doing my optimization introduced mutations into the resulting protein sequence.
I am doing my optimization using the recipe
The translated sequence that results has 1-2 mutations in the ~1000 amino acid sequence. I have tried it with both the GitHub version and the pip version of DnaChisel, and both behave the same way. There does not seem to be a pattern in where the mutations arise.
I have since learnt my lesson and now check that my designs translate to the input amino acid sequence. But I am hoping there is a way to keep the optimization from returning a mutated sequence and failing to converge.
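For anyone who wants to automate that kind of check, here is a minimal sketch in plain Python. It does not use DnaChisel; the function names `translate_dna` and `find_mutations` are illustrative, and it assumes the standard genetic code and a coding sequence that starts in frame.

```python
# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = dict(
    zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS)
)


def translate_dna(dna):
    """Translate an in-frame DNA sequence into a protein string ('*' marks a stop)."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def find_mutations(optimized_dna, expected_protein):
    """Return a list of (1-based position, expected residue, observed residue)."""
    observed = translate_dna(optimized_dna)
    if len(observed) != len(expected_protein):
        raise ValueError("translated length differs from the expected protein")
    return [
        (position, expected, got)
        for position, (expected, got) in enumerate(
            zip(expected_protein, observed), start=1
        )
        if expected != got
    ]
```

An empty list from `find_mutations(optimized_dna, my_protein)` means the optimized sequence still encodes the intended protein; anything else lists the positions that changed.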
Attaching a Colaboratory notebook link and file showing what I did.
Thanks in advance for your help.