Closed: Lcarey closed this issue 3 months ago
I think the issue is that you're using RepeatedKmerPattern which is for direct repeats. Try UniquifyAllKmers.
See this example: https://github.com/Edinburgh-Genome-Foundry/DnaChisel/blob/4be08c2696710387ae6d5b0346e6680a69d88c64/examples/common_scenarios/circular_sequence.py#L25
Peter is right: from your definition of longestRepetitiveSubstring, it looks like you want UniquifyAllKmers(9) (note that by default it also looks for homologies on the reverse-complement).
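To double-check a result independently of DnaChisel, here is a minimal sketch of a repeated k-mer finder that, like the default described above, also counts a k-mer and its reverse complement as the same element. The names repeated_kmers and reverse_complement are hypothetical helpers written for this example; this is not DnaChisel's own implementation, only a plain-Python check under that assumption.

```python
from collections import defaultdict

# Translation table used to complement A/C/G/T bases.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def repeated_kmers(seq, k, include_reverse_complement=True):
    """Return {kmer: [start positions]} for k-mers occurring more than once.

    With include_reverse_complement=True, a k-mer and its reverse
    complement are grouped under one key (the alphabetically smaller one).
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        key = kmer
        if include_reverse_complement:
            key = min(kmer, reverse_complement(kmer))
        positions[key].append(start)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}
```

Running repeated_kmers(sequence, 9) on the optimized sequence should return an empty dict if every 9-mer is unique on both strands; any entry points at the positions to inspect.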
Beware that this is one of the rare specifications in DnaChisel that can fail on complicated problems even though these problems may indeed have a solution (this is particularly true when the initial sequence already contains a lot of repeated elements). Some tips:
If UniquifyAllKmers(N) fails as a constraint (with a NoSolutionError), you could try running it as an objective: objectives=[CodonOptimize(...), UniquifyAllKmers(N, boost=100)].
You can also try shaking up the initial sequence with reverse_translate(protein_sequence, randomize_codons=True). It can make things better, or worse.
As a last resort, increase the k-mer length.
Hi Peter & Zulko,
Perfect. Thanks for the rapid response. As always with your code, I should have read the documentation.
Have a good evening, and thank you so much!
-Lucas
Hi, I'm using DnaChisel to encode some difficult-to-synthesize proteins. I tried removing all k-mer repeats, and constraints_text_summary() says pass, but I'm still able to find a repeat. What am I not understanding about how to codon optimize?
For example, in the code below, there's a 26 nt repeat that doesn't get removed.
Thanks for helping me out! -Lucas
output:
code: