Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

Representation of MutationChoice as degenerate sequence #15

simone-pignotti closed this issue 4 years ago

simone-pignotti commented 4 years ago

To test several constructs that equally satisfy a problem's constraints and select for the optimal/functional ones, it may be desirable to replace a subsequence with degenerate oligonucleotides. Given the current implementation of the MutationSpace class, it should be fairly easy to provide a toDegenerateSeq method that translates the list of choices into such an encoding. I can help with the implementation if you are interested in adding this feature, although the MutationSpace and MutationChoice classes are only partially documented. Feel free to close the issue if you don't think it's worth it.

Zulko commented 4 years ago

The problem I see is that DnaChisel's MutationSpaces are more generic than degenerate sequences, and cannot always be represented as degenerate sequences.

Imagine that a codon is subject to the constraint "this must be a stop codon". Then the mutation space for this triplet will be reduced to TAA, TGA, and TAG. While it could be summarized as TRR, this wouldn't be strictly correct, as that notation would also allow "TGG", which is not a stop codon. You will have the same problem with any other constraint that creates inter-dependencies between nucleotides ("no BsmBI sites", "no 9-homopolymers", etc.).
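To make the stop codon example concrete, here is a minimal standalone sketch in plain Python. It does not use DnaChisel; the helper names `expand` and `collapse` are made up for illustration, and only the IUPAC code table is standard. It summarizes the three stop codons position by position and then expands the result again to show the extra variant that sneaks in.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Reverse lookup: set of bases -> IUPAC code.
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def expand(degenerate):
    """Return every concrete sequence encoded by a degenerate sequence."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in degenerate))}


def collapse(variants):
    """Summarize equal-length variants with one IUPAC code per position.

    Hypothetical helper: each column is treated independently, which is
    exactly why dependencies between positions get lost.
    """
    return "".join(BASES_TO_CODE[frozenset(col)] for col in zip(*variants))


stop_codons = {"TAA", "TGA", "TAG"}
summary = collapse(stop_codons)        # per-position summary of the stop codons
extra = expand(summary) - stop_codons  # variants allowed by the summary only

print(summary, extra)
```

The summary comes out as TRR, and the only variant it adds beyond the stop codons is TGG, so the degenerate sequence is a strict superset of the actual mutation space.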

If your goal is to design degenerate sequences to order oligo pools, then you could certainly "project" the solution space onto the space of degenerate sequences, by finding the "most generic" degenerate sequence whose variants all satisfy the constraints. But there might be several solutions to your problem. For instance, you might decide to reduce stop codons to either TAR or TRA, both of which are acceptable.
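One way to read this "projection" is as a search for the maximal degenerate sequences whose expansions stay inside the allowed set. The sketch below is a brute-force illustration in plain Python, not DnaChisel code; `maximal_degenerate_sequences` is a hypothetical name, and the approach assumes the allowed variants can be enumerated, which is only realistic for short segments.

```python
from itertools import product

# Standard IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(degenerate):
    """Return every concrete sequence encoded by a degenerate sequence."""
    return {"".join(p) for p in product(*(IUPAC[c] for c in degenerate))}


def maximal_degenerate_sequences(allowed):
    """List the degenerate sequences that only encode allowed variants
    and are not strictly contained in another such sequence.

    Hypothetical helper: it tries all 15**n code combinations, so it is
    only practical for very short segments such as a single codon.
    """
    allowed = set(allowed)
    length = len(next(iter(allowed)))
    # Keep every degenerate sequence whose variants are all allowed.
    valid = [
        "".join(codes)
        for codes in product(IUPAC, repeat=length)
        if expand("".join(codes)) <= allowed
    ]
    # Drop those whose variant set is a strict subset of another valid one.
    return sorted(
        seq for seq in valid
        if not any(expand(seq) < expand(other) for other in valid)
    )


result = maximal_degenerate_sequences({"TAA", "TGA", "TAG"})
print(result)
```

For the stop codons this yields two incomparable answers, TAR and TRA, each covering two of the three codons. The 15**n growth of the search also hints at why a general version could become computationally heavy.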

As I see it, your to_degenerate_seq method could be written "on top of" Chisel, using some of the methods the MutationSpace and OptimizationProblem classes provide. And this may be a bit computationally intensive for cases with many possible variants.

Does that make sense?

simone-pignotti commented 4 years ago

Thanks for the detailed answer and the examples, it definitely makes sense! My bad