Edinburgh-Genome-Foundry / DnaChisel

A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License
213 stars, 38 forks

Allow unknown amino acids denoted by 'X' in `reverse_translate` function #74

Closed: lauraluebbert closed this issue 1 year ago

lauraluebbert commented 1 year ago

It would be great if the reverse_translate function could handle unknown amino acids in the sequence denoted by 'X', which would be reverse translated to 'NNN'.

Zulko commented 1 year ago

Thanks for this and the PR!

I like how the sets for B, J, Z are built as unions of the codon sets of other (unambiguous) amino acids. Note that the reverse translation in these cases will not be degenerate: it will pick one codon among all the unambiguous possibilities (either deterministically or randomly, depending on the reverse_translate parameters).

For X, however, this would introduce degenerate nucleotides (N) into the framework, which is a bit trickier. Also beware that X excludes the stop codons TGA, TAG, and TAA, so it is not strictly reverse-translatable to NNN. A simpler way to support X would be "the union of all other amino-acid codon sets", and then the reverse translation would pick one codon (deterministically or randomly).

Would this suit your use case? Is it for use in DnaChisel, or are you using reverse_translate() independently? I'm asking because I think there are ways to make DnaChisel "generate anything that's an X (or a B or Z) in the final sequence".
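For readers following along, the union-based handling of B, J, Z and X described in this comment can be sketched roughly as below. This is an illustrative, standalone sketch and not DnaChisel's actual implementation: the names `CODONS`, `reverse_translate_sketch` and the `randomize` flag are hypothetical, and the only external fact assumed is the standard genetic code (NCBI translation table 1).

```python
import itertools
import random

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and "*" for stop) to the set of codons encoding it.
CODONS = {}
for bases, aa in zip(itertools.product(BASES, repeat=3), AMINO_ACIDS):
    CODONS.setdefault(aa, set()).add("".join(bases))

# X is the union of the codon sets of all unambiguous amino acids (stops excluded).
CODONS["X"] = set().union(*(c for aa, c in CODONS.items() if aa != "*"))

# B, Z and J are unions of the codon sets of the amino acids they may stand for.
CODONS["B"] = CODONS["D"] | CODONS["N"]  # Asp or Asn
CODONS["Z"] = CODONS["E"] | CODONS["Q"]  # Glu or Gln
CODONS["J"] = CODONS["I"] | CODONS["L"]  # Ile or Leu


def reverse_translate_sketch(protein, randomize=False, rng=None):
    """Pick one unambiguous codon per residue (hypothetical helper).

    With randomize=False the alphabetically first codon is used, so the
    result is deterministic; otherwise a codon is drawn at random.
    """
    rng = rng or random.Random()
    picked = []
    for aa in protein:
        choices = sorted(CODONS[aa])
        picked.append(rng.choice(choices) if randomize else choices[0])
    return "".join(picked)
```

With this construction the set for X holds 61 codons rather than the 64 that NNN would expand to, and the output never contains the degenerate nucleotide N; for instance, `reverse_translate_sketch("MXB")` yields a plain 9-nucleotide A/C/G/T string.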

lauraluebbert commented 1 year ago

Thanks for the quick response!

You're right; that is a better solution for X. I adjusted my PR accordingly.

I am currently only using the reverse_translate() function independently.

veghp commented 1 year ago

Merging was delayed due to another PR. I added a small test to ensure that all future versions can handle ambiguous amino acids. Thanks for the contribution.