Edinburgh-Genome-Foundry / DnaChisel

:pencil2: A versatile DNA sequence optimizer
https://edinburgh-genome-foundry.github.io/DnaChisel/
MIT License

Update biotables.py #38

Closed: rfuisz closed this 4 years ago

rfuisz commented 4 years ago

Proper reverse translation of the selenocysteine amino acid (U) to the TGA codon.

https://en.wikipedia.org/wiki/Selenocysteine#:~:text=Unlike%20other%20amino%20acids%20present,directly%20in%20the%20genetic%20code.&text=The%20UGA%20codon%20is%20made,(SECIS)%20in%20the%20mRNA.

coveralls commented 4 years ago


Coverage increased (+0.003%) to 88.745% when pulling 1e29a18af88ae249169743746dd4c73a4e027b19 on rfuisz:master into a37daef08feb7948e9e82652bd76dc2364522ac0 on Edinburgh-Genome-Foundry:master.

Zulko commented 4 years ago

Thanks, this is news to me! Apparently there is now also a 22nd amino acid called L-pyrrolysine!

That's a decision for @veghp to make, but my two cents is that I am not sure adding this line to DNA Chisel is the best way to support selenocysteine. It won't change the fact that TGA is translated as a stop codon, and TGA will still be considered synonymous with TAG and TAA (which do not encode selenocysteine), so your sequence optimization may lose the selenocysteine, and translate(reverse_translate(seq)) would no longer always equal seq. So it's not a straightforward change.
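To make the round-trip concern concrete, here is a minimal sketch in plain Python. The toy tables and the `reverse_translate`, `translate`, and `swap_synonymous_stops` functions are hypothetical stand-ins, not DNA Chisel's or Biopython's code, and only a few codons are included. It assumes the change adds U → TGA to the reverse table only, while TGA remains a stop codon in the forward direction.

```python
# Hypothetical toy tables: only the codons needed for this example.
FORWARD = {"ATG": "M", "TGG": "W"}                       # codon -> amino acid
STOP_CODONS = ("TAA", "TAG", "TGA")                      # TGA is a stop in the Standard table
BACK = {"M": "ATG", "W": "TGG", "*": "TAA", "U": "TGA"}  # includes the extra U -> TGA entry


def reverse_translate(protein):
    """Map each amino acid to one codon using the toy back table."""
    return "".join(BACK[aa] for aa in protein)


def translate(dna):
    """Translate codon by codon; any stop codon becomes '*'."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join("*" if c in STOP_CODONS else FORWARD[c] for c in codons)


def swap_synonymous_stops(dna, replacement="TAA"):
    """Mimic an optimizer that treats all stop codons as interchangeable."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "".join(replacement if c in STOP_CODONS else c for c in codons)


protein = "MUW"
dna = reverse_translate(protein)   # "ATGTGATGG"
print(translate(dna))              # "M*W": the U comes back as a stop
print(swap_synonymous_stops(dna))  # "ATGTAATGG": the TGA codon is gone entirely
```

The first print shows the round trip failing; the second shows how a "synonymous" stop-codon swap silently removes the selenocysteine codon.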

I am not saying it can't be handled on the Chisel side if necessary, but it may be worth pushing for selenocysteine support in the Biopython project. On the Biopython side, they could create new custom tables with U support, which you could then use in DNA Chisel.

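One quick fix that would not require changes to Chisel or Biopython is to add this at the beginning of your script:

from Bio.Data import CodonTable
# Reverse translation: map selenocysteine (U) to the TGA codon.
CodonTable.unambiguous_dna_by_name['Standard'].back_table['U'] = 'TGA'
# Forward translation: read TGA as U rather than a stop (this changes the shared Standard table for the whole process).
CodonTable.unambiguous_dna_by_name['Standard'].forward_table['TGA'] = 'U'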
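The assignments above modify the Standard table for every piece of code running in the same Python process. If that is a concern, one option is to apply the patch only inside a `with` block and restore the previous entries afterwards. This is a sketch, not something DNA Chisel or Biopython provides: the `patched_entries` helper is hypothetical, and it assumes `back_table` and `forward_table` behave like ordinary dicts (so you could pass e.g. `CodonTable.unambiguous_dna_by_name['Standard'].back_table`). The usage below runs on a toy dict instead.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for "key was not present before patching"


@contextmanager
def patched_entries(table, updates):
    """Temporarily apply `updates` to the dict `table`, then restore it on exit."""
    saved = {key: table.get(key, _MISSING) for key in updates}
    table.update(updates)
    try:
        yield table
    finally:
        for key, old_value in saved.items():
            if old_value is _MISSING:
                table.pop(key, None)    # key was added by the patch: remove it
            else:
                table[key] = old_value  # key existed before: put the old value back


# Hypothetical toy stand-in for a back_table (amino acid -> codon).
back_table = {"M": "ATG", "*": "TAA"}
with patched_entries(back_table, {"U": "TGA"}):
    print(back_table["U"])  # TGA
print("U" in back_table)    # False
```

Whether DNA Chisel picks up the patched entries inside the block depends on when it reads the table, which this sketch does not check. The helper is also not thread-safe, since other threads see the patched dict while the block runs.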

Another fix would be to add a new table, e.g. "Custom Table", to unambiguous_dna_by_name and then provide that table's name when you use reverse_translate or EnforceTranslation.
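The idea behind the custom table is to copy the Standard table rather than mutate it, so anyone who keeps using "Standard" still gets TGA as a stop. Below is a minimal sketch of that pattern in plain Python. `TABLES_BY_NAME` is a hypothetical dict-based stand-in for `unambiguous_dna_by_name` (the real registry holds Biopython codon-table objects, so the copying step would look different), and `register_custom_table` is likewise hypothetical.

```python
import copy

# Hypothetical stand-in for Bio.Data.CodonTable.unambiguous_dna_by_name,
# trimmed to a few codons for illustration.
TABLES_BY_NAME = {
    "Standard": {
        "forward": {"ATG": "M", "TGG": "W"},          # codon -> amino acid
        "back": {"M": "ATG", "W": "TGG", "*": "TAA"},  # amino acid -> codon
        "stops": ["TAA", "TAG", "TGA"],
    }
}


def register_custom_table(registry, base_name, new_name, codon, amino_acid):
    """Copy an existing table, reassign one codon, and register it under a new name."""
    table = copy.deepcopy(registry[base_name])  # the base table stays untouched
    table["forward"][codon] = amino_acid
    table["back"][amino_acid] = codon
    # Drop the codon from the stops so it is no longer "synonymous" with TAA/TAG.
    table["stops"] = [c for c in table["stops"] if c != codon]
    registry[new_name] = table
    return table


register_custom_table(TABLES_BY_NAME, "Standard", "Custom Table", "TGA", "U")
print(TABLES_BY_NAME["Standard"]["stops"])      # ['TAA', 'TAG', 'TGA']
print(TABLES_BY_NAME["Custom Table"]["stops"])  # ['TAA', 'TAG']
```

Removing TGA from the custom table's stop codons is what addresses the synonymy problem: an optimizer using "Custom Table" would no longer consider TGA interchangeable with TAA or TAG.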

Would that work for you?

veghp commented 4 years ago

Yes, I agree that fixing this for selenocysteine would break TGA for those who use it as the standard stop codon. Thanks @Zulko for the proposed alternative solutions.

rfuisz commented 4 years ago

Thanks so much for the suggestions. I think that quick fix works just fine; I'd only modify the back_table, since I don't want to break TGA as a stop codon in the more general case.