Bioconductor / Biostrings

Efficient manipulation of biological strings
https://bioconductor.org/packages/Biostrings

translate gives wrong results for triplets TTG and CTG #15

Closed janas-sebastien closed 5 years ago

janas-sebastien commented 6 years ago

Hello,

Since at least Biostrings 2.42, the function translate() gives wrong results for the triplets TTG and CTG.

translate(DNAString("TTG"))
  1-letter "AAString" instance
seq: M

translate(DNAString("CTG"))
  1-letter "AAString" instance
seq: M

=> Both should be L (leucine).

We think it's related to the erroneous default use of alternative genetic codes.

Cheers,

Seb

janas-sebastien commented 6 years ago

I now understand that these triplets are treated as alternative start codons, which is why they are translated to M. My issue was that the change in behavior suddenly broke our unit tests. It would have been better to implement the new behavior as an option to translate(), because now everyone who uses translate() on a single triplet will see their code broken.
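For reference, a minimal sketch of how to check which codons the standard genetic code treats as alternative initiation codons. It assumes a Biostrings version in which GENETIC_CODE carries an alt_init_codons attribute; the variable names are only illustrative.

```r
library(Biostrings)

# In the standard genetic code, TTG and CTG both code for leucine (L).
GENETIC_CODE[c("TTG", "CTG")]

# The same table also lists them as alternative initiation codons,
# which is why translate() returns M when one of them comes first.
alt_init <- attr(GENETIC_CODE, "alt_init_codons")
alt_init

# Only the first codon is affected: the second TTG is still read as L.
first_vs_later <- as.character(translate(DNAString("TTGTTG")))
first_vs_later
```

So the result depends on the position of the codon in the sequence, not only on the codon itself.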

janas-sebastien commented 6 years ago

Moreover, the documentation does not say that the sequence must be a complete sequence (starting at the start codon).

hpages commented 5 years ago

Hi @janas-sebastien

You're right that adding an option would have been better. Sorry for the inconvenience.

This is now done: I just added the no.init.codon argument to translate() (https://github.com/Bioconductor/Biostrings/commit/dcd915c21b96d4d981d4530f913ff9108239e129) and clarified in the documentation that, by default, translate() assumes that the first codon in a DNA or RNA sequence is the initiation codon. I guess it's too late to switch the default back to the old behavior.

The no.init.codon option is in Biostrings 2.50.1 (BioC 3.8, current release) and Biostrings 2.51.1 (BioC 3.9, current devel).
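A minimal sketch of the difference, assuming Biostrings 2.50.1 or later (where no.init.codon is available); the variable names are only illustrative:

```r
library(Biostrings)

codon <- DNAString("TTG")

# Default: the first codon is treated as the initiation codon, so the
# alternative start codon TTG is translated to M.
default_aa <- as.character(translate(codon))

# With no.init.codon = TRUE the first codon is translated like any other.
plain_aa <- as.character(translate(codon, no.init.codon = TRUE))

# For single-codon lookups, indexing the genetic code table directly
# avoids the initiation-codon logic altogether.
lookup_aa <- GENETIC_CODE[["CTG"]]
```

Unit tests that translate isolated triplets can pass no.init.codon = TRUE (or use the table lookup) to get L for TTG and CTG again.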

H.

janas-sebastien commented 5 years ago

Thanks for the fix! Should I close this issue myself?

hpages commented 5 years ago

Closing it now.