FelixErnst closed this issue 5 years ago
Ah, sorry, my bad… RTM to myself.
Good afternoon,
I recently tracked down an error in my workflow to some non-intuitive behavior in the Biostrings::translate() function, hacked together a work-around, and then saw that you had already updated the fix. I had written this up, so I thought I'd leave it here in case someone else runs into this.
In essence, when I was translating a short cDNA fragment, CTGACGCGAGCAGCCAAG, it read the leading CTG as a non-standard initiation site, so the resulting peptide was MTRAAK.
I was matching this against a trypsin-digested peptide library for mass spec, where the tryptic fragment was LTRAAK, not MTRAAK.
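A minimal sketch reproducing the mismatch described above, assuming a Biostrings version whose Standard Genetic Code carries the alternative initiation codons. The variable names are illustrative only:

```r
library(Biostrings)

# The cDNA fragment from this report; it is an internal chunk, not a full CDS.
cdna_fragment <- DNAString("CTGACGCGAGCAGCCAAG")

# Default behavior: the first codon (CTG) is treated as an initiation codon.
default_peptide <- as.character(translate(cdna_fragment))
print(default_peptide)

# The tryptic peptide observed in the library starts with L, so the match fails.
library_peptide <- "LTRAAK"
print(identical(default_peptide, library_peptide))
```

The comparison fails only because of the first residue; the rest of the peptide is identical.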
The standard GENETIC_CODE has the attribute alt_init_codons, which is set to c("TTG", "CTG").
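To check this on a given installation, something like the following should work. It assumes only that Biostrings is installed and that GENETIC_CODE carries the alt_init_codons attribute as described above:

```r
library(Biostrings)

# Alternative initiation codons attached to the Standard Genetic Code.
alt_init <- attr(GENETIC_CODE, "alt_init_codons")
print(alt_init)

# Amino acids these codons encode when they are not in the first position.
print(GENETIC_CODE[alt_init])
```

Both codons normally encode leucine, which is why the first residue changes from L to M only when the codon sits at the start of the sequence.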
So I could hack around alternative initiation by setting
# From
# GENETIC_CODE_TABLE$Starts[1] = "---M---------------M---------------M----------------------------"
# attr(GENETIC_CODE, "alt_init_codons") = c("TTG", "CTG")
# To
GENETIC_CODE_TABLE$Starts[1] = "-----------------------------------M----------------------------"
attr(GENETIC_CODE, "alt_init_codons") = "ATG"
and running translate with the modified GENETIC_CODE table:
translate(cDNA.fragment, genetic.code = GENETIC_CODE)
I believe in the updated version it should simply be
translate(cDNA.fragment, no.init.codon = TRUE)
Hi Artem @ababaian,
Please have a look at the manual. In the current version,
translate(cDNA.fragment, no.init.codon = FALSE)
behaves like you want it to. I would avoid touching GENETIC_CODE_TABLE.
Felix
@FelixErnst I think @ababaian wants to use no.init.codon=TRUE here:
> library(Biostrings)
> translate(DNAString("CTGACGCGAGCAGCCAAG"), no.init.codon=TRUE)
6-letter "AAString" instance
seq: LTRAAK
@ababaian Looks like you figured this out already. Didn't you?
@hpages @ababaian ah, sorry for the mix-up.
I meant no.init.codon=TRUE, which was the solution for my initial "problem" and which works out of the box without any modifications of GENETIC_CODE_TABLE.
I'm good, yes. My version of Biostrings just doesn't have the no.init.codon flag, so I have the work-around. =D
mmhh... no.init.codon was introduced in Bioconductor 3.8. So you are using a version of Bioconductor that is old and not supported! I would strongly recommend that you update to the most recent version (3.10) released this week! See https://bioconductor.org/news/bioc_3_10_release/
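A quick way to see which versions are installed is sketched below. It assumes Biostrings is installed; the BiocManager check is skipped if that package is not available:

```r
# Installed Biostrings version.
biostrings_version <- packageVersion("Biostrings")
print(biostrings_version)

# Bioconductor release in use (only if the BiocManager package is available).
if (requireNamespace("BiocManager", quietly = TRUE)) {
  print(BiocManager::version())
}

# Upgrading would then be along the lines of (not run here):
# BiocManager::install(version = "3.10")
```

If the reported Bioconductor release is older than 3.8, the no.init.codon argument will not be available.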
Thanks very much for these insights! I do think most people would never dream that start-of-DNAString 'CTG' and 'TTG' triplets would be translated as alternative START codons by default... I wonder if you'd consider changing the default behavior to no.init.codon=TRUE?
Thanks for the feedback.
The best default value for no.init.codon really depends on your use case: do your DNA sequences represent full CDS sequences or CDS chunks? Today your use case is the latter, so you complain loudly about the inadequacy of the default behavior. Surely, if the default behavior was no.init.codon=TRUE, it would be users with the former use case who would now complain.
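The two use cases can be made explicit in calling code. The sketch below assumes a Biostrings version that has no.init.codon (Bioconductor 3.8 or later). The wrapper translate_dna() and its is_full_cds argument are hypothetical and not part of Biostrings, and the first sequence is simply the one from the ?translate example, treated here as a full CDS:

```r
library(Biostrings)

# Hypothetical helper (not part of Biostrings): states the use case explicitly
# instead of relying on the default value of no.init.codon.
translate_dna <- function(dna, is_full_cds) {
  translate(dna, no.init.codon = !is_full_cds)
}

full_cds <- DNAString("TTGATATGGCCCTTATAA")  # sequence from the ?translate example
chunk    <- DNAString("CTGACGCGAGCAGCCAAG")  # internal fragment from this thread

# Full CDS: the leading TTG is read as an initiation codon (M).
full_cds_peptide <- as.character(translate_dna(full_cds, is_full_cds = TRUE))
print(full_cds_peptide)

# CDS chunk: the leading CTG is read as an ordinary leucine codon (L).
chunk_peptide <- as.character(translate_dna(chunk, is_full_cds = FALSE))
print(chunk_peptide)
```

Requiring the caller to state the use case avoids depending on whichever default the installed version happens to use.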
I'm not inclined to change the default behavior because:
- It is not clear that no.init.codon=TRUE would not make as many (if not more) people unhappy.
- The current behavior is documented in ?translate. OK, most people don't RTFM, but they at least check the examples, especially when they use a function for the first time. In the case of ?translate, the very first example is exactly about that:
dna1 <- DNAString("TTGATATGGCCCTTATAA")
translate(dna1)
## TTG is an alternative initiation codon in the Standard Genetic Code:
translate(dna1, no.init.codon=TRUE)
so it is hard to miss, even if you are too busy to RTFM (but not to tweet).
I like to think that the reason people almost never complained about the current default behavior is because they didn't miss that example.
Thanks very much for your help! It is certainly my fault that I did not read the example — I hope you'll accept my apology for my carelessness. Your reasons for leaving it as-is make good sense, and it's certainly possible that most people know that this is how it works. Thanks again!
I am not sure that this behavior is intended:
In my opinion, TTG and CTG should return L, shouldn't they?