Roleren / ORFik

MIT License

findORFsFasta() error: identifying wrong start codon & stop codon #124

Closed chenghongdeng closed 1 year ago

chenghongdeng commented 2 years ago

Hi,

I am trying to run findORFsFasta() on a local fasta file.

I loaded my fasta file into a data frame and rebuilt a DNAStringSet from it:

    fastaFile <- readDNAStringSet("selected.fa")
    seq_name = names(fastaFile)
    sequence = paste(fastaFile)
    df <- data.frame(seq_name, sequence)

    seq <- DNAStringSet(df$sequence)
    names(seq) <- df$seq_name

Then I ran the findORFsFasta() function with the following command:

    orfs <- findORFsFasta(
      seq,
      startCodon = 'atg',
      stopCodon = "TAA|TAG|TGA",  # https://rdrr.io/bioc/ORFik/src/R/find_ORFs.R
      longestORF = TRUE,
      minimumLength = 0,
      is.circular = FALSE
    )

I also tried specifying the start codon and stop codon with the following arguments:

    startCodon = startDefinition(1)
    stopCodon = stopDefinition(1)

Both ways give me the same output. Looking closely at my output file, I find that it recognizes not only ATG as a start codon, but also CTG.

(image)

The sequences highlighted in color are some of the ORFs identified by findORFsFasta().

I am really confused right now and not sure how to solve this problem. Thanks in advance, Chenghong

Roleren commented 2 years ago

Hey, yeah, I think this is just confusion about the function input. ORFik's start codon search is case-sensitive, so what you describe above cannot happen.

startCodon = "atg" and startCodon = startDefinition(1) will never give the same answer.
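To illustrate why case matters here, below is a minimal base R sketch of case-sensitive pattern matching. It is not ORFik's internal implementation; the toy sequence and the helper `count_matches()` are hypothetical and only show that a lower-case pattern finds nothing in an upper-case sequence, while a pattern with alternatives matches more codons than ATG alone.

```r
# Toy upper-case sequence (hypothetical, not from selected.fa).
sequence <- "CTGAAATAACCATGAAATGA"

# Hypothetical helper: count non-overlapping regex matches in a string.
count_matches <- function(pattern, text, ignore_case = FALSE) {
  hits <- gregexpr(pattern, text, ignore.case = ignore_case)[[1]]
  if (hits[1] == -1) 0L else length(hits)
}

n_lower <- count_matches("atg", sequence)      # lower-case pattern, case-sensitive
n_upper <- count_matches("ATG", sequence)      # upper-case ATG only
n_multi <- count_matches("ATG|CTG", sequence)  # ATG or CTG

print(c(atg = n_lower, ATG = n_upper, ATG_or_CTG = n_multi))
```

If a start pattern contains alternatives such as CTG, hits on CTG are expected rather than a bug, which is why it is worth checking exactly which string is passed as startCodon.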

So if you only want ORFs that start with upper-case ATG, do:

    findORFsFasta("selected.fa", startCodon = "ATG")

Let me know if that gives you what you want :)
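For anyone who wants to compare the two settings side by side, here is a small sketch, assuming ORFik and Biostrings are installed from Bioconductor. The toy sequence and the temporary file are hypothetical stand-ins for selected.fa. Printing startDefinition(1) shows which codons the default expands to; as far as I know it includes more than ATG, which would explain CTG start codons in the output, but please check the printed value on your own installation.

```r
library(ORFik)       # assumed to be installed from Bioconductor
library(Biostrings)

# Hypothetical toy sequence written to a temporary FASTA file.
toy <- DNAStringSet(c(toy_seq = "CTGAAATAACCATGAAATGA"))
fasta_path <- tempfile(fileext = ".fa")
writeXStringSet(toy, fasta_path)

# Show the patterns the default definitions expand to.
print(startDefinition(1))
print(stopDefinition(1))

# Search with upper-case ATG as the only start codon.
orfs_atg <- findORFsFasta(fasta_path,
                          startCodon = "ATG",
                          stopCodon = stopDefinition(1),
                          longestORF = TRUE)

# Search with the default start definition for translation table 1.
orfs_default <- findORFsFasta(fasta_path,
                              startCodon = startDefinition(1),
                              stopCodon = stopDefinition(1),
                              longestORF = TRUE)

print(orfs_atg)
print(orfs_default)
```

Comparing the two printed results on a small sequence like this makes it easier to see which start codons each setting actually picks up before running it on the full file.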