josiegleeson closed this issue 4 months ago.
If you have coordinates, matching on those is the fastest option; if not, there is no way around sequence matching. The fastest approach I know of for an AAStringSet is data.table::chmatch, and the easiest to test is the %in% operator. Btw, matching is much faster at the DNA level, since Biostrings encodes DNA as "2 bits" per base instead of as characters.
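A minimal sketch of the sequence-matching route, assuming both sets are AAStringSet objects and that only exact, full-length identity counts as a match. The toy sequences and object names are hypothetical. Stripping a trailing stop symbol "*" is an assumption about how the translated ORFs look; UniProt sequences normally do not carry one.

```r
library(Biostrings)
library(data.table)

# Hypothetical toy data standing in for ~10k novel ORFs and the UniProt FASTA
novel_orfs <- AAStringSet(c(orf1 = "MKTAYIAK*", orf2 = "MSTNPKPQR", orf3 = "MAAAGG"))
uniprot    <- AAStringSet(c(P1 = "MSTNPKPQR", P2 = "MKTAYIAK", P3 = "MLLV"))

# Convert to plain character vectors; drop a trailing stop "*" (assumption)
novel_chr   <- sub("\\*$", "", as.character(novel_orfs))
uniprot_chr <- as.character(uniprot)

# Quick yes/no check: does each novel ORF have an identical known protein?
is_known <- novel_chr %in% uniprot_chr

# chmatch: index of the first exact match in uniprot_chr, NA if none
hit <- chmatch(novel_chr, uniprot_chr)

# One row per novel ORF with the matching UniProt entry name (or NA)
result <- data.table(
  orf     = names(novel_orfs),
  uniprot = names(uniprot)[hit]
)
print(result)
```

Both calls are vectorised, so there is no explicit loop over the 10k ORFs. Note that this only finds identical sequences; an ORF that is a truncated or extended version of a known protein will come back as NA.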
You could also check whether Biostrings' vmatchPattern works on an AAStringSet.
OK, great, thanks. I'll look into it. I can probably use coordinate matching for the novel transcripts that encode known ORFs, since they'll have the same genomic coordinates.
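A rough sketch of coordinate matching, assuming each ORF is stored as an element of a GRangesList (one range per exon piece) and that both sets use the same seqlevel style (e.g. "chr1" vs "1") and the same exon ordering. The helper orf_coordinate_key and the toy ranges are hypothetical; the idea is to collapse each ORF's coordinates into one string key and then reuse fast exact character matching.

```r
library(GenomicRanges)
library(data.table)

# Hypothetical helper: one key per ORF, "chr:start-end:strand;..." over its pieces
orf_coordinate_key <- function(grl) {
  gr <- unlist(grl, use.names = FALSE)
  piece <- paste0(as.character(seqnames(gr)), ":", start(gr), "-", end(gr),
                  ":", as.character(strand(gr)))
  # Map every piece back to its ORF; the factor keeps empty ORFs aligned
  group <- factor(rep(seq_along(grl), lengths(grl)), levels = seq_along(grl))
  keys <- vapply(split(piece, group), paste, character(1), collapse = ";")
  unname(keys)
}

# Hypothetical toy ORFs: a novel set and a known (annotated) set
novel <- GRangesList(
  tx1_orf = GRanges("chr1", IRanges(c(100, 300), c(150, 360)), strand = "+"),
  tx2_orf = GRanges("chr2", IRanges(500, 620), strand = "-")
)
known <- GRangesList(
  geneA_orf = GRanges("chr2", IRanges(500, 620), strand = "-"),
  geneB_orf = GRanges("chr1", IRanges(c(100, 300), c(150, 360)), strand = "+")
)

# Exact coordinate match: index into `known`, NA if no identical ORF
hit <- chmatch(orf_coordinate_key(novel), orf_coordinate_key(known))
print(names(known)[hit])
```

Requiring every exon piece to match exactly is strict by design; ORFs that share a stop codon but start elsewhere would need an overlap-based comparison instead.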
Hi again, I wanted to check whether there is an easy way to do this with ORFik before writing something myself.
Say I am searching for ORFs in novel transcripts, and a novel transcript encodes a known ORF from the UniProt FASTA. Is there an easy way to match these up? I currently have both as AAStringSet objects (approx. 10k ORFs I want to match against known ones). I could loop through them and do string matching, but I wanted to check whether there is an easier way you know of.
Thank you again for your help!