Roleren / ORFik

MIT License

Cross-match ORFik ORFs with known ORFs in UniProt #181

Closed: josiegleeson closed this 4 months ago

josiegleeson commented 4 months ago

Hi again, I wanted to check if there was an easy way to do this with ORFik before writing something...

Let's say I am searching for ORFs in novel transcripts, but a novel transcript encodes a known ORF from the UniProt FASTA. Is there an easy way to match these up? I currently have both as AAStringSet objects (approx. 10k ORFs I want to match with known ones). I can do it by looping through and string matching, but wanted to check whether there is an easier way you know of.

Thank you again for your help!

Roleren commented 4 months ago

If you have coordinates, matching on those is the fastest; if not, there is no way around sequence matching. The fastest I know of for an AAStringSet is data.table::chmatch (on the sequences converted with as.character). The easiest to test is the %in% operator. Btw, matching is much faster at the DNA level, as DNA is encoded in Biostrings as "2 bits" instead of as characters.
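For anyone landing here, the idea behind `%in%` and `chmatch` is a single hash-table build over the reference sequences followed by constant-time lookups per query, instead of a nested loop. The sketch below shows that idea in Python; it is not ORFik or Biostrings code. The `normalize` step is an assumption: if your translated ORFs end in a `*` stop symbol (UniProt sequences do not), it has to be stripped before exact matching.

```python
from typing import Dict, List, Optional


def normalize(seq: str) -> str:
    """Upper-case and drop a trailing stop symbol.

    Assumption: translated ORFs may end in '*', UniProt sequences do not.
    """
    return seq.upper().rstrip("*")


def match_sequences(query: List[str], reference: List[str]) -> List[Optional[int]]:
    """Index of the first identical reference sequence for each query, or None.

    Same semantics as R's match()/data.table::chmatch(): build one hash table
    over the reference, then do O(1) lookups per query.
    """
    first_index: Dict[str, int] = {}
    for i, seq in enumerate(reference):
        # setdefault keeps the first occurrence when a sequence is duplicated
        first_index.setdefault(normalize(seq), i)
    return [first_index.get(normalize(seq)) for seq in query]


def in_reference(query: List[str], reference: List[str]) -> List[bool]:
    """Membership test per query, analogous to R's %in% operator."""
    ref_set = {normalize(seq) for seq in reference}
    return [normalize(seq) in ref_set for seq in query]


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only
    known = ["MKTAYIAK", "MSEQNNTE", "MKTAYIAK"]
    novel = ["MSEQNNTE*", "MAAAAAAA", "MKTAYIAK"]
    print(match_sequences(novel, known))  # [1, None, 0]
    print(in_reference(novel, known))     # [True, False, True]
```

This only finds full-length identical sequences; a novel ORF that differs by a single residue or uses a different start codon will not match.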

You can also check whether Biostrings has a vmatchPattern method for AAStringSet.
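A pattern search in the style of `vmatchPattern` answers a different question from exact matching: it reports where one sequence occurs *inside* each of a set of sequences, which is what you want if a novel ORF may be a fragment of a known protein. The sketch below is a plain Python illustration of that idea (exact occurrences only, no mismatches), not the Biostrings implementation; swap the arguments to look for known proteins inside novel ORFs instead.

```python
from typing import Dict, List, Tuple


def find_in_references(query: str, reference: List[str]) -> List[Tuple[int, int]]:
    """All (reference_index, 0-based start) positions where query occurs exactly."""
    hits: List[Tuple[int, int]] = []
    for ref_idx, ref in enumerate(reference):
        start = ref.find(query)
        while start != -1:
            hits.append((ref_idx, start))
            # Search again one residue later so overlapping hits are reported
            start = ref.find(query, start + 1)
    return hits


def contained_matches(
    queries: List[str], reference: List[str]
) -> Dict[int, List[Tuple[int, int]]]:
    """For each query index, every reference sequence (and offset) containing it."""
    return {q_idx: find_in_references(q, reference) for q_idx, q in enumerate(queries)}


if __name__ == "__main__":
    # Toy, made-up sequences for illustration only
    known = ["MKTAYIAK", "MSEQNNTE", "TAYITAYI"]
    print(find_in_references("TAYI", known))  # [(0, 2), (2, 0), (2, 4)]
    print(contained_matches(["TAYI", "WWWW"], known))
    # {0: [(0, 2), (2, 0), (2, 4)], 1: []}
```

Because every query is scanned against every reference, the cost grows with the product of the two set sizes, so for ~10k ORFs it is worth running the cheap exact-match pass first and only scanning the leftovers.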

josiegleeson commented 4 months ago

OK, great, thanks. I'll have a look into it. I can probably do coordinate matching for the novel transcripts that encode known ORFs, as they'll have the same genomic coordinates.
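Coordinate matching can be sketched as turning each ORF into a hashable key of chromosome, strand and its exon intervals, then looking novel keys up among known ones. This assumes the known ORFs' coordinates come from the annotated CDS (e.g. the reference GTF), since the UniProt FASTA itself carries no genomic coordinates. In R this would normally be done on GRangesList objects; the Python below only illustrates the keying idea, and all ORF names in it are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# 1-based, inclusive (start, end) pairs, as in GRanges
Exon = Tuple[int, int]
OrfKey = Tuple[str, str, Tuple[Exon, ...]]


def orf_key(seqname: str, strand: str, exons: List[Exon]) -> OrfKey:
    """Hashable key for an ORF; exons are sorted so input order does not matter."""
    return (seqname, strand, tuple(sorted(exons)))


def match_by_coordinates(
    novel: Dict[str, OrfKey], known: Dict[str, OrfKey]
) -> Dict[str, Optional[str]]:
    """Name of the known ORF with identical coordinates for each novel ORF, or None."""
    lookup: Dict[OrfKey, str] = {}
    for name, key in known.items():
        lookup.setdefault(key, name)  # keep the first known ORF per key
    return {name: lookup.get(key) for name, key in novel.items()}


if __name__ == "__main__":
    # Hypothetical names and coordinates for illustration only
    known = {"knownCDS_1": orf_key("chr1", "+", [(100, 200), (300, 350)])}
    novel = {
        "novelTx1_ORF1": orf_key("chr1", "+", [(300, 350), (100, 200)]),
        "novelTx2_ORF1": orf_key("chr1", "-", [(100, 200), (300, 350)]),
    }
    print(match_by_coordinates(novel, known))
    # {'novelTx1_ORF1': 'knownCDS_1', 'novelTx2_ORF1': None}
```

Requiring identical exon boundaries is strict on purpose: an ORF that shares the stop codon but starts at a different AUG gets a different key, so it would need an overlap-based comparison instead.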