Proteomicslab57357 / UniprotR

Retrieving Information of Proteins from Uniprot
GNU General Public License v3.0

Which isoform does GetSequences retrieve? #12

vragh closed this issue 3 years ago

vragh commented 3 years ago

Dear maintainers of UniprotR,

Firstly, thank you for writing and maintaining this awesome R package. It really is a lifesaver!!

I have a question about UniprotR::GetSequences. I see that when a protein has multiple isoforms on UniProt, GetSequences still returns only a single sequence (isoform). My question is: which isoform is this? Is it the canonical isoform (if one has been designated) or one chosen at random? And how does GetSequences choose the isoform when no canonical isoform is designated?

MohmedSoudy commented 3 years ago

Dear @vragh, thank you very much for your interest in UniprotR. The UniprotR::GetSequences function retrieves the canonical sequence of an entry based on its accession, because we call the UniProt API with the exact accession entered by the user. The current version doesn't support isoforms; we plan to handle both canonical sequences and isoforms in the next update.
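To make the accession convention concrete: UniProt writes isoform identifiers as the entry accession plus a `-N` suffix (e.g. `P04637-2`), while a bare accession such as `P04637` identifies the entry and resolves to its canonical sequence. The Python sketch below is only an illustration under stated assumptions. It is not UniprotR's code (which is written in R), and the `describe_accession` helper and the `https://rest.uniprot.org/uniprotkb/<accession>.fasta` URL pattern are hypothetical choices for this example, not what GetSequences calls internally. The accession pattern follows the format described in UniProt's own documentation.

```python
import re

# UniProtKB accession format (6 or 10 characters), plus an optional
# "-N" isoform suffix such as "P04637-2".
ACCESSION_RE = re.compile(
    r"^(?P<base>[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})"
    r"(?:-(?P<iso>\d+))?$"
)

# Assumption: UniProt's current REST endpoint; UniprotR may use a different URL.
REST_BASE = "https://rest.uniprot.org/uniprotkb"


def describe_accession(acc: str) -> dict:
    """Hypothetical helper: split an accession into entry and isoform number."""
    normalized = acc.strip().upper()
    m = ACCESSION_RE.match(normalized)
    if m is None:
        raise ValueError(f"not a UniProtKB accession: {acc!r}")
    iso = m.group("iso")
    return {
        "entry": m.group("base"),
        # None means a bare accession, i.e. the entry's canonical sequence.
        "isoform": int(iso) if iso else None,
        # The exact accession the user typed is used as-is in the request URL.
        "fasta_url": f"{REST_BASE}/{normalized}.fasta",
    }


if __name__ == "__main__":
    for acc in ["P04637", "P04637-2", "A0A023GPI8"]:
        print(acc, describe_accession(acc))
```

The sketch makes no network call; it only shows that the accession string alone decides whether a request targets the canonical sequence or a specific isoform.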

vragh commented 3 years ago

Hi Mohmed,

Really appreciate the quick response!!

I see. But what about cases where no canonical isoform is designated (as with many non-Swiss-Prot entries)? I am not too familiar with what UniProt does here, but I presume GetSequences just takes whatever the API designates as the default sequence?

(I suppose I should clarify this with the UniProt folks.)

MohmedSoudy commented 3 years ago

Hi @vragh
Yes, I assume UniprotR::GetSequences will return the first hit from the API response, but it would be good if you could share a list of accessions so we can try a real example.

vragh commented 3 years ago

Hi @MohmedSoudy

I was mistaken. It looks like unreviewed UniProt entries do not have multiple sequences assigned to them (at least as far as I can tell). So for those, GetSequences will always retrieve the representative (and only) sequence. I guess that solves my problem.

But that said, an update enabling the user to specify retrieval of non-canonical isoforms would be super beneficial!

MohmedSoudy commented 3 years ago

Yes @vragh
As I mentioned, UniprotR::GetSequences retrieves the sequence for the exact accession entered by the user, and UniProt assigns a unique accession to each entry. I hope our package makes your work easier.

MohmedSoudy commented 3 years ago

Hi @vragh, we made an update: UniprotR now supports downloading both canonical and isoform sequences with the GetSequenceIso function.