Open ftwkoopmans opened 2 years ago
Just to add a tiny bit more info. I suspect the difference in behavior between P63044 and P23819 is due to the lack of an Entrez Gene mapping in the UniProt file for P23819.
The source file for the uniprot data plugin appears to be https://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/idmapping_selected.tab.gz.
From the README, the column headings for this file are as follows:
1. UniProtKB-AC
2. UniProtKB-ID
3. GeneID (EntrezGene)
4. RefSeq
5. GI
6. PDB
7. GO
8. UniRef100
9. UniRef90
10. UniRef50
11. UniParc
12. PIR
13. NCBI-taxon
14. MIM
15. UniGene
16. PubMed
17. EMBL
18. EMBL-CDS
19. Ensembl
20. Ensembl_TRS
21. Ensembl_PRO
22. Additional PubMed
Note the difference between the two records below in column 3, which should hold the Entrez Gene mapping: it is populated for P63044 but empty for P23819.
$ gzip -cd idmapping_selected.tab.gz | awk '$1=="P63044"' | tr "\t" "\n" | cat -n | head
1 P63044
2 VAMP2_MOUSE
3 22318
4 NP_033523.1
5 51704193; 6678551
6
7 GO:0030136; GO:0060203; GO:0005737; GO:0031410; GO:0030659; GO:0030285; GO:0043231; GO:0043229; GO:0016020; GO:0043005; GO:0044306; GO:0048471; GO:0005886; GO:0030141; GO:0030667; GO:0031201; GO:0000322; GO:0045202; GO:0008021; GO:0030672; GO:0070044; GO:0070032; GO:0070033; GO:0005802; GO:0031982; GO:0042589; GO:0048306; GO:0005516; GO:0042802; GO:0017022; GO:0005543; GO:0008022; GO:0044877; GO:0005484; GO:0000149; GO:0019905; GO:0017075; GO:0044325; GO:0017156; GO:0032869; GO:0043308; GO:0098967; GO:0043001; GO:0046879; GO:0060291; GO:0061025; GO:0090316; GO:0015031; GO:0065003; GO:0045055; GO:0017158; GO:1902259; GO:0017157; GO:1903421; GO:0060627; GO:0009749; GO:0035493; GO:0016081; GO:0048488; GO:0016079; GO:0006906; GO:0016192
8 UniRef100_P63044
9 UniRef90_P63044
10 UniRef50_P63044
$ gzip -cd idmapping_selected.tab.gz | awk '$1=="P23819"' | tr "\t" "\n" | cat -n | head
1 P23819
2 GRIA2_MOUSE
3
4
5 496139; 22096313; 26335713; 496140; 12852206
6 7LDD:B; 7LDD:D; 7LDE:B; 7LDE:D; 7LEP:B; 7LEP:D
7 GO:0032281; GO:0032279; GO:0009986; GO:0030425; GO:0032839; GO:0043198; GO:0043197; GO:0005783; GO:0005789; GO:0098978; GO:0030426; GO:0005887; GO:0099061; GO:0099055; GO:0099056; GO:0016020; GO:0043005; GO:0043025; GO:0043204; GO:0099544; GO:0005886; GO:0014069; GO:0098839; GO:0045211; GO:0042734; GO:0032991; GO:0098685; GO:0036477; GO:0045202; GO:0097060; GO:0008021; GO:0030672; GO:0043195; GO:0004971; GO:0001540; GO:0051117; GO:0008092; GO:0005234; GO:0035254; GO:0042802; GO:0019865; GO:0004970; GO:0015277; GO:0015276; GO:0030165; GO:0019901; GO:0038023; GO:0000149; GO:1904315; GO:0007268; GO:0045184; GO:0035235; GO:0050806; GO:0051262; GO:0031623; GO:0001919; GO:0051966
8 UniRef100_P23819
9 UniRef90_P19491-3
10 UniRef50_P19491
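The same column 3 check can be scripted for a whole list of accessions. This is only a minimal sketch: the function name `entrez_gene_ids` is hypothetical, and it assumes a local copy of `idmapping_selected.tab.gz` with one tab-separated line per accession, as in the README column list above.

```python
import gzip


def entrez_gene_ids(path, accessions):
    """Map each requested UniProtKB-AC to its list of GeneIDs (column 3).

    Assumes `path` is a gzip-compressed, tab-separated file laid out like
    idmapping_selected.tab.gz, with each accession on a single line.
    """
    wanted = set(accessions)
    found = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            columns = line.rstrip("\n").split("\t")
            if columns[0] in wanted:
                # Column 3 (index 2) holds "; "-separated GeneIDs, or is empty.
                gene_field = columns[2] if len(columns) > 2 else ""
                found[columns[0]] = [
                    gene.strip() for gene in gene_field.split(";") if gene.strip()
                ]
                if len(found) == len(wanted):
                    break  # every requested accession has been seen
    return found
```

An accession that comes back with an empty list (as P23819 would, given the record shown above) has no Entrez Gene mapping in the file; an accession missing from the result was not found in the file at all.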
This difference can also be seen on the corresponding UniProt web pages.
Having said that, the reciprocal links do exist in NCBI Gene (likely through a mapping to RefSeq Protein).
Some UniProt accessions are not available for querying, nor as output, in the "uniprot" field/scope. To illustrate, I've included two examples: one accession that works (P63044) and one that fails (P23819).
This works via https://mygene.info/v3/api#/query/get_query with "q" input: P63044 and "fields" input: symbol,name,taxid,entrezgene,uniprot
returns:
This also works via https://mygene.info/v3/api#/query/get_query with "q" input: P23819 and "fields" input: symbol,name,taxid,entrezgene,uniprot
and returns:
However, note that for the latter query, the UniProt ID that I queried (a Swiss-Prot record) is not included in the "uniprot" output field! So there seems to be a problem with the mygene.info database: possibly a subset of UniProt accessions/IDs is not stored or linked under "uniprot". Other examples are P23819, Q61941, Q8VHW2.
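To check this for more accessions without reading each response by eye, something like the following could be used. It is a rough sketch, not part of mygene.info itself: the helper names are hypothetical, and it assumes the GET endpoint is `https://mygene.info/v3/query` and that each hit's "uniprot" field is an object whose values are either a single accession string or a list of them.

```python
from urllib.parse import urlencode

# Assumption: the REST endpoint behind the get_query form linked above.
QUERY_URL = "https://mygene.info/v3/query"


def build_get_url(term, fields="symbol,name,taxid,entrezgene,uniprot"):
    """Build the GET URL for one query term, with the fields used above."""
    return QUERY_URL + "?" + urlencode({"q": term, "fields": fields})


def uniprot_accessions(hit):
    """Collect every accession under a hit's "uniprot" field into one set."""
    accessions = set()
    uniprot = hit.get("uniprot") or {}
    for value in uniprot.values():
        if isinstance(value, str):
            accessions.add(value)  # single accession stored as a string
        else:
            accessions.update(value)  # several accessions stored as a list
    return accessions


def query_is_echoed(term, hits):
    """True if the queried accession appears in any hit's "uniprot" field."""
    return any(term in uniprot_accessions(hit) for hit in hits)
```

With the response parsed as JSON, `query_is_echoed("P23819", response["hits"])` would be expected to return `False` for the affected accessions described here and `True` for P63044.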
Furthermore, POST queries against these accessions fail even though they should not (probably same root cause).
this works via https://mygene.info/v3/api#/query/post_query ;
{ "q": "P63044", "scopes": "uniprot" }
returns:
This query fails, but it should not, as this is a valid UniProt accession that is in the mygene.info dataset (see GET query above):
{ "q": "P23819", "scopes": "uniprot" }
returns:
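For reference, both POST requests can be reproduced outside the API page with a short script. This is a sketch under assumptions: the helper names are hypothetical, the endpoint is taken to be `https://mygene.info/v3/query` accepting the JSON body shown above, and unmatched terms are assumed to come back as entries carrying a `notfound` flag.

```python
import json
from urllib.request import Request

# Assumption: the REST endpoint behind the post_query form linked above.
QUERY_URL = "https://mygene.info/v3/query"


def build_post_request(term, scopes="uniprot"):
    """Prepare (but do not send) the POST request for one query term."""
    body = json.dumps({"q": term, "scopes": scopes}).encode("utf-8")
    return Request(
        QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unmatched_terms(results):
    """List the query terms that the response marks as not found.

    Assumes `results` is the parsed JSON list returned by the batch query,
    one entry per match, with unmatched terms flagged by "notfound".
    """
    return [entry.get("query") for entry in results if entry.get("notfound")]
```

Sending the request with `urllib.request.urlopen(build_post_request("P23819"))` and passing the parsed JSON to `unmatched_terms` would show whether the accession is rejected in the "uniprot" scope.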