biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Entrez and Ensembl mappings #39

Closed stuppie closed 5 years ago

stuppie commented 6 years ago

In MyGene.info, Entrez gene 10168 has Ensembl mappings to both ENSG00000186448 and ENSG00000281709. However, looking at the Ensembl and Entrez records for these genes:

https://www.ncbi.nlm.nih.gov/gene?cmd=retrieve&dopt=default&list_uids=10168
https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000186448
https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000281709

I only see a single 1-to-1 mapping between 10168 and ENSG00000186448. Is this right?

See: https://github.com/SuLab/scheduled-bots/issues/19
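One way to check which Ensembl gene IDs are attached to an Entrez gene is to read the `ensembl` field of its MyGene.info gene document. The following is a minimal Python sketch, assuming that field is either a single object or a list of objects, each carrying a `gene` key. The `sample_doc` is hand-written from the IDs quoted in this issue (it is not a live API response), and `ensembl_gene_ids` is a hypothetical helper name.

```python
def ensembl_gene_ids(gene_doc):
    """Return the sorted Ensembl gene IDs found in a MyGene.info-style gene document."""
    ensembl = gene_doc.get("ensembl", [])
    # Assumption: a single mapping is a dict, several mappings are a list of dicts.
    if isinstance(ensembl, dict):
        ensembl = [ensembl]
    return sorted(entry["gene"] for entry in ensembl if "gene" in entry)


# Hand-written sample built from the IDs in this issue (not fetched from the API).
sample_doc = {
    "_id": "10168",
    "ensembl": [
        {"gene": "ENSG00000186448"},
        {"gene": "ENSG00000281709"},
    ],
}

print(ensembl_gene_ids(sample_doc))
```

A live document could presumably be retrieved from https://mygene.info/v3/gene/10168?fields=ensembl.gene and passed to the same helper; the exact response shape may differ between data releases.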

sirloon commented 5 years ago

According to Ensembl (as of release 93), the file gene_ensembl__xref_entrezgene__dm.txt contains both associations:

(venv) mygene@su05:/opt/genedoc-hub/ensembl/93$ grep ENSG00000281709 gene_ensembl__xref_entrezgene__dm.txt
9606    ENSG00000281709 110354863
9606    ENSG00000281709 10168
(venv) mygene@su05:/opt/genedoc-hub/ensembl/93$ grep ENSG00000186448 gene_ensembl__xref_entrezgene__dm.txt
9606    ENSG00000186448 110354863
9606    ENSG00000186448 10168
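
The same check can be done for the whole file rather than one ID at a time. The following is a rough Python sketch, assuming the three whitespace-separated columns are taxonomy ID, Ensembl gene ID and Entrez gene ID, as the grep output above suggests. The rows are copied from that output, and `group_xrefs` is a hypothetical helper, not part of the MyGene.info code base.

```python
from collections import defaultdict

# Rows copied from the grep output above: taxid, Ensembl gene ID, Entrez gene ID.
ROWS = """\
9606\tENSG00000281709\t110354863
9606\tENSG00000281709\t10168
9606\tENSG00000186448\t110354863
9606\tENSG00000186448\t10168
"""


def group_xrefs(text):
    """Group cross-reference rows in both directions."""
    ens_to_entrez = defaultdict(set)
    entrez_to_ens = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        _taxid, ensembl_id, entrez_id = fields
        ens_to_entrez[ensembl_id].add(entrez_id)
        entrez_to_ens[entrez_id].add(ensembl_id)
    return ens_to_entrez, entrez_to_ens


ens_to_entrez, entrez_to_ens = group_xrefs(ROWS)

# Report every Entrez gene that maps to more than one Ensembl gene.
for entrez_id, ensembl_ids in sorted(entrez_to_ens.items()):
    if len(ensembl_ids) > 1:
        print(entrez_id, sorted(ensembl_ids))
```

For these four rows, both Entrez IDs map to both Ensembl IDs, so the relationship in the file is many-to-many rather than 1-to-1.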

sirloon commented 5 years ago

After regenerating the extra mapping file, we still have both:

(venv) mygene@su05:~/mygene.info/src$ grep ENSG00000186448 /opt/genedoc-hub/ensembl/93/gene_ensembl__gene__extra.txt 
ENSG00000186448 10168
(venv) mygene@su05:~/mygene.info/src$ grep ENSG00000281709 /opt/genedoc-hub/ensembl/93/gene_ensembl__gene__extra.txt
ENSG00000281709 10168

stuppie commented 5 years ago

I guess my question is "where do the mappings come from?", because I don't see them on Entrez's site. So the answer is that the Entrez-to-Ensembl cross-references come from Ensembl, not from Entrez, right?

sirloon commented 5 years ago

Cross-references come from both directions: Ensembl -> Entrez and Entrez -> Ensembl. I'm still investigating the 10168 case and will get back to you soon.

sirloon commented 5 years ago

I'll write a documentation page with details about how we build that mapping.

For this particular issue, the problem ultimately comes from BioMart (we query Ensembl BioMart to get Ensembl data in general), which reports an association between ENSG00000281709 and 10168, whereas the Ensembl website does not, according to what you've found. We'll contact Ensembl about this issue.

http://uswest.ensembl.org/biomart/martview/f433127fefb5a20fea7d6602ebfb2862?VIRTUALSCHEMANAME=default&ATTRIBUTES=hsapiens_gene_ensembl.default.feature_page.ensembl_gene_id|hsapiens_gene_ensembl.default.feature_page.entrezgene&FILTERS=hsapiens_gene_ensembl.default.filters.ensembl_gene_id."ENSG00000281709,ENSG00000186448"&VISIBLEPANEL=resultspanel
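
The martview URL above can also be expressed as a BioMart XML query, which is handy for scripted checks. The following is a sketch only, assuming BioMart's usual XML query layout (`Query` > `Dataset` > `Filter`/`Attribute`); the dataset, filter and attribute names are taken from the URL above, and the `entrezgene` attribute name may differ in other Ensembl releases. `build_biomart_query` is a hypothetical helper, and no request is sent.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string (assumed layout, nothing is sent)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Names below come from the martview URL in this comment.
xml_query = build_biomart_query(
    "hsapiens_gene_ensembl",
    {"ensembl_gene_id": "ENSG00000281709,ENSG00000186448"},
    ["ensembl_gene_id", "entrezgene"],
)
print(xml_query)
```

Such a query is typically submitted as the `query` parameter of a BioMart `martservice` endpoint; the result for the two gene IDs would then be compared against what the Ensembl gene pages display.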

newgene commented 5 years ago

FYI, I asked the Ensembl helpdesk about this. Basically, they need to fix the content on the Ensembl gene page to be consistent with the underlying data (i.e. what we retrieved from BioMart).

from ensembl helpdesk.txt