stuppie closed 5 years ago
According to Ensembl (as of release 93), the file gene_ensembl__xref_entrezgene__dm.txt contains both associations:
(venv) mygene@su05:/opt/genedoc-hub/ensembl/93$ grep ENSG00000281709 gene_ensembl__xref_entrezgene__dm.txt
9606 ENSG00000281709 110354863
9606 ENSG00000281709 10168
(venv) mygene@su05:/opt/genedoc-hub/ensembl/93$ grep ENSG00000186448 gene_ensembl__xref_entrezgene__dm.txt
9606 ENSG00000186448 110354863
9606 ENSG00000186448 10168
After regenerating the extra mapping file, we still have both:
(venv) mygene@su05:~/mygene.info/src$ grep ENSG00000186448 /opt/genedoc-hub/ensembl/93/gene_ensembl__gene__extra.txt
ENSG00000186448 10168
(venv) mygene@su05:~/mygene.info/src$ grep ENSG00000281709 /opt/genedoc-hub/ensembl/93/gene_ensembl__gene__extra.txt
ENSG00000281709 10168
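For anyone who wants to check for other cases like this one, here is a minimal sketch that groups Ensembl gene IDs by Entrez ID and reports any Entrez ID shared by more than one Ensembl gene. It assumes the files are whitespace-separated with the Ensembl ID and the Entrez ID as the last two columns, as in the grep output above. The function names `parse_pairs` and `shared_entrez_ids` are made up for illustration and are not part of the mygene.info code base.

```python
from collections import defaultdict


def parse_pairs(lines):
    """Yield (ensembl_id, entrez_id) from whitespace-separated lines.

    Assumption: the last two columns are the Ensembl gene ID and the
    Entrez gene ID, which matches the grep output shown in this thread.
    """
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        yield fields[-2], fields[-1]


def shared_entrez_ids(lines):
    """Return {entrez_id: [ensembl_ids]} for Entrez IDs mapped to 2+ Ensembl genes."""
    by_entrez = defaultdict(set)
    for ensembl_id, entrez_id in parse_pairs(lines):
        by_entrez[entrez_id].add(ensembl_id)
    return {e: sorted(g) for e, g in by_entrez.items() if len(g) > 1}


# Rows copied from the grep output on gene_ensembl__xref_entrezgene__dm.txt.
xref_lines = [
    "9606 ENSG00000281709 110354863",
    "9606 ENSG00000281709 10168",
    "9606 ENSG00000186448 110354863",
    "9606 ENSG00000186448 10168",
]

for entrez_id, ensembl_ids in shared_entrez_ids(xref_lines).items():
    print(entrez_id, ensembl_ids)
```

To run it against a real file, pass an open file handle instead of the list, e.g. `shared_entrez_ids(open(path))`, where `path` points at whichever mapping file you want to inspect.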
I guess my question is "where do the mappings come from?", because I don't see it on Entrez's site. But the answer is: the Entrez to Ensembl cross-references come from Ensembl, not from Entrez... Right?
Cross-refs come from both directions: Ensembl -> Entrez and Entrez -> Ensembl. I'm still investigating this 10168 case; I'll get back to you soon.
I'll write a documentation page with details about how we build that mapping.
For this particular issue, the problem ultimately comes from BioMart (we query Ensembl BioMart to get Ensembl data in general), which reports an association between ENSG00000281709 and 10168, whereas the Ensembl website doesn't report it, according to what you've found. We'll contact Ensembl about this issue.
FYI, I asked the Ensembl helpdesk about this. Basically, they need to fix the content on the Ensembl gene page to be consistent with the underlying data (i.e. what we retrieved from BioMart). from ensembl helpdesk.txt
Looking at 10168, it has Ensembl mappings to both ENSG00000186448 and ENSG00000281709. However, looking at the Ensembl and Entrez records for these genes:
https://www.ncbi.nlm.nih.gov/gene?cmd=retrieve&dopt=default&list_uids=10168
https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000186448
https://uswest.ensembl.org/Homo_sapiens/Gene/Summary?g=ENSG00000281709
I only see a single 1-to-1 mapping between 10168 and ENSG00000186448. Is this right?
See: https://github.com/SuLab/scheduled-bots/issues/19