Open gtsueng opened 5 years ago
@gtsueng yes, the Wikipedia data source in MyGene.info is not automatically updated. It's still the initial version we loaded when we created the parser for it.
It's probably a good time to set up a proper "dumper" (for automatically pulling data from the source) for the Wikipedia data source, based on the code snippet you provided.
Currently, about 1500 human genes in Wikidata have corresponding URLs in English Wikipedia, but MyGene.info does not return them (i.e., they are missing from MyGene).
For example, NRN1 (https://www.wikidata.org/wiki/Q18040171) is linked to https://en.wikipedia.org/wiki/NRN1 in Wikipedia, but MyGene.info (https://mygene.info/v3/gene/51299?fields=wikipedia%2C%20symbol) does not return the URL.
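To make the NRN1 example easy to reproduce, here is a minimal sketch that checks whether a MyGene.info gene document contains a `wikipedia` field at all. The helper names `fetch_gene_doc` and `has_wikipedia_field` are made up for illustration, and the check only assumes that a missing link shows up as an absent or empty `wikipedia` key in the JSON response.

```python
import json
import urllib.request

# Same endpoint and fields as the MyGene.info URL quoted above.
MYGENE_GENE_URL = "https://mygene.info/v3/gene/{gene_id}?fields=wikipedia,symbol"


def has_wikipedia_field(gene_doc):
    """Return True if the gene document carries a non-empty 'wikipedia' field.

    Assumption: a gene without a Wikipedia link simply lacks this key.
    """
    return bool(gene_doc.get("wikipedia"))


def fetch_gene_doc(gene_id):
    """Fetch one gene document from MyGene.info (performs a network call)."""
    url = MYGENE_GENE_URL.format(gene_id=gene_id)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Example usage (requires network access):
#   doc = fetch_gene_doc(51299)
#   print(doc.get("symbol"), has_wikipedia_field(doc))
```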
SPARQL query in Python for pulling human genes with English Wikipedia links: in this gist
Python notebook for pulling the missing URLs
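For readers who cannot open the gist, the following is a rough sketch of what such a query could look like. It is not the gist's code. It assumes the public Wikidata Query Service endpoint and the Wikidata properties P351 (Entrez Gene ID) and P703 (found in taxon), with Q15978631 for Homo sapiens. The function names and the User-Agent string are placeholders.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"

# Human genes (by Entrez Gene ID) that have an English Wikipedia article.
QUERY = """
SELECT ?gene ?entrez ?article WHERE {
  ?gene wdt:P351 ?entrez ;          # Entrez Gene ID
        wdt:P703 wd:Q15978631 .     # found in taxon: Homo sapiens
  ?article schema:about ?gene ;
           schema:isPartOf <https://en.wikipedia.org/> .
}
"""


def run_query(query=QUERY):
    """Send the query to the Wikidata Query Service (performs a network call)."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(
        WIKIDATA_SPARQL_URL + "?" + params,
        # Placeholder User-Agent; replace with your own contact details.
        headers={"User-Agent": "mygene-wikipedia-check/0.1 (example)"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def entrez_to_wikipedia(results):
    """Map Entrez Gene ID -> English Wikipedia URL from SPARQL JSON results."""
    mapping = {}
    for row in results["results"]["bindings"]:
        mapping[row["entrez"]["value"]] = row["article"]["value"]
    return mapping


# Example usage (requires network access):
#   links = entrez_to_wikipedia(run_query())
#   print(len(links), links.get("51299"))
```

The resulting mapping can then be compared against MyGene.info documents to list the genes whose Wikipedia URL is missing, and it could serve as the starting point for the dumper discussed above.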