biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

missing Wikipedia URLs in MyGene.info #71

Open gtsueng opened 5 years ago

gtsueng commented 5 years ago

Currently, about 1500 human genes in Wikidata have corresponding URLs in English Wikipedia, but MyGene.info does not return them (i.e., they are missing from MyGene).

For example, NRN1 (https://www.wikidata.org/wiki/Q18040171) is linked to https://en.wikipedia.org/wiki/NRN1 in Wikipedia, but MyGene.info (https://mygene.info/v3/gene/51299?fields=wikipedia%2C%20symbol) does not return the URL.
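The check for a single gene can be sketched roughly as follows. This is a minimal sketch using only the standard library; the helper names (`build_gene_url`, `fetch_gene`, `has_wikipedia`) are hypothetical, and it assumes that a gene document without Wikipedia data simply omits the `wikipedia` key.

```python
import json
import urllib.parse
import urllib.request

# Gene annotation endpoint, as used in the example URL above.
MYGENE_GENE_URL = "https://mygene.info/v3/gene/"


def build_gene_url(entrez_id, fields):
    """Build the request URL for one gene, restricted to the given fields."""
    query = urllib.parse.urlencode({"fields": ",".join(fields)})
    return f"{MYGENE_GENE_URL}{entrez_id}?{query}"


def fetch_gene(entrez_id, fields=("wikipedia", "symbol")):
    """Fetch the gene document as a dict (performs a network request)."""
    url = build_gene_url(entrez_id, fields)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def has_wikipedia(doc):
    """Assumption: a document lacking Wikipedia data has no 'wikipedia' key."""
    return bool(doc.get("wikipedia"))
```

Calling `has_wikipedia(fetch_gene(51299))` would then indicate whether NRN1 currently has Wikipedia data in MyGene.info.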

SPARQL query in Python for pulling human genes with English Wikipedia links: in this gist
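For readers without access to the gist, a query of this kind might look like the sketch below. It is not the gist's actual code: it assumes human genes can be selected via Entrez Gene ID (P351) plus "found in taxon" (P703) Homo sapiens (Q15978631), and the function names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_SPARQL_URL = "https://query.wikidata.org/sparql"

# Human genes with an Entrez Gene ID that have an English Wikipedia article.
# The wdt:, wd: and schema: prefixes are predefined by the Wikidata Query Service.
QUERY = """
SELECT ?gene ?entrez ?article WHERE {
  ?gene wdt:P351 ?entrez ;
        wdt:P703 wd:Q15978631 .
  ?article schema:about ?gene ;
           schema:isPartOf <https://en.wikipedia.org/> .
}
"""


def run_query(query=QUERY):
    """Send the query to the public endpoint and return the JSON result as a dict."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(
        f"{WIKIDATA_SPARQL_URL}?{params}",
        headers={"User-Agent": "mygene-wikipedia-check/0.1 (example)"},  # hypothetical
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.load(resp)


def parse_bindings(results):
    """Map Entrez Gene ID -> English Wikipedia URL from SPARQL JSON results."""
    return {
        row["entrez"]["value"]: row["article"]["value"]
        for row in results["results"]["bindings"]
    }
```

`parse_bindings(run_query())` would give a dictionary keyed by Entrez Gene ID, which is convenient for comparing against MyGene.info.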

Python notebook for pulling the missing URLs
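The comparison itself is a simple set difference. The sketch below is not the notebook's code; `find_missing` is a hypothetical helper that assumes a dict mapping Entrez Gene ID to Wikipedia URL (from Wikidata) and a list of MyGene.info gene documents keyed by `_id`.

```python
def find_missing(wikidata_urls, mygene_docs):
    """Return the Wikidata entries whose gene has no Wikipedia data in MyGene.info.

    wikidata_urls: dict of Entrez Gene ID (str) -> English Wikipedia URL
    mygene_docs:   iterable of gene documents, each with an '_id' and,
                   when available, a 'wikipedia' field (an assumption)
    """
    # IDs of genes for which MyGene.info already returns Wikipedia data.
    have = {str(doc["_id"]) for doc in mygene_docs if doc.get("wikipedia")}
    return {gid: url for gid, url in wikidata_urls.items() if gid not in have}


# Example with the gene from this report; the second entry is made-up data.
wikidata_urls = {
    "51299": "https://en.wikipedia.org/wiki/NRN1",
    "0000": "https://en.wikipedia.org/wiki/Example",  # hypothetical entry
}
mygene_docs = [
    {"_id": "51299", "symbol": "NRN1"},                # no 'wikipedia' field
    {"_id": "0000", "wikipedia": {"url_stub": "Example"}},
]
missing = find_missing(wikidata_urls, mygene_docs)
```

Here `missing` contains only the NRN1 entry, since the second document already carries Wikipedia data.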

newgene commented 5 years ago

@gtsueng Yes, the Wikipedia data source in MyGene.info is not automatically updated. It's still the initial version we loaded when we created the parser for it.

It's probably a good time to set up a proper "dumper" (for automatically pulling data from the source) for the Wikipedia data source, based on the code snippet you provided.