SuLab / GeneWikiCentral


check import of FAERS indication data #122

Open andrewsu opened 4 years ago

andrewsu commented 4 years ago

Using the code in https://github.com/SuLab/faers, Greg parsed FAERS data for drug indications and produced this file https://zenodo.org/record/1436000#.XWcVWShKguU. In that file (record 836), there are three diseases listed for the drug bupropion: Depression, Anxiety, Bipolar disorder. However, in the bot run to add FAERS indications, the diff for bupropion only added anxiety: https://www.wikidata.org/w/index.php?title=Q834280&diff=756477246&oldid=737266253

Should investigate why... (and while we're at it, look at automating the parsing and updates...)

(tagging @stuppie in case you remember a reason why this might have been by design...)

gtsueng commented 4 years ago

Looking at the bupropion example, the Wikidata items for depression and bipolar disorder do not carry the Mondo IDs listed in the FAERS data. Neither Q4340209 nor Q131755 has a Mondo ID, and SPARQL queries for MONDO:0002050 or MONDO:0004985 return no items, whereas a query for MONDO:0011918 does. The function 'normalize_to_qids' appears to rely on Mondo IDs:

mondo_qid = wdi_helpers.id_mapper(PROPS['Mondo ID'])

The FAERS data does include UMLS CUIs for the indications, and a quick SPARQL query with the CUIs for depression and bipolar disorder successfully returns the Wikidata entities for both.
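For anyone wanting to reproduce the lookups, here is a minimal sketch of how such a query could be built. The helper name `build_lookup_query` is hypothetical, and the property IDs (P2892 for UMLS CUI, P5270 for Mondo ID) are my assumption and should be checked against Wikidata before use. The CUI in the usage note is a placeholder, not a value from the FAERS file.

```python
# Assumed Wikidata property IDs; verify on wikidata.org before relying on them.
UMLS_CUI_PROP = "P2892"
MONDO_ID_PROP = "P5270"


def build_lookup_query(prop_id, values):
    """Build a SPARQL query returning Wikidata items whose `prop_id`
    statement matches any of the given identifier strings."""
    for v in values:
        # Reject characters that would break out of the quoted literal.
        if '"' in v or "\\" in v:
            raise ValueError("unsupported character in identifier: %r" % v)
    values_clause = " ".join('"%s"' % v for v in values)
    return (
        "SELECT ?item ?value WHERE {\n"
        "  VALUES ?value { " + values_clause + " }\n"
        "  ?item wdt:" + prop_id + " ?value .\n"
        "}"
    )


# Mondo IDs taken from the comment above; paste the result into
# https://query.wikidata.org to see which ones resolve to an item.
mondo_query = build_lookup_query(
    MONDO_ID_PROP, ["MONDO:0002050", "MONDO:0004985", "MONDO:0011918"]
)
```

The same helper would work for CUIs, e.g. `build_lookup_query(UMLS_CUI_PROP, ["C0000000"])`, where `C0000000` stands in for a real CUI from the FAERS data.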

Coverage of the FAERS Mondo IDs in Wikidata compared to the FAERS UMLS CUIs:

- number of unique UMLS CUIs in FAERS data: 800
- number of unique FAERS UMLS CUIs found in Wikidata via SPARQL: 739
- number of unique Mondo IDs in FAERS data: 774
- number of unique FAERS Mondo IDs found in Wikidata: 524

No one-to-many or many-to-one mapping issues were found when pulling Wikidata items by Mondo ID, which is probably why the script used Mondo IDs. In contrast, UMLS CUIs had both one-to-many and many-to-one mapping issues when used to pull Wikidata entities via SPARQL.

- 75 unique FAERS UMLS CUIs pulled 156 unique Wikidata entities via SPARQL (one-to-many)
- 54 unique FAERS UMLS CUIs pulled 26 unique Wikidata entities via SPARQL (many-to-one)
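A rough sketch of how these two kinds of ambiguity can be counted from the SPARQL results, assuming the results have been flattened into (CUI, QID) pairs. The function name and the sample pairs are made up for illustration and are not from the FAERS file.

```python
from collections import defaultdict


def classify_mappings(pairs):
    """Given (cui, qid) pairs, return the ambiguous mappings.

    one_to_many: CUIs that matched more than one Wikidata item.
    many_to_one: Wikidata items that were matched by more than one CUI.
    """
    cui_to_qids = defaultdict(set)
    qid_to_cuis = defaultdict(set)
    for cui, qid in pairs:
        cui_to_qids[cui].add(qid)
        qid_to_cuis[qid].add(cui)

    one_to_many = {c: sorted(q) for c, q in cui_to_qids.items() if len(q) > 1}
    many_to_one = {q: sorted(c) for q, c in qid_to_cuis.items() if len(c) > 1}
    return one_to_many, many_to_one


# Hypothetical sample: C1 hits two items, C2 and C3 share one item.
sample = [("C1", "Q1"), ("C1", "Q2"), ("C2", "Q3"), ("C3", "Q3"), ("C4", "Q4")]
one_to_many, many_to_one = classify_mappings(sample)
```

Only CUIs that appear in neither result (C4 in the sample) map cleanly to a single item, so those are the ones that could be used as a fallback without further curation.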