SuLab / scheduled-bots

GeneWiki Scheduled Bots
MIT License

Missing some CIViC evidence statements in Wikidata #24

Closed: floatingpurr closed this issue 5 years ago

floatingpurr commented 5 years ago

Hi guys, I've just realized that some evidence statements coming from CIViC are not echoed in Wikidata at all. E.g.,

EGFR AMPLIFICATION in Wikidata => 0 statements
EGFR AMPLIFICATION in CIViC => 9 accepted evidence statements

ERBB2 AMPLIFICATION in Wikidata => 0 statements
ERBB2 AMPLIFICATION in CIViC => 58 accepted evidence statements

Taking a quick look at the bot, it seems that it gets all variants:

r = requests.get('https://civic.genome.wustl.edu/api/variants?count=999999999')

Then it gets the statements for each variant, here:

for record in tqdm(records):
    try:
        run_one(record['id'], retrieved, fast_run, write, login)
    except Exception as e:
        traceback.print_exc()
        wdi_core.WDItemEngine.log("ERROR", wdi_helpers.format_msg(
            record['id'], PROPS['CIViC Variant ID'], None, str(e), type(e)))

Later it processes the data and loads it into Wikidata. In theory, it should also pick up the variants mentioned above, but they are not present.
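One thing the loop above does make visible: every exception is caught per record, printed, and logged, so a variant whose processing raises is skipped while the run carries on. The following is a minimal sketch of that pattern, not the bot's code. `run_all` and `fake_run_one` are hypothetical names, and the error message is invented for illustration. The sketch collects the failing IDs so they can be reviewed after the run.

```python
import traceback


def run_all(records, run_one):
    """Call run_one(record_id) for each record and return the failures.

    run_one is a stand-in for the bot's real per-variant function.
    Returns a dict mapping each failing record ID to a short error summary.
    """
    failed = {}
    for record in records:
        try:
            run_one(record["id"])
        except Exception as e:
            # Same idea as the loop above: report the error and keep going.
            traceback.print_exc()
            failed[record["id"]] = f"{type(e).__name__}: {e}"
    return failed


def fake_run_one(variant_id):
    # Hypothetical behaviour: pretend variant 2 cannot be processed.
    if variant_id == 2:
        raise ValueError("drug combination item not found")


failures = run_all([{"id": 1}, {"id": 2}, {"id": 3}], fake_run_one)
print(failures)
```

With a summary like this, variants 1 and 3 are processed normally and only variant 2 ends up in `failures`, which makes silently skipped records easier to spot than scrolling through tracebacks.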

stuppie commented 5 years ago

It's failing on those because it can't find the drug combination items! Looks like there were three copies of the same item for some reason:
https://www.wikidata.org/wiki/Q56240706
https://www.wikidata.org/wiki/Q39136135
https://www.wikidata.org/wiki/Q56240692
I merged them. I'll re-run the bot and that should clear these up. Thanks for reporting these issues!
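To illustrate how duplicate items can break a lookup, here is a small sketch that assumes the lookup expects exactly one Wikidata item per drug combination. This is one plausible reading of the failure, not a description of the bot's actual lookup code. The helper names and the key `"combo-A"` are hypothetical, and the thread does not say which combination was affected. The QIDs are the three listed above.

```python
from collections import defaultdict

# (QID, lookup key) pairs. The QIDs come from the comment above;
# the key "combo-A" is a made-up placeholder.
ITEMS = [
    ("Q56240706", "combo-A"),
    ("Q39136135", "combo-A"),
    ("Q56240692", "combo-A"),
]


def index_by_key(items):
    """Group QIDs by their lookup key."""
    by_key = defaultdict(list)
    for qid, key in items:
        by_key[key].append(qid)
    return dict(by_key)


def resolve(by_key, key):
    """Return the single QID for key, or raise if there is not exactly one."""
    qids = by_key.get(key, [])
    if len(qids) != 1:
        raise LookupError(
            f"expected exactly one item for {key!r}, found {len(qids)}: {qids}"
        )
    return qids[0]


def duplicates(by_key):
    """Return only the keys that map to more than one QID."""
    return {key: qids for key, qids in by_key.items() if len(qids) > 1}


by_key = index_by_key(ITEMS)
dupes = duplicates(by_key)
print(dupes)
```

Under this assumption, `resolve(by_key, "combo-A")` raises `LookupError` until the duplicates are merged into one item, which fits the fix described above.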

floatingpurr commented 5 years ago

Great! I hope my issues may help your effort :)

stuppie commented 5 years ago

These have been added now, thanks