SuLab / scheduled-bots

GeneWiki Scheduled Bots
MIT License

CGI bot does not run because of duplicates for combination therapies on Wikidata #69

Closed andrawaag closed 2 years ago

andrawaag commented 3 years ago

The CGI bot fails repeatedly. This happens when duplicate Wikidata items have been created for the same combination therapy. For example, the following two items describe the same therapy:

buparlisib / paclitaxel / carboplatin combination therapy (Q58644763) and carboplatin / buparlisib / paclitaxel combination therapy (Q88405264).

The issue is fixed when the two items are merged into one item and the bot runs again.

The issue emerges with the following error message:

Traceback (most recent call last):
  File "normalize_drugs.py", line 136, in <module>
    assert len(combo_qid) == len(qid_combo)
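To illustrate why duplicates can trip an assertion of this shape, here is a minimal sketch. It assumes that `qid_combo` maps each item's QID to its set of component drugs and that `combo_qid` is the inverse mapping; the actual data structures in `normalize_drugs.py` may differ, so treat the names and shapes below as hypothetical.

```python
# Hypothetical shapes, not taken from normalize_drugs.py:
# qid_combo maps a QID to the (unordered) set of component drugs.
qid_combo = {
    "Q58644763": frozenset({"buparlisib", "paclitaxel", "carboplatin"}),
    "Q88405264": frozenset({"carboplatin", "buparlisib", "paclitaxel"}),
}

# Inverting the mapping collapses both items onto a single key,
# because the two component sets are equal regardless of order.
combo_qid = {components: qid for qid, components in qid_combo.items()}

lengths_match = len(combo_qid) == len(qid_combo)
print(lengths_match)  # False: 1 combination, but 2 items
```

Under these assumptions, the two mappings only have the same length when every combination is represented by exactly one item.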

Identifying which items are the culprits is a bit tedious, so I wrote a script to find the items that need to be merged. Running that script identifies the two items above. In the future we could include it in the bot, so that the bot does not fail but fixes the duplicates as part of its run.
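The script itself is not reproduced in this thread. As a rough sketch of the idea, assuming the component drugs of each combination therapy item have already been fetched into a dictionary, duplicates can be found by grouping items on their unordered component set. The function name, the input shape, and the QID `Q999999999` are hypothetical.

```python
from collections import defaultdict


def find_duplicate_combos(qid_components):
    """Return groups of QIDs whose component sets are identical.

    qid_components: dict mapping a QID to an iterable of component names.
    """
    groups = defaultdict(list)
    for qid, components in qid_components.items():
        # frozenset ignores the order in which components are listed
        groups[frozenset(components)].append(qid)
    # Keep only sets shared by more than one item, lowest QID number first
    return [
        sorted(qids, key=lambda q: int(q[1:]))
        for qids in groups.values()
        if len(qids) > 1
    ]


items = {
    "Q88405264": ["carboplatin", "buparlisib", "paclitaxel"],
    "Q58644763": ["buparlisib", "paclitaxel", "carboplatin"],
    "Q999999999": ["paclitaxel", "carboplatin"],  # hypothetical, no duplicate
}
print(find_duplicate_combos(items))  # [['Q58644763', 'Q88405264']]
```

Each returned group is a candidate for a manual merge on Wikidata.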

For now, I prefer to do the fixing manually to keep an eye on the merging process.

After merging the above-mentioned items, the bot ran successfully.

andrawaag commented 3 years ago

Just a reminder to self: the bot failed again. I fixed it by running this script and then manually merging the duplicate Wikidata items for combination therapies. This should be revisited to see if it can be automated.

andrawaag commented 3 years ago

Check whether the CGI bot can reference this issue in its error output.

andrawaag commented 2 years ago

I added the manual script to the bot. If the bot runs into this issue, it will now fix it by merging those items.
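The bot's actual merge code is not shown in this thread. A minimal sketch of what such a step might look like follows, assuming duplicate groups have already been identified and that the merge itself is delegated to a caller-supplied function (for example, a wrapper around the Wikibase `wbmergeitems` API action). The function name `merge_duplicates` and the choice to keep the lowest-numbered QID as the merge target are assumptions, not the bot's confirmed behavior.

```python
def merge_duplicates(duplicate_groups, merge_func):
    """Merge every duplicate in each group into the lowest-numbered QID.

    duplicate_groups: iterable of lists of QIDs describing the same therapy.
    merge_func: callable(from_qid, to_qid) that performs the actual merge.
    Returns the list of (from_qid, to_qid) pairs that were merged.
    """
    merged = []
    for qids in duplicate_groups:
        ordered = sorted(qids, key=lambda q: int(q[1:]))
        target = ordered[0]  # assumption: keep the older item
        for source in ordered[1:]:
            merge_func(source, target)
            merged.append((source, target))
    return merged


# Dry run: record the merges instead of calling Wikidata
calls = []
result = merge_duplicates(
    [["Q88405264", "Q58644763"]],
    lambda source, target: calls.append((source, target)),
)
print(result)  # [('Q88405264', 'Q58644763')]
```

Passing the merge function in as a parameter keeps the planning logic testable without write access to Wikidata, and the returned pairs can be logged so merges stay easy to review.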