SuLab / GeneWikiCentral


Missing uniprot/chebi ids #82

Open skeating opened 6 years ago

skeating commented 6 years ago

Hi Guys

When importing Reactome data, I frequently link to entries from UniProt or ChEBI. I keep a log of any that are missing from Wikidata. Attached is the log from my latest run.

If there is a better way or a more suitable format to submit these, let me know.

Sarah

wikidata_update_2018-2-26_missing.txt
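One possible format for such a log is a small CSV file with one row per missing identifier. The sketch below is only an illustration: the column names, the `write_missing_log` helper, and the example identifiers are hypothetical placeholders and do not reflect the layout of the attached file.

```python
import csv
import io
from datetime import date

# Hypothetical column layout; NOT the format of the attached log file.
FIELDS = ["date", "source", "external_id", "reactome_id"]


def write_missing_log(missing, handle, run_date=None):
    """Write one CSV row per missing (source, external_id, reactome_id) tuple."""
    run_date = run_date or date.today().isoformat()
    writer = csv.DictWriter(handle, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    # Sort and de-duplicate so that logs from repeated runs are easy to compare.
    for source, external_id, reactome_id in sorted(set(missing)):
        writer.writerow({
            "date": run_date,
            "source": source,
            "external_id": external_id,
            "reactome_id": reactome_id,
        })


if __name__ == "__main__":
    # Placeholder entries for illustration only; not taken from a real run.
    example = [
        ("uniprot", "P12345", "R-HSA-0000001"),
        ("chebi", "CHEBI:15377", "R-HSA-0000002"),
        ("uniprot", "P12345", "R-HSA-0000001"),  # duplicate, written once
    ]
    buffer = io.StringIO()
    write_missing_log(example, buffer, run_date="2018-02-26")
    print(buffer.getvalue())
```

A flat file like this can be diffed between releases to see which identifiers have since been created.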

stuppie commented 6 years ago

The usual way we do this, once everything is set up and running consistently, is to set it up on Jenkins and store the logs there. For example, this is the Jenkins job for the Disease Ontology: http://jenkins.sulab.org/job/Disease_Ontology/

The last run log is here: http://jenkins.sulab.org/job/Disease_Ontology/lastSuccessfulBuild/artifact/scheduled-bots/scheduled_bots/ontology/logs/DOIDBot-20171205_21%3A44.log

And an HTML report from the log: http://jenkins.sulab.org/job/Disease_Ontology/lastSuccessfulBuild/artifact/scheduled-bots/scheduled_bots/ontology/logs/DOIDBot-20171205_21%3A44.html

(These are also linked from the Wikidata bot's talk page: https://www.wikidata.org/wiki/User:ProteinBoxBot/Bot_Status)

The log file is parsed and the report generated using: https://github.com/SuLab/scheduled-bots/tree/master/scheduled_bots/logger
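As a rough illustration of the parse-then-report idea, here is a minimal sketch. It does not use or reproduce the scheduled-bots logger: the three-column log layout (`level,external_id,message`) and the function names `summarize_log` and `render_report` are assumptions made for this example.

```python
import csv
import html
import io
from collections import Counter


def summarize_log(handle):
    """Count log rows by (level, message).

    Assumes a hypothetical layout of `level,external_id,message` per line.
    """
    counts = Counter()
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        level, _external_id, message = row
        counts[(level, message)] += 1
    return counts


def render_report(counts):
    """Render the counts as a plain HTML table, escaping all text cells."""
    rows = "\n".join(
        f"<tr><td>{html.escape(level)}</td>"
        f"<td>{html.escape(message)}</td><td>{n}</td></tr>"
        for (level, message), n in sorted(counts.items())
    )
    header = "<tr><th>Level</th><th>Message</th><th>Count</th></tr>"
    return f"<table>\n{header}\n{rows}\n</table>"


if __name__ == "__main__":
    # Made-up log lines for illustration only.
    log = io.StringIO(
        "ERROR,ID-1,external id not found\n"
        "ERROR,ID-2,external id not found\n"
        "INFO,ID-3,item updated\n"
    )
    print(render_report(summarize_log(log)))
```

Grouping by message keeps the report short even when many identifiers fail for the same reason.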

But posting that file here is fine for now. I'll try to see what's wrong with those.

stuppie commented 6 years ago

I created or fixed 10 of the ChEBI items. For the proteins, there are a bunch of different issues. Some are not in MyGene (don't know why), some are in MyGene but don't have the UniProt entry linked to the Entrez gene, some have xref conflicts in Wikidata, and some are TrEMBL proteins, which we aren't updating in Wikidata...

skeating commented 6 years ago

Thanks, I'm not quite in a place to generate reports as above.

What should I do with 'missing' things, other than keep a list of them :-)

stuppie commented 6 years ago

Ideally the bot is set up in Jenkins and we can automatically store the logs. Unless the missing things are breaking something important (i.e. they are needed as a structural link between a large number of items), I think we can just hold off till then... What is left until the bot is "done"? Just finishing up the protein complexes?

andrawaag commented 6 years ago

The Reactome bot will not be in Jenkins, since it will be included in the publication pipeline at Reactome, i.e. the bot will run each time there is a new release.

stuppie commented 6 years ago

Ah OK. The logger can be changed pretty easily to update a PathwayBot talk page. LMK if this is something you want to do..