SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Update microbial genes to use chromosomes instead of RefSeq genome ID #71

Closed: stuppie closed this issue 6 years ago

stuppie commented 6 years ago

Currently, microbial genes use the RefSeq genome ID as a qualifier.

Changes to be made:

  1. Add GenBank assembly accessions to strain items (e.g. https://www.wikidata.org/wiki/Q20800254#P4333)
  2. Create chromosome items for each strain (e.g. https://www.ncbi.nlm.nih.gov/nuccore/NC_010287.1)
  3. Change qualifiers from "RefSeq genome ID" -> ID to "chromosome" -> chromosome item

Part 1: Mostly done already (https://github.com/stuppie/wikidatacon_wdi_demo/blob/master/demo.ipynb), but we still need to check that all the reference genomes are covered.
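A minimal sketch of the Part 1 coverage check, assuming the GenBank assembly accessions of the reference genomes and the accessions already stated on strain items (P4333) have been fetched elsewhere. The helper name and all values below are hypothetical placeholders, not taken from the notebook.

```python
def missing_assemblies(reference_assemblies, strain_assemblies):
    """Return reference assembly accessions that no strain item carries yet.

    reference_assemblies: iterable of GenBank assembly accessions (reference genomes)
    strain_assemblies: dict of strain item QID -> GenBank assembly accession (P4333)
    """
    on_wikidata = set(strain_assemblies.values())
    # Set difference: accessions in the reference list but not on any strain item.
    return sorted(set(reference_assemblies) - on_wikidata)


# Illustrative placeholder accessions and item IDs (hypothetical).
reference = ["GCA_000000001.1", "GCA_000000002.1", "GCA_000000003.1"]
strains = {"Q_strain_A": "GCA_000000001.1", "Q_strain_B": "GCA_000000003.1"}
print(missing_assemblies(reference, strains))  # ['GCA_000000002.1']
```

Anything this returns would still need a P4333 statement before Part 2 can attach chromosome items to it.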

Part 2: To do. Data source: https://www.ncbi.nlm.nih.gov/genome/browse/reference/
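One way to plan the chromosome items for Part 2, sketched under loud assumptions: the rows from the NCBI reference genome list are assumed to be already parsed into dicts with hypothetical keys (`assembly`, `replicon_type`, `replicon_name`, `refseq`), and strain items are looked up by their GenBank assembly accession from Part 1. The real export format may differ; only NC_010287.1 comes from this issue, and every other value is a placeholder.

```python
from collections import defaultdict


def plan_chromosome_items(rows, strain_by_assembly):
    """Group chromosome records by strain item.

    Returns (items, unmatched) where items maps strain QID -> list of
    chromosome specs, and unmatched lists assemblies with no strain item.
    """
    items = defaultdict(list)
    unmatched = set()
    for row in rows:
        if row["replicon_type"] != "chromosome":
            continue  # plasmids and other replicons are skipped in this sketch
        strain = strain_by_assembly.get(row["assembly"])
        if strain is None:
            unmatched.add(row["assembly"])  # Part 1 has not covered this assembly yet
            continue
        items[strain].append({"name": row["replicon_name"], "refseq": row["refseq"]})
    return dict(items), sorted(unmatched)


rows = [
    {"assembly": "GCA_placeholder.1", "replicon_type": "chromosome",
     "replicon_name": "chromosome", "refseq": "NC_010287.1"},
    {"assembly": "GCA_placeholder.1", "replicon_type": "plasmid",
     "replicon_name": "plasmid p1", "refseq": "NC_placeholder_plasmid"},
    {"assembly": "GCA_unmatched.1", "replicon_type": "chromosome",
     "replicon_name": "chromosome", "refseq": "NC_placeholder_2"},
]
items, unmatched = plan_chromosome_items(rows, {"GCA_placeholder.1": "Q_strain_A"})
```

Returning the unmatched assemblies rather than silently dropping them ties Part 2 back to the coverage check in Part 1.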

Part 3: To do. Requires a bot update.
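A simplified sketch of the qualifier swap for Part 3. Statements are modeled as plain dicts rather than the bot's real data structures, and the property IDs are assumptions to be checked on Wikidata: P2249 for "RefSeq genome ID", P1057 for "chromosome", and P644 ("genomic start", as in the example link) as the qualified statement. The gene's own chromosome accession is passed in because a genome ID alone cannot say which of several chromosomes a gene sits on.

```python
REFSEQ_GENOME_ID = "P2249"  # assumed Wikidata property: RefSeq genome ID
CHROMOSOME = "P1057"        # assumed Wikidata property: chromosome


def requalify(statement, chromosome_accession, chromosome_qid_by_accession):
    """Return a copy of statement with the genome ID qualifier replaced by a chromosome item.

    chromosome_accession: RefSeq accession of the chromosome the gene is on
    chromosome_qid_by_accession: dict of RefSeq accession -> chromosome item QID
    """
    qid = chromosome_qid_by_accession.get(chromosome_accession)
    if qid is None:
        raise KeyError(f"no chromosome item for {chromosome_accession}")
    qualifiers = dict(statement.get("qualifiers", {}))  # copy; do not mutate input
    qualifiers.pop(REFSEQ_GENOME_ID, None)
    qualifiers[CHROMOSOME] = qid
    return {**statement, "qualifiers": qualifiers}


# Hypothetical statement and item IDs for illustration.
stmt = {"property": "P644", "value": "1000", "qualifiers": {"P2249": "GENOME_ID_PLACEHOLDER"}}
new = requalify(stmt, "NC_010287.1", {"NC_010287.1": "Q_chromosome_A"})
```

Raising on a missing chromosome item, instead of keeping the old qualifier, makes gaps left by Part 2 visible rather than leaving a mix of old and new qualifiers.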

@putmantime

stuppie commented 6 years ago

Code: https://github.com/SuLab/scheduled-bots/blob/master/scheduled_bots/geneprotein/MicrobialChromosomeBot.py and https://github.com/SuLab/scheduled-bots/blob/master/scheduled_bots/geneprotein/GeneBot.py

Example: https://www.wikidata.org/wiki/Q24183407#P644