SuLab / GeneWikiCentral

GeneWiki Organization
MIT License

Load MGI allele data for diseases and phenotypes #47

Open andrewsu opened 7 years ago

andrewsu commented 7 years ago

After talking with Judy Blake at ISMB, I think we have an agreement in principle that we should get more MGI data into Wikidata. Next step is to actually prototype a few of their records as Wikidata items. MGI download files are here: http://www.informatics.jax.org/downloads/reports/index.html. I think we initially should focus on their ortholog mappings and disease annotations. (and phenotype annotations to Mammalian Phenotype Ontology -- what is the licensing?)

andrewsu commented 6 years ago

Good starting point might be this mouse allele for Col2a1: http://www.informatics.jax.org/allele/MGI:2152978

This mouse allele is a model for 2 human diseases, which can be found in http://www.informatics.jax.org/downloads/reports/MGI_Geno_DiseaseDO.rpt

$ grep MGI:2152978 MGI_Geno_DiseaseDO.rpt  | gawkt '{print $8}' | sort -u
DOID:0080056
DOID:14789
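
For anyone who wants to script this lookup rather than run it by hand, here is a rough Python equivalent of the grep/awk/sort pipeline above. It assumes `gawkt` is a local alias for gawk with a tab field separator, and the function name `unique_column_values` is made up for this sketch. Column numbers are 1-based, as in awk. Unlike the shell pipeline, it skips empty fields, and its sort order may differ from a locale-aware `sort`.

```python
from typing import Iterable, List


def unique_column_values(lines: Iterable[str], needle: str, column: int) -> List[str]:
    """Return the sorted unique values of a 1-based, tab-delimited column
    for every line that contains `needle` (plain substring match, like grep)."""
    values = set()
    for line in lines:
        # grep-style filter: keep the line if the ID appears anywhere in it
        if needle not in line:
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip short rows and empty fields instead of emitting blank values
        if len(fields) >= column and fields[column - 1]:
            values.add(fields[column - 1])
    return sorted(values)
```

With a downloaded report, something like `unique_column_values(open("MGI_Geno_DiseaseDO.rpt", encoding="utf-8"), "MGI:2152978", 8)` should give the disease IDs, and column 5 the MP terms, provided the column layout matches what the commands above imply.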

This mouse allele also has multiple annotated phenotypes, available from either http://www.informatics.jax.org/downloads/reports/MGI_Geno_DiseaseDO.rpt OR http://www.informatics.jax.org/downloads/reports/MGI_GenePheno.rpt. (The MP phenotypes are annotated to the exact allelic composition and genetic background, but at least initially, I think we should annotate these directly on the allele item.)

$ grep MGI:2152978 MGI_GenePheno.rpt | gawkt '{print $5}' | sort -u | wc
     54      54     594
$ grep MGI:2152978 MGI_GenePheno.rpt | gawkt '{print $5}' | sort -u | head
MP:0000063
MP:0000065
MP:0000088
MP:0000111
MP:0000130
MP:0000131
MP:0000133
MP:0000141
MP:0000150
MP:0000163
$ grep MGI:2152978 MGI_Geno_DiseaseDO.rpt | gawkt '{print $5}' | sort -u | wc
     48      48     528
$ grep MGI:2152978 MGI_Geno_DiseaseDO.rpt | gawkt '{print $5}' | sort -u | head
MP:0000063
MP:0000065
MP:0000088
MP:0000111
MP:0000130
MP:0000131
MP:0000133
MP:0000141
MP:0000150
MP:0000163

andrewsu commented 6 years ago

On second thought, perhaps this allele http://www.informatics.jax.org/allele/MGI:1856966 would be easier if we're going to start with manual creation of items -- many fewer phenotypes to have to deal with...

andrewsu commented 6 years ago

... and on third thought, perhaps an even better candidate is http://www.informatics.jax.org/allele/MGI:3620011, because unlike the last example, the associated human disease in this case (obesity) has many linked drugs and gene associations in Wikidata. Will lead to better integrative queries... It has more phenotypes to deal with, but still a reasonable number (16)...

stuppie commented 6 years ago

Re: symptoms/phenotypes, see also: https://github.com/SuLab/GeneWikiCentral/issues/26

stuppie commented 6 years ago

Data Modelling doc: https://docs.google.com/document/d/1GP9HSyHI0hznoUIgqPWti_MTjsgl9ARLvl7H8JZa4Tw/edit

stuppie commented 6 years ago

Minimalist model (gene to phenotype)

Gene Dbh (MGI:94864) -> has phenotype (RO_0002200) -> increased circulating corticosterone level
    qualifier: observed in -> Genetic Mutant "MGI:2175826" (string)
    curator: MGI
    stated in: PMID:15724149
    reference URL: http://www.informatics.jax.org/marker/phenotypes/MGI:94864

Gene Dbh (MGI:94864) -> is model of (RO_0003301) -> "dopamine beta-hydroxylase deficiency" (DOID:0090145)
    curator: MGI
    stated in: PMID:7715704
    reference URL: http://www.informatics.jax.org/marker/MGI:94864
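
To make the minimalist model a bit more concrete, here is a small plain-Python sketch of the two statements above as data. This is not the Wikidata API or any bot framework; the `Statement` class and its field names are hypothetical. Treating curator, stated in, and reference URL as reference fields (rather than qualifiers) is an assumption about how the model is meant to be read.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Statement:
    """One subject -> predicate -> object claim with qualifiers and references."""
    subject: str
    predicate: str
    obj: str
    qualifiers: Dict[str, str] = field(default_factory=dict)
    references: Dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Format the statement in the same indented layout used in this thread."""
        lines = [f"{self.subject} -> {self.predicate} -> {self.obj}"]
        for name, value in self.qualifiers.items():
            lines.append(f"    qualifier: {name} -> {value}")
        for name, value in self.references.items():
            lines.append(f"    {name}: {value}")
        return "\n".join(lines)


# Gene-to-phenotype statement, values copied from the model above
phenotype = Statement(
    subject="MGI:94864",
    predicate="RO_0002200",  # has phenotype
    obj="increased circulating corticosterone level",
    qualifiers={"observed in": "MGI:2175826"},
    references={
        "curator": "MGI",
        "stated in": "PMID:15724149",
        "reference URL": "http://www.informatics.jax.org/marker/phenotypes/MGI:94864",
    },
)

# Gene-to-disease statement, values copied from the model above
disease_model = Statement(
    subject="MGI:94864",
    predicate="RO_0003301",  # is model of
    obj="DOID:0090145",
    references={
        "curator": "MGI",
        "stated in": "PMID:7715704",
        "reference URL": "http://www.informatics.jax.org/marker/MGI:94864",
    },
)
```

Calling `print(phenotype.render())` gives back roughly the layout used in the comment above, which makes it easy to eyeball whether a batch of generated statements still matches the agreed model.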