Open andrewsu opened 7 years ago
Good starting point might be this mouse allele for Col2a1: http://www.informatics.jax.org/allele/MGI:2152978
This mouse allele is a model for two human diseases, both listed in http://www.informatics.jax.org/downloads/reports/MGI_Geno_DiseaseDO.rpt
$ grep MGI:2152978 MGI_Geno_DiseaseDO.rpt | gawk -F'\t' '{print $8}' | sort -u
DOID:0080056
DOID:14789
This mouse allele also has multiple annotated phenotypes, available from either http://www.informatics.jax.org/downloads/reports/MGI_Geno_DiseaseDO.rpt or http://www.informatics.jax.org/downloads/reports/MGI_GenePheno.rpt. (MGI annotates the MP phenotypes to the exact allelic composition and genetic background, but at least initially, I think we should annotate them directly on the allele item.)
$ grep MGI:2152978 MGI_GenePheno.rpt | gawk -F'\t' '{print $5}' | sort -u | wc
54 54 594
$ grep MGI:2152978 MGI_GenePheno.rpt | gawk -F'\t' '{print $5}' | sort -u | head
MP:0000063
MP:0000065
MP:0000088
MP:0000111
MP:0000130
MP:0000131
MP:0000133
MP:0000141
MP:0000150
MP:0000163
$ grep MGI:2152978 MGI_Geno_DiseaseDO.rpt | gawk -F'\t' '{print $5}' | sort -u | wc
48 48 528
$ grep MGI:2152978 MGI_Geno_DiseaseDO.rpt | gawk -F'\t' '{print $5}' | sort -u | head
MP:0000063
MP:0000065
MP:0000088
MP:0000111
MP:0000130
MP:0000131
MP:0000133
MP:0000141
MP:0000150
MP:0000163
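The two reports give different counts of MP terms for this allele (54 vs. 48), so it may help to see which terms appear in only one of them. Below is a minimal Python sketch of that comparison. It assumes both `.rpt` files have been downloaded locally and are tab-separated, and it reuses column 5 from the commands above. The function names `column_values` and `compare_reports` are made up for illustration. Like `grep`, it matches the allele ID anywhere on the line.

```python
def column_values(path, needle, column):
    """Collect the unique values of a 1-based tab-separated column,
    for lines containing `needle` (roughly grep | awk | sort -u)."""
    values = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if needle not in line:
                continue
            fields = line.rstrip("\r\n").split("\t")
            # Skip short lines instead of emitting an empty value.
            if len(fields) >= column:
                values.add(fields[column - 1])
    return values


def compare_reports(pheno_path, disease_path, allele_id, column=5):
    """Return (terms only in the first report, terms only in the second)."""
    pheno = column_values(pheno_path, allele_id, column)
    disease = column_values(disease_path, allele_id, column)
    return sorted(pheno - disease), sorted(disease - pheno)
```

Calling `compare_reports("MGI_GenePheno.rpt", "MGI_Geno_DiseaseDO.rpt", "MGI:2152978")` would then return the MP terms unique to each file.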
On second thought, perhaps this allele http://www.informatics.jax.org/allele/MGI:1856966 would be easier if we're going to start with manual creation of items, since it has far fewer phenotypes to deal with...
... and on third thought, perhaps an even better candidate is http://www.informatics.jax.org/allele/MGI:3620011, because unlike the last example, the associated human disease in this case (obesity) has many linked drugs and gene associations in Wikidata. That should lead to better integrative queries. It has more phenotypes to deal with, but still a reasonable number (16).
Re: symptoms/phenotypes, see also: https://github.com/SuLab/GeneWikiCentral/issues/26
Minimalist model (gene to phenotype):
Gene Dbh (MGI:94864) -> has phenotype (RO_0002200) -> increased circulating corticosterone level
  qualifier: observed in -> Genetic Mutant "MGI:2175826" (string)
  curator: MGI
  stated in: PMID:15724149
  reference URL: http://www.informatics.jax.org/marker/phenotypes/MGI:94864
Gene Dbh (MGI:94864) -> is model of (RO_0003301) -> "dopamine beta-hydroxylase deficiency" (DOID:0090145)
  curator: MGI
  stated in: PMID:7715704
  reference URL: http://www.informatics.jax.org/marker/MGI:94864
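To make the shape of this model concrete, here is a rough Python sketch that writes the two Dbh statements as plain data. The `Statement` class and its field names are hypothetical and are not a Wikidata or bot API. Treating curator, stated in, and reference URL as reference fields (rather than qualifiers) is also an assumption. All identifiers come from the model above.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """Hypothetical container for one subject -> predicate -> object claim."""
    subject: str
    predicate: str
    obj: str
    qualifiers: dict = field(default_factory=dict)
    references: dict = field(default_factory=dict)


dbh_statements = [
    Statement(
        subject="MGI:94864",      # gene Dbh
        predicate="RO_0002200",   # has phenotype
        obj="increased circulating corticosterone level",
        qualifiers={"observed in": "MGI:2175826"},  # genetic mutant, as a string
        references={
            "curator": "MGI",
            "stated in": "PMID:15724149",
            "reference URL": "http://www.informatics.jax.org/marker/phenotypes/MGI:94864",
        },
    ),
    Statement(
        subject="MGI:94864",      # gene Dbh
        predicate="RO_0003301",   # is model of
        obj="DOID:0090145",       # dopamine beta-hydroxylase deficiency
        references={
            "curator": "MGI",
            "stated in": "PMID:7715704",
            "reference URL": "http://www.informatics.jax.org/marker/MGI:94864",
        },
    ),
]
```

Keeping qualifiers and references in separate dictionaries mirrors how they are listed above. The real property mapping would still need to be decided.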
After talking with Judy Blake at ISMB, I think we have an agreement in principle that we should get more MGI data into Wikidata. The next step is to actually prototype a few of their records as Wikidata items. MGI download files are here: http://www.informatics.jax.org/downloads/reports/index.html. I think we should initially focus on their ortholog mappings and disease annotations. (Also their phenotype annotations to the Mammalian Phenotype Ontology, though what is the licensing?)