compbio-UofT / medsavant

MedSavant is a search engine for genetic variants

Update phenotype to gene mapping, and recode to accept new format #338

Open mfiume opened 10 years ago

The mapping between HPO IDs and genes has been updated, and the format has changed.

The new format is here: http://compbio.charite.de/hudson/job/hpo.annotations.monthly/lastStableBuild/artifact/annotation/ALL_SOURCES_ALL_FREQUENCIES_phenotype_to_genes.txt

The old format is here: http://medsavant.com/serve/ontology/phenotype_to_genes.txt

And the SQL format that we require is:

CREATE TABLE ontology (
  ontology varchar(10) COLLATE latin1_bin NOT NULL,
  id varchar(30) COLLATE latin1_bin NOT NULL,
  name varchar(300) COLLATE latin1_bin NOT NULL,
  def mediumtext COLLATE latin1_bin,
  alt_ids varchar(300) COLLATE latin1_bin DEFAULT NULL,
  parents varchar(120) COLLATE latin1_bin DEFAULT NULL,
  genes mediumtext COLLATE latin1_bin,
  PRIMARY KEY (id)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_bin;

Here, genes is a pipe-delimited (|) string of the genes associated with the term in that row (the term identified by id and name). See the ontology table in any MedSavant database for an example.
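
As a starting point for the recoding, here is a rough Python sketch of the grouping step: it collects the genes listed for each HPO ID and joins them with pipes, ready for the genes column. It assumes the new file is tab-separated with one gene per row, comment or header lines starting with #, the HPO ID in the first column and the gene symbol in the fourth. Those column positions are an assumption and should be checked against the actual file. The function name genes_by_term and the sample rows are made up for illustration and are not part of MedSavant.

```python
from collections import defaultdict


def genes_by_term(lines, id_col=0, gene_col=3):
    """Group gene symbols by term ID and join each group with '|'.

    id_col and gene_col are assumed column positions; adjust them
    to match the real layout of the phenotype-to-genes file.
    """
    grouped = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and header/comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip rows that are too short to hold both columns.
        if len(fields) <= max(id_col, gene_col):
            continue
        term_id, gene = fields[id_col], fields[gene_col]
        # Keep the first occurrence of each gene, preserving file order.
        if gene and gene not in grouped[term_id]:
            grouped[term_id].append(gene)
    return {term_id: "|".join(genes) for term_id, genes in grouped.items()}


# Made-up rows in the assumed layout (not real HPO annotations).
sample = [
    "#Format: HPO-ID<tab>HPO-Name<tab>Gene-ID<tab>Gene-Name",
    "HP:0000001\tExample term A\t1\tGENE1",
    "HP:0000001\tExample term A\t2\tGENE2",
    "HP:0000001\tExample term A\t1\tGENE1",
    "HP:0000002\tExample term B\t3\tGENE3",
]

result = genes_by_term(sample)
for term_id, genes in result.items():
    print(term_id, genes)
```

For a real file, the same function could be called on an open file handle, since it only iterates over lines. Each resulting string would then be written to the genes column of the row whose id matches the HPO ID.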