MEETING: Look at the Nature paper and understand how the MeSH matrix was made. Apply something similar here.

The Support of Human Genetic Evidence....

Question 1: How they got the MeSH terms

For OMIM: they used the Comparative Toxicogenomics Database (CTD, http://ctdbase.org; described in "The Comparative Toxicogenomics Database: update 2013") to map the terms to MeSH.
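A minimal sketch of what this mapping step could look like, assuming a CTD-style disease vocabulary table in which each row has a MeSH disease ID and a pipe-separated list of alternative IDs that includes OMIM IDs. The column names `DiseaseID` and `AltDiseaseIDs`, the helper `build_omim_to_mesh`, and all sample rows are assumptions for illustration (the IDs are placeholders, not real records); check them against the actual CTD download before using this.

```python
import csv
import io

# Hypothetical excerpt in the shape of a CTD disease vocabulary file.
# Names and IDs below are placeholders, not real CTD records.
SAMPLE_TSV = (
    "DiseaseName\tDiseaseID\tAltDiseaseIDs\n"
    "Example disease A\tMESH:D000001\tOMIM:100001|OMIM:100002\n"
    "Example disease B\tMESH:D000002\tOMIM:100003\n"
    "Example disease C\tMESH:D000003\t\n"
)


def build_omim_to_mesh(tsv_text):
    """Return {OMIM id: set of MeSH ids} from a CTD-style TSV string."""
    mapping = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        mesh_id = row["DiseaseID"]
        if not mesh_id.startswith("MESH:"):
            continue  # only keep rows keyed by a MeSH identifier
        alt_ids = (row["AltDiseaseIDs"] or "").split("|")
        for alt in alt_ids:
            if alt.startswith("OMIM:"):
                # One OMIM entry may map to more than one MeSH term.
                mapping.setdefault(alt, set()).add(mesh_id)
    return mapping


omim_to_mesh = build_omim_to_mesh(SAMPLE_TSV)
for omim_id in sorted(omim_to_mesh):
    print(omim_id, "->", sorted(omim_to_mesh[omim_id]))
```

Keeping a set of MeSH IDs per OMIM ID leaves room for one-to-many mappings, which would then need a manual choice of the best fit.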

For GWASdb: they manually searched the MeSH database and chose the best-fitting term.

Question 2: How the scoring was derived

First note: they didn't score the similarity between the OMIM terms and the MeSH terms, or between the GWAS terms and the MeSH terms. It seems they first mapped both sources to MeSH and then scored the MeSH terms against each other.
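To make "scoring the MeSH terms against each other" concrete, here is a rough sketch of building a symmetric term-by-term matrix from any pairwise similarity function. Everything in it is hypothetical: the term names, the `TOY_SCORES` values, and the helpers `build_similarity_matrix` and `toy_similarity` are placeholders, not the paper's data or code.

```python
from itertools import combinations


def build_similarity_matrix(terms, similarity):
    """Score every pair of terms once and mirror it (symmetric matrix)."""
    matrix = {a: {} for a in terms}
    for term in terms:
        matrix[term][term] = similarity(term, term)
    for a, b in combinations(terms, 2):
        matrix[a][b] = matrix[b][a] = similarity(a, b)
    return matrix


# Placeholder scores standing in for a real semantic similarity measure.
TOY_SCORES = {
    frozenset(["Term A", "Term B"]): 0.8,
    frozenset(["Term A", "Term C"]): 0.2,
    frozenset(["Term B", "Term C"]): 0.3,
}


def toy_similarity(a, b):
    """Look up a made-up score; identical terms score 1.0."""
    return 1.0 if a == b else TOY_SCORES[frozenset([a, b])]


# MeSH terms reached from each source (placeholder names).
omim_mesh_terms = ["Term A", "Term B"]
gwas_mesh_terms = ["Term B", "Term C"]

# Score the union of MeSH terms against itself.
all_terms = sorted(set(omim_mesh_terms) | set(gwas_mesh_terms))
matrix = build_similarity_matrix(all_terms, toy_similarity)

for row_term in all_terms:
    print(row_term, [matrix[row_term][col] for col in all_terms])
```

Because the similarity function is passed in, the same matrix builder would work with whichever measure is eventually chosen.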

Used this software for scoring: "UMLS-Interface and UMLS-Similarity: Open Source Software for Measuring Paths and Semantic Similarity"

Abstract: "A number of computational measures for determining semantic similarity between pairs of biomedical concepts have been developed using various standards and programming platforms. In this paper, we introduce two new open-source frameworks based on the Unified Medical Language System (UMLS). These frameworks consist of the UMLS-Similarity and UMLS-Interface packages. UMLS-Interface provides path information about UMLS concepts. **UMLS-Similarity calculates the semantic similarity between UMLS concepts using several previously developed measures and can be extended to include new measures**. We validate the functionality of these frameworks by reproducing the results from previous work. Our frameworks constitute a significant contribution to the field of biomedical Natural Language Processing by providing a common development and testing platform for semantic similarity measures based on the UMLS."

Uses a suite of Perl modules

Different ways of scoring (quoted from the Methods section):

- "The Conceptual Distance (Cdist) measure proposed by Rada, et. al. determines the similarity between two concepts by counting the number of edges between them. Its range is between zero and twice the depth of the taxonomy."
- "The similarity measure proposed by Wu & Palmer (wup) is twice the depth of the two concepts least common subsumer (LCS) divided by the product of the depths of the individual concepts. The LCS is the most specific concept two concepts share as an ancestor. Its range is between zero and one."
- "The similarity measure proposed by Leacock & Chodorow (lch) is the negative log of the shortest path between two concepts divided by twice the total depth of the taxonomy. Its range is unbounded."
- "The similarity measure proposed by Nguyen & Al-Mubaid (nam) is the log of two plus the product of the shortest distance between the two concepts minus one and the depth of the taxonomy minus the depth of the concepts LCS. Its range depends on the depth of the taxonomy."
- "The Path measure (path) is the reciprocal of the number of nodes between two concepts and its range is between zero and one."
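The measures quoted above can be sketched on a toy taxonomy to see how they behave. This is an illustrative Python sketch, not the UMLS-Similarity Perl code, and it rests on several assumptions: the taxonomy is a made-up single-parent tree (real UMLS/MeSH concepts can have several parents), the root has depth 1, and "shortest path" for path, lch and nam is counted in nodes including both endpoints. Note also that the quoted text says wup divides by the product of the depths, whereas the commonly cited Wu & Palmer formula divides by their sum; the sketch uses the sum, which is what keeps the range between zero and one.

```python
import math

# Hypothetical toy taxonomy (child -> parent); the root has no parent.
PARENT = {
    "root": None,
    "a": "root", "b": "root",
    "a1": "a", "a2": "a",
    "b1": "b",
}


def path_to_root(concept):
    """List of nodes from the concept up to and including the root."""
    path = []
    while concept is not None:
        path.append(concept)
        concept = PARENT[concept]
    return path


def depth(concept):
    """Depth counted in nodes, so the root has depth 1 (assumption)."""
    return len(path_to_root(concept))


MAX_DEPTH = max(depth(c) for c in PARENT)  # depth of the whole taxonomy


def lcs(c1, c2):
    """Least common subsumer: the deepest shared ancestor."""
    ancestors_of_c2 = set(path_to_root(c2))
    for node in path_to_root(c1):  # walks upward, deepest first
        if node in ancestors_of_c2:
            return node
    raise ValueError("concepts do not share a root")


def cdist(c1, c2):
    """Conceptual distance: number of edges between the concepts."""
    shared = depth(lcs(c1, c2))
    return (depth(c1) - shared) + (depth(c2) - shared)


def nodes_on_path(c1, c2):
    """Nodes on the shortest path, counting both endpoints."""
    return cdist(c1, c2) + 1


def path_sim(c1, c2):
    return 1.0 / nodes_on_path(c1, c2)


def wup(c1, c2):
    # Sum form of Wu & Palmer (see the note above about "product").
    return 2.0 * depth(lcs(c1, c2)) / (depth(c1) + depth(c2))


def lch(c1, c2):
    return -math.log(nodes_on_path(c1, c2) / (2.0 * MAX_DEPTH))


def nam(c1, c2):
    gap = MAX_DEPTH - depth(lcs(c1, c2))
    return math.log(2 + (nodes_on_path(c1, c2) - 1) * gap)


for pair in [("a1", "a2"), ("a1", "b1"), ("a1", "a1")]:
    scores = {f.__name__: round(f(*pair), 3)
              for f in (cdist, path_sim, wup, lch, nam)}
    print(pair, scores)
```

Siblings such as `a1` and `a2` score higher on path and wup than `a1` and `b1`, which only meet at the root. Note that cdist and nam grow with distance, so they behave as distances rather than similarities.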

Some pairs were manually scored (supplementary table 8)
