Handling MESH-CHEBI Mappings

callahantiff commented 3 years ago

TASK

Task Type: CODEBASE

Decide how to handle MeSH-to-ChEBI mappings. Currently there is a GitHub Gist (ncbo_rest_api.py) that pings the BioPortal API; it needs to be converted into a script that can be run as part of the KG CI/CD build.

Problems: The ncbo_rest_api.py script runs fine, but it's brittle given its reliance on the BioPortal API, which is notoriously unstable. A potential solution (for now or in the future) could be to implement the LOOM algorithm, which is what creates the mappings underlying the API.

TODO

[x] Decide whether or not to use current script or implement LOOM
[x] Convert Gist to script
[x] ~~Write tests against script~~
[x] Integrate script into CI/CD workflow (#68)

callahantiff commented 3 years ago

This work impacts issue #72 because of its reference in the associated Jupyter Notebook.

callahantiff commented 3 years ago

@bill-baumgartner - this is complete (will be integrated with PR #81). I followed the details for the LOOM algorithm described on the BioPortal Wiki. It's very simple, just a few methods. There is nothing fancy: it is essentially accomplished by preprocessing the input MeSH and ChEBI data and performing an inner join to find overlapping concepts.
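
For readers who want a concrete picture of "preprocess, then inner join", here is a minimal sketch in plain Python. It is not the code from the PR. The normalization rule (lowercase, keep only ASCII letters and digits), the minimum string length of 3, and the names `normalize`, `build_index`, and `inner_join` are all assumptions made for illustration, and the identifiers in the toy data are invented.

```python
import re


def normalize(label):
    """Lowercase a label and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", label.lower())


def build_index(records, min_len=3):
    """Map each normalized string to the set of identifiers that carry it.

    `records` is an iterable of (identifier, string) pairs, where the string
    is a label or a synonym. Strings shorter than `min_len` after
    normalization are skipped (an assumed guard against trivial matches).
    """
    index = {}
    for identifier, text in records:
        key = normalize(text)
        if len(key) >= min_len:
            index.setdefault(key, set()).add(identifier)
    return index


def inner_join(mesh_records, chebi_records):
    """Return sorted (mesh_id, chebi_id) pairs that share a normalized string."""
    mesh_index = build_index(mesh_records)
    chebi_index = build_index(chebi_records)
    pairs = set()
    # Only keys present in BOTH indexes survive: this is the inner join.
    for key in mesh_index.keys() & chebi_index.keys():
        for mesh_id in mesh_index[key]:
            for chebi_id in chebi_index[key]:
                pairs.add((mesh_id, chebi_id))
    return sorted(pairs)


# Toy rows, invented for illustration only (not real MeSH or ChEBI identifiers).
mesh_records = [("MESH_X1", "Example-Compound A"), ("MESH_X2", "other thing")]
chebi_records = [("CHEBI_Y1", "example compound a"), ("CHEBI_Y2", "unrelated")]
mappings = inner_join(mesh_records, chebi_records)
print(mappings)
```

Indexing both sides by the normalized string keeps the join linear in the number of labels and synonyms, instead of comparing every MeSH string against every ChEBI string.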

In a Nutshell: We download the mesh2021.nt data file directly from MeSH and the Flat_file_tab_delimited/names.tsv.gz file directly from ChEBI. Using these files, we have recapitulated the LOOM algorithm implemented by BioPortal when creating mappings between these resources. The procedure is relatively straightforward and utilizes the following information from each resource:

For all MeSH SCR Chemicals, obtain the following information:
- Identifiers: MeSH identifiers
- Labels: string labels using the RDFS:label object property
- Synonyms: track down all synonyms using the vocab:concept and vocab:preferredConcept object properties

For all ChEBI classes, obtain the following information:
- Labels: string labels using the RDFS:label object property
- Synonyms: track down all synonyms using all synonym object properties
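
To make the MeSH side of the list above more concrete, here is a rough sketch of how labels and synonyms could be gathered once the .nt file has been parsed into (subject, predicate, object) triples. The parsing step is omitted, the predicate names are the shorthand used in this comment rather than full IRIs, and the class name `vocab:SCR_Chemical`, the function `collect_mesh_strings`, and the toy triples are assumptions for illustration only.

```python
from collections import defaultdict

# Shorthand predicate names taken from the comment above; real MeSH RDF uses
# full IRIs, and parsing the .nt file into triples is omitted here.
RDF_TYPE = "rdf:type"
LABEL = "RDFS:label"
CONCEPT_LINKS = {"vocab:concept", "vocab:preferredConcept"}
SCR_CHEMICAL = "vocab:SCR_Chemical"  # assumed class name


def collect_mesh_strings(triples):
    """Return {mesh_id: {"labels": set, "synonyms": set}} for SCR chemicals."""
    labels = defaultdict(set)    # node -> label strings attached to it
    concepts = defaultdict(set)  # record -> concept nodes it links to
    chemicals = set()            # records typed as SCR chemicals
    for subject, predicate, obj in triples:
        if predicate == LABEL:
            labels[subject].add(obj)
        elif predicate in CONCEPT_LINKS:
            concepts[subject].add(obj)
        elif predicate == RDF_TYPE and obj == SCR_CHEMICAL:
            chemicals.add(subject)

    result = {}
    for chemical in chemicals:
        own_labels = set(labels.get(chemical, set()))
        synonyms = set()
        # Synonyms are the labels of every concept linked from the record.
        for concept in concepts.get(chemical, set()):
            synonyms |= labels.get(concept, set())
        result[chemical] = {"labels": own_labels, "synonyms": synonyms - own_labels}
    return result


# Toy triples, invented for illustration (not real MeSH content).
toy_triples = [
    ("mesh:X1", "rdf:type", "vocab:SCR_Chemical"),
    ("mesh:X1", "RDFS:label", "compound a"),
    ("mesh:X1", "vocab:preferredConcept", "mesh:M1"),
    ("mesh:X1", "vocab:concept", "mesh:M2"),
    ("mesh:M1", "RDFS:label", "compound a"),
    ("mesh:M2", "RDFS:label", "cmpd a"),
    ("mesh:D9", "RDFS:label", "not a chemical"),
]
mesh_strings = collect_mesh_strings(toy_triples)
print(mesh_strings)
```

The ChEBI side would follow the same shape, reading identifier and name pairs from names.tsv.gz instead of triples; that file's column layout is not described in this thread, so it is left out of the sketch.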

You can see the details, with a description, in the notebook here under ChEBI Identifiers, as well as in the scripted version of this notebook (lines 496-628, here).

callahantiff / PheKnowLator

Handling MESH-CHEBI Mappings #77
