callahantiff closed this issue 3 years ago.
This work impacts issue #72 because of its reference in the associated Jupyter Notebook.
@bill-baumgartner - this is complete (it will be integrated with PR #81). I followed the description of the LOOM algorithm on the BioPortal Wiki. It's very simple, just a few methods: there is nothing fancy, and it is essentially accomplished by preprocessing the input MeSH and ChEBI data and then performing an inner join to find overlapping concepts.
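The preprocessing-plus-inner-join idea can be sketched in a few lines of Python. This is a minimal illustration, not the notebook's code: the normalization rule (lowercase, keep only ASCII letters and digits) and the helper names `normalize`, `index_by_name` and `inner_join` are assumptions, and the sketch does exact matching on normalized names only. The BioPortal Wiki description of LOOM remains the reference for the actual comparison rules.

```python
import re
from collections import defaultdict


def normalize(name):
    """Hypothetical preprocessing: lowercase, then drop every character that
    is not an ASCII letter or digit, so 'Acetic-Acid' and 'acetic acid' match."""
    return re.sub(r"[^0-9a-z]", "", name.lower())


def index_by_name(names_by_id):
    """Invert {concept_id: set of names} into {normalized name: set of ids}."""
    index = defaultdict(set)
    for concept_id, names in names_by_id.items():
        for name in names:
            key = normalize(name)
            if key:  # skip names that normalize to an empty string
                index[key].add(concept_id)
    return index


def inner_join(mesh_names, chebi_names):
    """Return sorted (mesh_id, chebi_id) pairs sharing a normalized name."""
    mesh_index = index_by_name(mesh_names)
    chebi_index = index_by_name(chebi_names)
    pairs = set()
    # Only keys present on both sides survive: this is the inner join.
    for key in mesh_index.keys() & chebi_index.keys():
        for mesh_id in mesh_index[key]:
            for chebi_id in chebi_index[key]:
                pairs.add((mesh_id, chebi_id))
    return sorted(pairs)
```

For example, with the hypothetical inputs `{"MESH:example1": {"Acetic Acid"}}` and `{"CHEBI:example1": {"acetic-acid"}}`, both names normalize to `aceticacid`, so `inner_join` returns `[("MESH:example1", "CHEBI:example1")]`. Indexing both sides by name keeps the join linear in the number of names rather than comparing every MeSH name against every ChEBI name.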
In a nutshell: we download the mesh2021.nt data file directly from MeSH and the Flat_file_tab_delimited/names.tsv.gz file directly from ChEBI. Using these files, we have recapitulated the LOOM algorithm that BioPortal uses to create mappings between these resources. The procedure is relatively straightforward and uses the following information from each resource:
- MeSH: for all SCR Chemicals, obtain the following information:
  - RDFS:label object property
  - vocab:concept and vocab:preferredConcept object properties
- ChEBI: obtain the following information:
  - RDFS:label object property
  - synonym object properties

You can see details, with a description, in the notebook here under ChEBI Identifiers, as well as in the scripted version of this notebook (lines 496-628, here).
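To make the list above concrete, here is a hedged sketch of how those fields could be pulled out of the two downloads. It is an illustration under assumptions, not the notebook's implementation: the MeSH parser is a simplified line-based regex rather than a full N-Triples parser (it does not unescape literals), it assumes SCR chemicals are typed with the MeSH vocabulary class `vocab:SCR_Chemical`, and the ChEBI reader assumes `names.tsv` has a header row with `COMPOUND_ID` and `NAME` columns. The function names and the `CHEBI:` prefix on keys are hypothetical choices.

```python
import csv
import re
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
MESHV = "http://id.nlm.nih.gov/mesh/vocab#"  # the "vocab:" prefix

# Simplified triple pattern: IRI subject, IRI predicate, and either an IRI
# object or a quoted literal (an optional @lang or ^^<datatype> is skipped).
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)"\S*)\s*\.\s*$'
)


def mesh_scr_chemical_names(lines):
    """Return {scr_iri: labels} for SCR chemicals, adding the labels of the
    concepts linked through vocab:concept / vocab:preferredConcept."""
    scr_chemicals = set()
    labels = defaultdict(set)
    concepts = defaultdict(set)
    for line in lines:
        match = TRIPLE.match(line.strip())
        if not match:
            continue  # blank lines or triples this sketch cannot parse
        subj, pred, obj_iri, obj_literal = match.groups()
        if pred == RDF_TYPE and obj_iri == MESHV + "SCR_Chemical":
            scr_chemicals.add(subj)
        elif pred == RDFS_LABEL and obj_literal is not None:
            labels[subj].add(obj_literal)
        elif pred in (MESHV + "concept", MESHV + "preferredConcept") and obj_iri:
            concepts[subj].add(obj_iri)
    result = {}
    for scr in scr_chemicals:
        names = set(labels[scr])
        for concept in concepts[scr]:
            names |= labels[concept]
        result[scr] = names
    return result


def chebi_names_by_compound(lines):
    """Group the NAME column of ChEBI's names.tsv by COMPOUND_ID."""
    names = defaultdict(set)
    # QUOTE_NONE: chemical names may contain literal quote characters.
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        names["CHEBI:" + row["COMPOUND_ID"]].add(row["NAME"])
    return dict(names)
```

Both functions take any iterable of text lines, so the downloads can be streamed, e.g. `with gzip.open("names.tsv.gz", "rt", encoding="utf-8") as fh: chebi = chebi_names_by_compound(fh)`, and similarly with `open("mesh2021.nt", encoding="utf-8")` for MeSH.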
TASK
Task Type: CODEBASE
Decide how to handle MESH to CHEBI mappings. Currently there is a GitHub Gist (ncbo_rest_api.py) that pings the BioPortal API; it should be turned into a script that can be run as part of the KG CI/CD build.
Problems: The ncbo_rest_api.py script runs fine, but it's brittle given its reliance on the BioPortal API, which is notoriously unstable. A potential solution (for now or in the future) could be to implement the LOOM algorithm, which is what creates the mappings underlying the API.
TODO
Write tests against the script.
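Because LOOM-style matching is pure data in, data out, it can be tested without calling the BioPortal API at all. A minimal `unittest` sketch follows; `loom_style_matches` is a hypothetical stand-in for whatever function the script actually exposes, and the normalization rule inside it (lowercase, keep only ASCII letters and digits) is an assumption rather than a confirmed detail of the implementation.

```python
import re
import unittest


def loom_style_matches(mesh_names, chebi_names):
    """Hypothetical function under test: pair MeSH and ChEBI ids whose names
    are equal after lowercasing and removing non-alphanumeric characters."""
    def norm(name):
        return re.sub(r"[^0-9a-z]", "", name.lower())

    chebi_index = {}
    for chebi_id, names in chebi_names.items():
        for name in names:
            key = norm(name)
            if key:  # never index empty keys, so "---" cannot match "()"
                chebi_index.setdefault(key, set()).add(chebi_id)
    return {
        (mesh_id, chebi_id)
        for mesh_id, names in mesh_names.items()
        for name in names
        for chebi_id in chebi_index.get(norm(name), ())
    }


class TestLoomStyleMatches(unittest.TestCase):
    def test_case_and_punctuation_are_ignored(self):
        result = loom_style_matches({"M1": {"Acetic Acid"}}, {"C1": {"acetic-acid"}})
        self.assertEqual(result, {("M1", "C1")})

    def test_no_shared_names_gives_no_mappings(self):
        result = loom_style_matches({"M1": {"water"}}, {"C1": {"ethanol"}})
        self.assertEqual(result, set())

    def test_one_mesh_id_can_map_to_several_chebi_ids(self):
        result = loom_style_matches(
            {"M1": {"glucose"}}, {"C1": {"Glucose"}, "C2": {"glucose"}}
        )
        self.assertEqual(result, {("M1", "C1"), ("M1", "C2")})
```

Saved as a file such as `test_mappings.py` (a hypothetical name), the tests run with `python -m unittest test_mappings`. Keeping the network call out of the tested function means CI failures point at the mapping logic rather than at BioPortal availability.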