Knowledge-Graph-Hub / kg-covid-19

An instance of KG Hub to produce a knowledge graph for COVID-19 response.
https://github.com/Knowledge-Graph-Hub/kg-covid-19/wiki
BSD 3-Clause "New" or "Revised" License

Normalize Drug nodes to single prefix #437

Closed caufieldjh closed 2 years ago

caufieldjh commented 2 years ago

Describe the bug

Drug nodes in KG-COVID-19 have a variety of prefix types, which makes it difficult to reconcile identical nodes. Ideally, all should be normalized to a single prefix (i.e., DrugCentral).

To Reproduce

$ grep 'biolink:Drug' merged-kg_nodes.tsv | awk -F":" '{print $1}' | sort | uniq
CHEBI
CHEMBL.COMPOUND
DRUGBANK
DrugCentral
PHARMGKB
ttd.drug
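
The same check can be sketched in Python so that it matches on the category column only (the grep above matches `biolink:Drug` anywhere on the line) and also reports counts per prefix. This is a minimal sketch that assumes a KGX-style node TSV with `id` and `category` columns; the function name and the sample rows are hypothetical and are not taken from the real merged graph.

```python
import csv
import io
from collections import Counter


def count_drug_prefixes(tsv_text, category="biolink:Drug"):
    """Count CURIE prefixes among nodes whose category list includes `category`."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumption: multiple categories on one node are pipe-delimited.
        if category in row["category"].split("|"):
            prefix = row["id"].split(":", 1)[0]
            counts[prefix] += 1
    return counts


# Made-up rows for illustration only.
sample = (
    "id\tcategory\tname\n"
    "DrugCentral:1\tbiolink:Drug\texample a\n"
    "CHEBI:2\tbiolink:Drug|biolink:ChemicalEntity\texample b\n"
    "DrugCentral:3\tbiolink:Drug\texample c\n"
    "CHEBI:4\tbiolink:ChemicalEntity\tnot a drug\n"
)
prefix_counts = count_drug_prefixes(sample)
```

For the real file, the text could be read from `merged-kg_nodes.tsv` and passed in the same way.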

Expected behavior

Instances of biolink:Drug should have their prefixes normalized to DrugCentral at ingest. This can be done through an SSSOM map similar to that for KG-IDG (https://github.com/Knowledge-Graph-Hub/kg-idg/blob/master/maps/drugcentral-maps-0.1.sssom.tsv), though some source-specific ID curation may be necessary for this KG.
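
As a rough illustration of what map-based normalization could look like, here is a minimal sketch. It assumes the map uses the standard SSSOM `subject_id` and `object_id` columns, with optional `#`-prefixed metadata lines at the top. Which side of each mapping holds the DrugCentral ID is not assumed, so both directions are handled. The function names and the sample identifiers are hypothetical, and this is not how the project's ingest code is actually structured.

```python
import csv


def load_sssom_map(sssom_text, target_prefix="DrugCentral"):
    """Build {other CURIE: target-prefix CURIE} from SSSOM TSV text."""
    # Skip '#'-prefixed metadata lines that may precede the column header.
    lines = [ln for ln in sssom_text.splitlines() if not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    mapping = {}
    for row in reader:
        subj, obj = row["subject_id"], row["object_id"]
        # Accept the target prefix on either side of the mapping.
        if subj.startswith(target_prefix + ":"):
            mapping[obj] = subj
        elif obj.startswith(target_prefix + ":"):
            mapping[subj] = obj
    return mapping


def normalize_id(curie, mapping):
    """Return the mapped ID, or the original ID if no mapping exists."""
    return mapping.get(curie, curie)


# Made-up mappings for illustration only.
sample_sssom = (
    "# mapping_set_id: hypothetical example\n"
    "subject_id\tpredicate_id\tobject_id\n"
    "DrugCentral:1\tskos:exactMatch\tDRUGBANK:DB0001\n"
    "ttd.drug:D0X\tskos:exactMatch\tDrugCentral:2\n"
)
mapping = load_sssom_map(sample_sssom)
normalized = [
    normalize_id(c, mapping)
    for c in ["DRUGBANK:DB0001", "ttd.drug:D0X", "PHARMGKB:PA1"]
]
```

IDs without a mapping pass through unchanged here, which is where the source-specific curation mentioned above would come in.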

CHEBI is ingested as a full ontology, so it may make sense to retain its CHEBI prefixes rather than attempting to remap all of them to an external database. Instead, we can define Association relations between CHEBI nodes and corresponding DrugCentral nodes.
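
A sketch of the CHEBI alternative: instead of rewriting CHEBI IDs, emit one edge per CHEBI/DrugCentral pair. The edge layout (`subject`, `predicate`, `object`) follows the usual KGX edge columns, but the default predicate `biolink:same_as` is only an assumption for illustration; the appropriate relation would need to be decided for this KG. The function name and the sample pairs are hypothetical.

```python
def chebi_to_drugcentral_edges(pairs, predicate="biolink:same_as"):
    """Turn mapped ID pairs into edges linking CHEBI nodes to DrugCentral nodes."""
    edges = []
    for a, b in pairs:
        # Index both IDs by prefix so the order within the pair does not matter.
        by_prefix = {a.split(":", 1)[0]: a, b.split(":", 1)[0]: b}
        if "CHEBI" in by_prefix and "DrugCentral" in by_prefix:
            edges.append(
                {
                    "subject": by_prefix["CHEBI"],
                    "predicate": predicate,
                    "object": by_prefix["DrugCentral"],
                }
            )
    return edges


# Made-up pairs for illustration only.
pairs = [
    ("DrugCentral:1", "CHEBI:10"),
    ("CHEBI:11", "DrugCentral:2"),
    ("DRUGBANK:DB0001", "DrugCentral:3"),  # not a CHEBI pair, so skipped
]
edges = chebi_to_drugcentral_edges(pairs)
```

With this approach the CHEBI nodes keep their ontology IDs and hierarchy, and the link to DrugCentral is carried by the added edges.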