gnn4dr / DRKG

A knowledge graph and a set of tools for drug repurposing
Apache License 2.0

How to use the entity2src.tsv to query the corresponding concepts #16

Open freshnemo opened 3 years ago

freshnemo commented 3 years ago

Hi, thanks for sharing this great work. I want to ask how I can use entity2src.tsv to get the corresponding concepts. For example: Gene::100129669 [Hetionet] Biomedical knowledge graph https://het.io/about/ [STRING] https://string-db.org/. Does this mean the identifier 100129669 is the same in Hetionet and STRING?

gurdaspuriya commented 3 years ago

The file entity2src.tsv maps each entity/node to the list of data sources it appears in (we use seven different data sources to construct the DRKG). Regarding the IDs, we use the following rules to assign IDs to the entities:
(i) Compound entities are mapped to the DrugBank ID and, if that is not possible, to the ChEMBL ID. If a compound cannot be found in either of the two, we use the native ID space and include the name of the source as part of the entity's name.
(ii) Gene entities are mapped to the Entrez ID.
(iii) Disease entities are mapped to the MeSH ID space.
(iv) The remaining biological entities appear only in a single data source, and hence we use that data source's ID.
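To make the rules above concrete, here is a minimal sketch of reading one entity2src.tsv row. It assumes, without having checked the file, that fields are tab-separated, that the first field is the entity in `Type::ID` form, and that each source is tagged with its name in square brackets as in the example from the question. The helper name `parse_entity2src_line` is hypothetical and not part of the DRKG tools.

```python
import re

# Matches a source name written in square brackets, e.g. "[Hetionet]".
SOURCE_TAG = re.compile(r"\[([^\]]+)\]")


def parse_entity2src_line(line):
    """Split one row into (entity_type, entity_id, source_names).

    Assumption: the entity is the first tab-separated field and has the
    form "Type::ID"; everything after it describes the data sources.
    """
    entity, _, rest = line.rstrip("\n").partition("\t")
    # Split on the first "::" only, so any later "::" stays in the ID.
    entity_type, _, entity_id = entity.partition("::")
    sources = SOURCE_TAG.findall(rest)
    return entity_type, entity_id, sources


# The row quoted in the question, with tabs assumed as the separator.
sample = (
    "Gene::100129669\t"
    "[Hetionet] Biomedical knowledge graph https://het.io/about/\t"
    "[STRING] https://string-db.org/"
)

entity_type, entity_id, sources = parse_entity2src_line(sample)
print(entity_type, entity_id, sources)
```

Read against rule (ii), the result says that 100129669 is an Entrez gene ID and that this gene appears in both Hetionet and STRING. The ID is the one DRKG assigns, not necessarily the identifier each source uses internally.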