gnn4dr / DRKG

A knowledge graph and a set of tools for drug repurposing
Apache License 2.0

Deduplicate data from different sources #20

Closed: yeqing97 closed this issue 2 years ago

yeqing97 commented 3 years ago

Thanks for sharing! Did you deduplicate the data from the different sources?

classicsong commented 3 years ago

We deduplicated most of the vertices, but some duplicates may still remain.
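For anyone who wants to check for leftover duplicates themselves, here is a minimal sketch. It assumes the triples have already been loaded as `(head, relation, tail)` tuples, for example from a tab-separated file. The function names, the relation labels `REL_A`/`REL_B`, and the sample rows are all made up for illustration and are not part of the DRKG tooling.

```python
from collections import Counter, defaultdict


def find_duplicate_triples(triples):
    """Return every triple that occurs more than once, with its count."""
    counts = Counter(triples)
    return {triple: n for triple, n in counts.items() if n > 1}


def find_near_duplicate_entities(triples):
    """Group entity names that collide after trimming whitespace and lowercasing.

    This is only a heuristic (an assumption of this sketch); it cannot detect
    the same entity listed under two unrelated identifiers.
    """
    groups = defaultdict(set)
    for head, _relation, tail in triples:
        for entity in (head, tail):
            groups[entity.strip().lower()].add(entity)
    return {key: names for key, names in groups.items() if len(names) > 1}


# Hypothetical sample rows, not taken from DRKG.
sample = [
    ("Gene::1", "REL_A", "Gene::2"),
    ("Gene::1", "REL_A", "Gene::2"),        # exact duplicate triple
    ("Gene::1", "REL_B", "Compound::X1"),
    ("Gene::1", "REL_B", "Compound::x1 "),  # differs only by case and a trailing space
]

duplicate_triples = find_duplicate_triples(sample)
near_duplicate_entities = find_near_duplicate_entities(sample)
```

Exact duplicate triples are cheap to find this way. Duplicate vertices that come from different identifier schemes would need an external ID mapping, which this sketch does not attempt.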

yeqing97 commented 3 years ago

Thank you for your answer! Another question that confuses me: why are there no protein nodes in the knowledge graph, given that most biological processes are carried out by proteins?

ishaan-mehta commented 2 years ago

Because DRKG is compiled from several sources, protein-protein interactions were converted to gene-gene interactions so that a gene and its corresponding protein do not appear as separate nodes. For example, every interaction in DRKG from STRING was originally a protein-protein interaction.
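To illustrate the idea, here is a rough sketch of collapsing protein-protein edges into gene-gene edges. This is not the actual DRKG build code. The function name, the protein identifiers, and the protein-to-gene mapping are all hypothetical, and the `Gene::<id>` naming is assumed for the output nodes.

```python
def proteins_to_gene_edges(ppi_edges, protein_to_gene):
    """Map each protein-protein edge onto the genes encoding those proteins.

    Edges with an unmapped protein are skipped (a choice made for this sketch).
    Using a set merges edges that become identical after mapping.
    """
    gene_edges = set()
    for protein_a, protein_b in ppi_edges:
        gene_a = protein_to_gene.get(protein_a)
        gene_b = protein_to_gene.get(protein_b)
        if gene_a is None or gene_b is None:
            continue  # no known gene for at least one endpoint
        gene_edges.add((f"Gene::{gene_a}", f"Gene::{gene_b}"))
    return gene_edges


# Hypothetical mapping: two proteins encoded by the same gene.
protein_to_gene = {"prot_a": "101", "prot_a2": "101", "prot_b": "202"}

# Hypothetical protein-protein interactions.
ppi_edges = [
    ("prot_a", "prot_b"),
    ("prot_a2", "prot_b"),        # collapses onto the same gene pair as above
    ("prot_b", "prot_unknown"),   # dropped: no gene mapping
]

gene_edges = proteins_to_gene_edges(ppi_edges, protein_to_gene)
```

In this toy example the two protein-level edges end up as a single gene-level edge, which is why a gene and its protein never show up as two separate nodes.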