dhimmel / integrate

Scripts and resources to create Hetionet v1.0, a heterogeneous network for drug repurposing
https://doi.org/10.15363/thinklab.4

Building hetionet: data integration, hetnet permutation, and Neo4j import

Hetnets are networks with multiple types of nodes and edges. This repository creates Hetionet v1.0, a hetnet encoding biology, disease, and pharmacology. We created Hetionet v1.0 for Project Rephetio, a study to systematically evaluate why drugs work and to predict new therapeutic uses for existing drugs. The study describing Project Rephetio and Hetionet v1.0 is:

Systematic integration of biomedical knowledge prioritizes drugs for repurposing
Daniel S Himmelstein, Antoine Lizee, Christine Hessler, Leo Brueggeman, Sabrina L Chen, Dexter Hadley, Ari Green, Pouya Khankhanian, Sergio E Baranzini
eLife (2017-09-22) DOI: 10.7554/eLife.26726

Note: this repository is for building Hetionet v1.0. We recommend that users interested in downloading and using the completed hetnet do so from the dhimmel/hetionet repository.

Execution

  1. precompile.sh executes notebooks which combine multiple resources into a single type of edge. See the contents of compile for more information.

  2. build.sh builds the hetnet, creates permuted derivatives, and exports the hetnet to Neo4j.

Notebooks

  1. integrate.ipynb creates the hetnet by integrating data that is stored either in compile or elsewhere on GitHub. All GitHub links use commit hashes to be version-specific. The JSON-formatted hetnet is exported to data/hetnet.json.bz2.
  2. permute.ipynb loads the created hetnet and creates permuted derivatives that preserve node degree but destroy edge specificity. The permuted hetnets are written to data/permuted, but are not uploaded due to file size.
  3. neo4j-import.ipynb imports the hetnet and its permutations into separate Neo4j instances. These Neo4j instances are not uploaded due to file size and licensing issues. Currently, neo4j-community-2.3.3 is used.
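
To illustrate what "preserve node degree but destroy edge specificity" means, here is a minimal sketch of a degree-preserving edge swap on a single edge type. It is not the code used in permute.ipynb. The function names (`permute_edges`, `degree_counts`), the `multiplier` and `seed` parameters, and the example node identifiers are assumptions made for illustration. The sketch also assumes the edges connect two different node types, so self-loops cannot arise.

```python
import random
from collections import Counter


def permute_edges(edges, multiplier=10, seed=0):
    """Shuffle (source, target) pairs with random swaps that keep every node's degree.

    Hypothetical helper for illustration; assumes `edges` has no duplicates.
    """
    rng = random.Random(seed)
    edges = list(edges)
    edge_set = set(edges)
    if len(edges) < 2:
        return edges
    for _ in range(multiplier * len(edges)):
        # Pick two distinct edges a-b and c-d and propose a-d and c-b instead.
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would duplicate an edge already in the network.
        if new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {edges[i], edges[j]}
        edge_set |= {new_1, new_2}
        edges[i], edges[j] = new_1, new_2
    return edges


def degree_counts(edges):
    """Return (source degree counts, target degree counts) for an edge list."""
    sources = Counter(source for source, _ in edges)
    targets = Counter(target for _, target in edges)
    return sources, targets


# Made-up compound-to-disease edges, purely for demonstration.
example_edges = [
    ("c1", "d1"), ("c1", "d2"), ("c2", "d2"),
    ("c3", "d3"), ("c4", "d1"), ("c4", "d3"),
]
permuted_edges = permute_edges(example_edges, multiplier=10, seed=1)
```

Each accepted swap exchanges the targets of two edges, so every source keeps its number of outgoing edges and every target keeps its number of incoming edges, while the specific pairings are randomized. Edges between nodes of the same type would need extra checks (for example, rejecting self-loops), which this sketch omits.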

Components

Environment

The dependencies are listed in environment.yml, which can be installed on Linux using:

conda env create --file=environment.yml

Activate the environment with source activate integrate.

License

All original content in this repository is released as CC0. However, the hetnet integrates data from many resources, and users should consider the licensing of each source. For sources with defined licenses, we apply a license attribute to each node and edge. Some resources do not provide a license, so we requested permission to use them. More information is available on Thinklab. See licenses/README.md for a table of all resources and their licensing.
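
As a rough sketch of how per-node and per-edge license attributes could be used, the snippet below tallies licenses across a list of records. The record layout (a `data` dictionary with a `license` key), the function name `count_licenses`, and the example values are assumptions for illustration and may not match the structure of the exported hetnet.

```python
from collections import Counter


def count_licenses(records, default="unspecified"):
    """Count records per license; records without a license fall under `default`.

    Hypothetical helper: assumes each record is a dict that may carry a
    "data" dict with a "license" entry.
    """
    return Counter(
        record.get("data", {}).get("license", default) for record in records
    )


# Made-up records, purely for demonstration.
example_records = [
    {"kind": "Gene", "data": {"license": "CC0 1.0"}},
    {"kind": "Disease", "data": {"license": "CC BY 3.0"}},
    {"kind": "Compound", "data": {}},
    {"kind": "Gene", "data": {"license": "CC0 1.0"}},
]
license_counts = count_licenses(example_records)
```

A tally like this makes it easy to see which parts of a network fall under a given license before reusing them, or to select only the records whose license permits the intended use.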