This tool generates p-values and chi-squared values between curies. These p-values are generated by looking at the correlations between features in the patient/feature ICEES database.
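As a rough illustration of the statistic involved, the sketch below computes a Pearson chi-squared value and its p-value for a 2x2 contingency table of patient counts. It is not the ICEES implementation: the function name `chi_squared_2x2` and the example counts are made up for illustration, and no continuity correction is applied.

```python
import math


def chi_squared_2x2(table):
    """Return (chi2, p_value) for a 2x2 table of observed counts.

    Hypothetical helper for illustration only. Assumes every row and
    column total is non-zero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two features were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom, for which the chi-squared
    # survival function reduces to erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up counts: rows = feature A present/absent, columns = feature B present/absent.
chi2, p_value = chi_squared_2x2([[10, 20], [20, 10]])
```

A small p-value suggests the two features are correlated in the patient population; a table with identical rows gives a chi-squared value of 0 and a p-value of 1.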
This tool uses two different repositories: 1) Data Services (https://github.com/RENCI-AUTOMAT/Data_services) 2) Plater (https://github.com/TranslatorSRI/Plater)
The Data Services repo is used to help generate the tsv files that define the nodes and edges of a graph. In this graph, the curies are the nodes, and the edges link the nodes and include a "p_value" property. Plater is used to create a neo4j database that has the curies as nodes and p_value properties attached to the edges between the nodes.
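The graph structure described above can be sketched with two small record types. This is only an illustration under assumptions: the class names, the `build_graph` helper, and the `EX:` curies are hypothetical, and the real tsv files may carry more columns than shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str  # the curie, e.g. "EX:0001" (made-up example)


@dataclass(frozen=True)
class Edge:
    subject: str    # curie of the first node
    object: str     # curie of the second node
    p_value: float  # p-value of the correlation between the two curies


def build_graph(pairs):
    """Build node and edge lists from (curie_a, curie_b, p_value) tuples."""
    nodes = {}
    edges = []
    for subject, obj, p_value in pairs:
        # Each curie becomes exactly one node, however many edges touch it.
        nodes.setdefault(subject, Node(subject))
        nodes.setdefault(obj, Node(obj))
        edges.append(Edge(subject, obj, p_value))
    return list(nodes.values()), edges


nodes, edges = build_graph([("EX:0001", "EX:0002", 0.01), ("EX:0001", "EX:0003", 0.2)])
```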
These scripts were tested on macOS with Python 3.9, and the instructions assume that setup, but they should also work on Ubuntu and with other versions of Python. If an older version of Python is used, the version numbers of some of the packages in the ./requirements.txt file will likely need to be downgraded.
This should work on Windows as well, but file paths will need to be modified.
This section describes the steps that need to be performed before any p-values are computed or the neo4j databases are created.
First, create, activate, and update a virtual environment.
python3.9 -m venv <path_to_venv>
source <path_to_venv>/bin/activate
pip install --upgrade pip
Next, install all requirements needed to run p-value scripts.
cd <path_to_icees_kg_folder>
pip install -r requirements.txt
pip install ./Plater/PLATER --no-dependencies
Follow the instructions at https://docs.docker.com/get-docker/ to install Docker on your machine.
Create a .env file in the root directory and add the following variables:
DATA_PATH="FILL_THIS_IN"
FEATURES_YAML="FILL_THIS_IN"
IDENTIFIERS_YAML="FILL_THIS_IN"
NODE_NORM="FILL_THIS_IN"
NAME_RESOLVER="FILL_THIS_IN"
DATASET_NAME="FILL_THIS_IN"
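Before running anything, it can help to confirm that every variable has been filled in. The sketch below is a minimal check using only the standard library; it assumes the .env file holds one KEY="VALUE" pair per line, and the helper names `parse_env` and `missing_variables` are hypothetical. The scripts in this repo may load the file differently.

```python
REQUIRED = (
    "DATA_PATH",
    "FEATURES_YAML",
    "IDENTIFIERS_YAML",
    "NODE_NORM",
    "NAME_RESOLVER",
    "DATASET_NAME",
)


def parse_env(text):
    """Parse KEY="VALUE" lines into a dict, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes, if any.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_variables(values):
    """Return required names that are absent, empty, or still placeholders."""
    return [
        name
        for name in REQUIRED
        if not values.get(name) or values[name] == "FILL_THIS_IN"
    ]
```

For example, `missing_variables(parse_env(open(".env").read()))` returns an empty list once every variable has a real value.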
There are two scripts that need to be run to generate the tsv files that will be used by the PLATER CLI tools to create a neo4j database with curie p-values. Both of these scripts live in the ./tsv_maker/ folder.
First, environment variables need to be set prior to running any tsv_maker scripts.
cd ./tsv_maker
chmod +x ./set_up_test_env.sh
source ./set_up_test_env.sh
Next, json files (node and edge) are created using the make_jsons.py script. Note: This takes a LONG time to run. It is best to leave it running overnight.
cd ./tsv_maker
python make_jsons.py
The make_jsons.py script creates two files: p_val_edges.json and p_val_nodes.json. These files are then converted to .tsv files with the following script.
python jsons_to_tsv.py
The jsons_to_tsv.py script creates two files: p_val_edges.tsv and p_val_nodes.tsv. These files are used in the next steps to populate the neo4j database.
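The json-to-tsv step can be pictured roughly as follows. This is a sketch, not the code in jsons_to_tsv.py: it assumes each json file holds a list of flat objects, and the column names, the predicate, and the `records_to_tsv` helper are illustrative assumptions.

```python
import csv
import io
import json


def records_to_tsv(records, columns):
    """Render a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter="\t",
        extrasaction="ignore",  # silently skip keys that are not in `columns`
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # missing keys are written as empty cells
    return buffer.getvalue()


# Made-up edge record standing in for the contents of p_val_edges.json.
edges_json = '[{"subject": "EX:0001", "predicate": "biolink:correlated_with", "object": "EX:0002", "p_value": 0.01}]'
edges_tsv = records_to_tsv(json.loads(edges_json), ["subject", "predicate", "object", "p_value"])
```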
If you want to run everything locally, your local instance of neo4j needs to have the apoc plugin installed. Run the following command to create and start the docker container:
sudo docker run -d --name icees_kg \
-p 7474:7474 \
-p 7687:7687 \
-e NEO4J_AUTH=neo4j/test \
-e NEO4JLABS_PLUGINS=\[\"apoc\"\] \
-e NEO4J_apoc_export_file_enabled=true \
-e NEO4J_apoc_import_file_enabled=true \
-e NEO4J_apoc_import_file_use__neo4j__config=true \
-v $PWD/data:/data \
-v $PWD/backups:/backups \
neo4j:4.2
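The container can take a little while to start accepting connections. A quick way to check is to test whether the published ports (7474 for HTTP, 7687 for Bolt) are reachable. The helper below is a hypothetical sketch using only the standard library; it only confirms that something is listening, not that neo4j has finished starting.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Example: check the Bolt port published by the docker command above.
bolt_ready = port_is_open("localhost", 7687)
```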
Now that you have a neo4j database up and running, PLATER can be used to spin up a TRAPI API.
Navigate to the Plater folder and run the main plater script.
cd ../Plater
chmod +x main.sh
./main.sh
If you would like to use a neo4j database on another port, with a different name, or different password, modify the .env file in the plater folder.
This spins up a PLATER API that can be accessed on port 8080 (as defined in the .env file). The API documentation can be found at http://localhost:8080/docs.
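Once the API is up, a one-hop TRAPI query can be sent to it. The sketch below builds such a query for a single curie and shows how it might be posted. Several details are assumptions: the `ids` key follows TRAPI 1.1 and later, the endpoint path must be taken from the /docs page of your deployment, and the helper names and the `EX:0001` curie are made up.

```python
import json
import urllib.request


def build_one_hop_query(curie):
    """Build a TRAPI message asking for anything linked to `curie`."""
    return {
        "message": {
            "query_graph": {
                "nodes": {"n0": {"ids": [curie]}, "n1": {}},
                "edges": {"e0": {"subject": "n0", "object": "n1"}},
            }
        }
    }


def post_query(base_url, path, query):
    """POST the query as JSON; `path` is whatever query route /docs lists."""
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(query).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


query = build_one_hop_query("EX:0001")
```

Calling `post_query("http://localhost:8080", <query path from /docs>, query)` would then return the TRAPI response as a dict, with the p_value property expected on the returned edges.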
First, create the neo4j database docker container:
sudo docker run -d --name icees_kg \
-p 7474:7474 \
-p 7687:7687 \
-e NEO4J_AUTH=neo4j/test \
-v $PWD/data:/data \
-v $PWD/backups:/backups \
neo4j:4.2
Second, use kgx to populate the neo4j database with the .tsv files created in the previous section. NOTE: This will take ~ 1 hour to run.
kgx neo4j-upload --uri bolt://localhost:7687 --username neo4j --password test --input-format tsv ./build/p_val_nodes.tsv ./build/p_val_edges.tsv
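Because the upload is slow, it can be worth sanity-checking the tsv files before starting it. The sketch below lists edges whose endpoints do not appear in the node file. It assumes the node file has an `id` column and the edge file has `subject` and `object` columns; those column names and the `dangling_edges` helper are assumptions, so adjust them to match the real headers.

```python
import csv
import io


def dangling_edges(nodes_tsv, edges_tsv):
    """Return (subject, object) pairs that reference a curie missing from the nodes."""
    node_ids = {
        row["id"] for row in csv.DictReader(io.StringIO(nodes_tsv), delimiter="\t")
    }
    dangling = []
    for row in csv.DictReader(io.StringIO(edges_tsv), delimiter="\t"):
        if row["subject"] not in node_ids or row["object"] not in node_ids:
            dangling.append((row["subject"], row["object"]))
    return dangling


# Made-up file contents; in practice, read ./build/p_val_nodes.tsv and
# ./build/p_val_edges.tsv into strings first.
nodes_text = "id\nEX:0001\nEX:0002\n"
edges_text = "subject\tobject\tp_value\nEX:0001\tEX:0002\t0.01\nEX:0001\tEX:0009\t0.5\n"
problems = dangling_edges(nodes_text, edges_text)
```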
In order to dump the database, the docker container needs to be stopped:
docker stop icees_kg
To dump the database, run:
sudo docker run -i -t --rm \
-v $PWD/data:/data \
-v $PWD/backups:/backups \
--entrypoint /bin/bash \
neo4j:4.2
This opens a shell inside a neo4j container. From that shell, run this command:
neo4j-admin dump --to=/backups/icees_kg.dump