OluwadareLab, University of Colorado, Colorado Springs
Developers:
Van Hovenga
Department of Mathematics
University of Colorado, Colorado Springs
Email: vhovenga@uccs.edu
Contact:
Oluwatosin Oluwadare, PhD
Department of Computer Science
University of Colorado, Colorado Springs
Email: ooluwada@uccs.edu
HiC-GNN runs in a Docker-containerized environment. Before cloning this repository and attempting to build, install the Docker engine. To install and build HiC-GNN follow these steps.
Clone the repository and enter it: git clone https://github.com/OluwadareLab/HiC-GNN.git && cd HiC-GNN
Pull the HiC-GNN image: docker pull oluwadarelab/hicgnn:latest
This may take a few minutes. Once finished, check that the image was successfully pulled using docker image ls
Run the container, mounting the current working directory at /HiC-GNN: docker run --rm -it --name hicgnn_cont -v ${PWD}:/HiC-GNN oluwadarelab/hicgnn
There are three Python scripts used in this study. We describe their purposes and usage below.
This script takes a single Hi-C contact map as an input and utilizes it to train a HiC-GNN model.
Inputs: a single Hi-C contact map, given as input_filepath.
Outputs:
Outputs/input_filename_structure.pdb
Outputs/input_filename_log.txt
Outputs/input_filename_weights.pt
Data/input_filename_matrix_KR_normed.txt
Data/input_filename_embeddings.txt
Data/input_filename_matrix.txt
Usage: python HiC-GNN_main.py input_filepath
positional arguments:
input_filepath: Path of the input file.
optional arguments:
-h, --help show this help message and exit
-c, --conversions
String of conversion constants of the form '[lowest, interval, highest]' for a set of equally spaced conversion factors, or of the form '[conversion]' for a single conversion factor. Default value: '[.1,.1,2]'
-bs, --batchsize
Batch size for embeddings generation. Default value: 128.
-ep, --epochs
Number of epochs used for embeddings generation. Default value: 10.
-lr, --learningrate
Learning rate for training GCNN. Default value: .001.
-th, --threshold
Loss threshold for training termination. Default value: 1e-8.
Example: python HiC-GNN_main.py Data/GM12878_1mb_chr19_list.txt
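The format of the -c argument can be read as in the sketch below. This is an illustration, not the script's own parser: expand_conversions is a hypothetical helper, and treating the highest value as included in the set is an assumption.

```python
def expand_conversions(spec):
    """Expand '[lowest, interval, highest]' or '[conversion]' into a list of floats."""
    # Strip the surrounding brackets and split on commas.
    parts = [float(p) for p in spec.strip().strip("[]").split(",")]
    if len(parts) == 1:
        return parts  # a single conversion factor
    if len(parts) != 3:
        raise ValueError("expected '[conversion]' or '[lowest, interval, highest]'")
    lowest, interval, highest = parts
    if interval <= 0 or highest < lowest:
        raise ValueError("interval must be positive and highest >= lowest")
    # Count the steps with round() so float error does not drop the last value.
    # Assumption: 'highest' itself belongs to the set.
    steps = round((highest - lowest) / interval)
    return [round(lowest + i * interval, 10) for i in range(steps + 1)]


factors = expand_conversions("[.1,.1,2]")
print(len(factors), factors[0], factors[-1])
```

Under that assumption, the default '[.1,.1,2]' expands to 20 equally spaced factors from 0.1 to 2.0.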
This script takes two Hi-C maps in coordinate list format. It generates embeddings for the first input map and trains a model using that map and its embeddings. It then generates embeddings for the second input map, aligns them to those of the first input map, and tests the model trained on the first input using these aligned embeddings. The output is a structure for the second input, generalized from the model trained on the first input. The script searches for files corresponding to the raw matrix format, the normalized matrix format, the embeddings, and a trained model for the inputs in the current working directory. For example, if the input file is input.txt, then the script checks whether Data/input_matrix.txt, Data/input_matrix_KR_normed.txt, and Data/input_embeddings.txt exist. If these files do not exist, then the script generates them automatically.
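The file check described above can be sketched as follows. This illustrates the idea under assumptions and is not the script's code: derived_paths and missing_files are hypothetical names, and the naming rule is inferred only from the input.txt example.

```python
from pathlib import Path

# Suffixes taken from the example above (input.txt -> Data/input_matrix.txt, ...).
SUFFIXES = ("_matrix.txt", "_matrix_KR_normed.txt", "_embeddings.txt")


def derived_paths(input_filepath, data_dir="Data"):
    """Return the intermediate files expected for a given input file."""
    stem = Path(input_filepath).stem  # "input.txt" -> "input"
    return [Path(data_dir) / f"{stem}{suffix}" for suffix in SUFFIXES]


def missing_files(input_filepath, data_dir="Data"):
    """Return the expected files that are not on disk yet."""
    return [p for p in derived_paths(input_filepath, data_dir) if not p.exists()]


for path in derived_paths("input.txt"):
    status = "found" if path.exists() else "missing, would be generated"
    print(f"{path}: {status}")
```

Only the files reported as missing would need to be regenerated, which is why a second run on the same input can skip the preprocessing steps.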
Inputs: two Hi-C contact maps in coordinate list format, given as input_filepath1 and input_filepath2.
Outputs:
Outputs/input_2_generalized_structure.pdb
Outputs/input_2_generalized_log.txt
Outputs/input_1_weights.pt
Data/input_matrix_KR_normed.txt, if these files don't exist already.
Data/input_embeddings.txt, if these files don't exist already.
Data/input_matrix.txt, if these files don't exist already.
Usage: python HiC-GNN_generalize.py input_filepath1 input_filepath2
positional arguments:
input_filepath1: Path of the input file with which a model will be trained and later generalized on input_filepath2.
input_filepath2: Path of the input file for which a generalized structure will be generated from the model trained on input_filepath1.
optional arguments:
Same as HiC-GNN_main.py
Example: python HiC-GNN_generalize.py Data/GM12878_1mb_chr19_list.txt Data/GM12878_500kb_chr19_list.txt
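Because the optional arguments match those of HiC-GNN_main.py, a generalization run with non-default settings could be assembled as in the sketch below. build_command is a hypothetical helper, only flags listed in this README are used, and the chosen values are arbitrary illustrations rather than recommended settings.

```python
import shlex


def build_command(script, inputs, conversions=None, batchsize=None,
                  epochs=None, learningrate=None, threshold=None):
    """Build an argument list for one of the HiC-GNN scripts."""
    cmd = ["python", script, *inputs]
    options = [
        ("-c", conversions),
        ("-bs", batchsize),
        ("-ep", epochs),
        ("-lr", learningrate),
        ("-th", threshold),
    ]
    # Only pass flags that were set, so the script defaults apply otherwise.
    for flag, value in options:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


cmd = build_command(
    "HiC-GNN_generalize.py",
    ["Data/GM12878_1mb_chr19_list.txt", "Data/GM12878_500kb_chr19_list.txt"],
    conversions="[.5,.1,1.5]",  # illustrative value
    epochs=20,                  # illustrative value
)
print(shlex.join(cmd))
# Inside the container, the list could be run with subprocess.run(cmd, check=True).
```

Passing the arguments as a list avoids shell-quoting problems with the brackets in the -c value.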
This script takes a single Hi-C contact map as an input and utilizes it to generate node embeddings.
Inputs: a single Hi-C contact map, given as input_filepath.
Outputs:
Data/input_embeddings.txt
Data/input_filename_matrix.txt
Usage: python HiC-GNN_embed.py input_filepath
positional arguments:
input_filepath: Path of the input file from which embeddings will be generated.
optional arguments:
-h, --help show this help message and exit
-bs, --batchsize
Batch size for embeddings generation. Default value: 128.
-ep, --epochs
Number of epochs used for embeddings generation. Default value: 10.
Example: python HiC-GNN_embed.py Data/GM12878_1mb_chr19_list.txt