Epiphany is a neural network that predicts cell-type-specific Hi-C contact maps from widely available epigenomic tracks. It uses bidirectional long short-term memory (BiLSTM) layers to capture long-range dependencies and, optionally, a generative adversarial network (GAN) architecture to encourage realistic, sharp contact maps. Epiphany generalizes well to held-out chromosomes within and across cell types, yields accurate TAD and interaction calls, and predicts structural changes caused by perturbations of epigenomic signals.
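The optional adversarial setup can be pictured as a generator loss that adds a discriminator-based penalty to a pixel-wise reconstruction loss, so that blurry but "safe" predictions are penalized. The sketch below assumes a mean-squared-error reconstruction term and a non-saturating GAN term weighted by a hypothetical `adv_weight`; Epiphany's actual losses and weights are defined in the repository's training code and may differ.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length lists of contact values."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def generator_loss(pred, target, disc_prob_real, adv_weight=0.01):
    """Reconstruction loss plus a weighted non-saturating adversarial term.

    disc_prob_real: discriminator's probability that `pred` is a real map.
    adv_weight: hypothetical trade-off weight (an assumption, not from Epiphany).
    """
    eps = 1e-12  # avoid log(0) when the discriminator is fully confident
    adversarial = -math.log(max(disc_prob_real, eps))
    return mse(pred, target) + adv_weight * adversarial
```

When the discriminator is fooled (`disc_prob_real` near 1) the adversarial term vanishes and only the reconstruction error remains; the less realistic the prediction looks, the larger the penalty.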
Epiphany connects 1D epigenomic signals to 3D chromatin structure, which makes it possible to interpret how strongly specific epigenomic tracks contribute to structural changes. Any combination of epigenomic tracks can be used as input. Our ablation analysis found that the two-track combination ATAC + CTCF alone yields good prediction quality. Furthermore, including ATAC or CTCF together with other relevant epigenomic tracks in the input set significantly improves predictive performance.
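Because any combination of tracks can serve as input, an ablation study amounts to enumerating track subsets and stacking each subset into a multi-channel input. The sketch below uses plain Python lists and made-up signal values; the track names and the channels-by-bins layout are illustrative assumptions, not Epiphany's data format.

```python
from itertools import combinations


def stack_tracks(signals, track_names):
    """Stack the selected 1D tracks into a channels x bins list of lists."""
    if not track_names:
        raise ValueError("select at least one track")
    missing = [name for name in track_names if name not in signals]
    if missing:
        raise KeyError(f"missing tracks: {missing}")
    if len({len(signals[name]) for name in track_names}) != 1:
        raise ValueError("all selected tracks must cover the same bins")
    return [list(signals[name]) for name in track_names]


def ablation_sets(available, size):
    """Every input-track subset of the given size, e.g. all two-track pairs."""
    return [list(combo) for combo in combinations(sorted(available), size)]


# Hypothetical per-bin signal values for three tracks over the same three bins.
signals = {
    "ATAC": [0.1, 0.8, 0.3],
    "CTCF": [0.0, 1.2, 0.4],
    "H3K27ac": [0.5, 0.9, 0.2],
}
```

For example, `ablation_sets(signals, 2)` yields the three pairs `["ATAC", "CTCF"]`, `["ATAC", "H3K27ac"]` and `["CTCF", "H3K27ac"]`, each of which can be passed to `stack_tracks` to build one candidate input set.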
This repo includes scripts and related files for the Epiphany model [preprint].
GM12878_X.h5 and GM12878_y.pickle: input and target sample datasets for Epiphany training
pretrained_10kb.pt_model: pretrained weights of the 10kb model
pretrained_5kb.pt_model: pretrained weights of the 5kb model
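In Hi-C terminology, "10kb" and "5kb" usually refer to the bin size of the contact map, so the same genomic window spans twice as many bins in the 5kb model. A minimal sketch of that coordinate-to-bin arithmetic, assuming 0-based base-pair coordinates; the helper names are hypothetical and not part of this repository.

```python
def bin_index(position, resolution):
    """Map a 0-based genomic coordinate (bp) to its bin at a given bin size (bp)."""
    if position < 0 or resolution <= 0:
        raise ValueError("position must be >= 0 and resolution > 0")
    return position // resolution


def bins_in_window(window_bp, resolution):
    """Number of bins needed to cover a window of window_bp base pairs."""
    return -(-window_bp // resolution)  # ceiling division on integers
```

For instance, position 1,234,567 falls in bin 123 at 10kb and bin 246 at 5kb, and an arbitrary 2 Mb window covers 200 bins at 10kb versus 400 at 5kb.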
Clone the repository
git clone https://github.com/arnavmdas/epiphany.git
Move to the training directory
cd epiphany/epiphany
Download the dataset from Google Drive
mkdir ./Epiphany_dataset
cd ./Epiphany_dataset
wget --no-check-certificate https://drive.google.com/drive/u/2/folders/1UJX6cp-4s0Jbud9jovzuaqnBeORg5R8x -O GM12878_X.h5
wget --no-check-certificate https://drive.google.com/drive/u/2/folders/1UJX6cp-4s0Jbud9jovzuaqnBeORg5R8x -O GM12878_y.pickle
cd ..
Run the training script
python3 adversarial.py --wandb
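If training fails to open the dataset, it is worth checking that the downloads are real data files: a Google Drive folder link fetched with wget may save an HTML page instead of the file itself. The sketch below is a hypothetical helper (not part of the repository) that checks for the 8-byte HDF5 signature at offset 0, assuming no HDF5 user block, and flags files that look like HTML.

```python
from pathlib import Path

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # 8-byte HDF5 format signature


def looks_like_hdf5(path):
    """True if the file starts with the HDF5 signature (no user block assumed)."""
    with open(path, "rb") as fh:
        return fh.read(8) == HDF5_SIGNATURE


def looks_like_html(path):
    """True if the file's first non-whitespace bytes look like an HTML page."""
    with open(path, "rb") as fh:
        head = fh.read(512).lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def check_downloads(directory):
    """Return a list of problems with the dataset files named in this README."""
    problems = []
    for name in ("GM12878_X.h5", "GM12878_y.pickle"):
        path = Path(directory) / name
        if not path.exists():
            problems.append(f"{name}: missing")
        elif looks_like_html(path):
            problems.append(f"{name}: looks like an HTML page, not data")
        elif name.endswith(".h5") and not looks_like_hdf5(path):
            problems.append(f"{name}: no HDF5 signature at offset 0")
    return problems
```

Run it from the training directory, e.g. `print(check_downloads("./Epiphany_dataset") or "downloads look OK")`; an empty list means neither file is obviously broken, not that its contents are correct.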
If you have any questions, please feel free to contact Rui Yang (ruy4001@med.cornell.edu), Arnav Das (arnavmd2@uw.edu).