gifford-lab / Keras-genomics

Perform hyper-parameter tuning, training, testing and prediction with Keras

A Keras-based deep learning platform to perform hyper-parameter tuning, training and prediction on genomics data.


Notice of major refactoring

The latest version has undergone a major refactoring that substantially changes the interface and now uses Hyperband to search the hyper-parameter space. To use the old version, please download release 0.1 from here or check out its README here.

Data preparation

Note that the following procedure encodes each sequence into an array of shape (bs, 4, 1, len), where bs is the number of samples and len is the length of each DNA sequence. Therefore, to work with datasets generated by this procedure, you need to set "image_data_format" to "channels_first" in the ~/.keras/keras.json file.
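A minimal sketch of making that change from the shell, assuming the standard Keras 2 keras.json layout. The function name set_channels_first is hypothetical, and the block demonstrates it on a temporary file rather than on your real configuration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "image_data_format" to "channels_first" in a
# Keras config file, creating a minimal config if the file does not exist.
set_channels_first() {
  local config="$1"
  if [ -f "$config" ]; then
    # Rewrite the existing entry in place; sed keeps the original as "$config.bak".
    sed -i.bak 's/"image_data_format": *"[a-z_]*"/"image_data_format": "channels_first"/' "$config"
  else
    mkdir -p "$(dirname "$config")"
    # Assumption: the other keys are the usual Keras 2 defaults;
    # change "backend" to the backend you actually use.
    cat > "$config" <<'EOF'
{
    "floatx": "float32",
    "epsilon": 1e-07,
    "backend": "tensorflow",
    "image_data_format": "channels_first"
}
EOF
  fi
  # Print the resulting setting so the change can be checked by eye.
  grep '"image_data_format"' "$config"
}

# Demo on a throwaway config that starts out as channels_last.
workdir=$(mktemp -d)
printf '{\n    "image_data_format": "channels_last"\n}\n' > "$workdir/keras.json"
set_channels_first "$workdir/keras.json"
```

To apply it to your own setup, you would call set_channels_first ~/.keras/keras.json. The sed substitution only rewrites an existing "image_data_format" entry, so if your file lacks that key you need to add it by hand.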

You need to prepare a sequence file in FASTA format and a target file for each of the training, validation and test sets. Refer to the toy data we provide for examples.

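Then run the following for each set to convert it into HDF5 format. The paste step joins every header line with the line that follows it, so each sequence must sit on a single line of the FASTA file.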

paste -d' ' - - < FASTA_FILE > tmp.tsv
python $REPO_HOME/embedH5.py tmp.tsv TARGET_FILE DATA_TOPDIR/FILE_NAME -b BATCHSIZE
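If your FASTA file wraps long sequences over several lines, the paste step would pair the wrong lines. A minimal sketch of a workaround, assuming embedH5.py only needs the same "header sequence" layout that paste produces on single-line FASTA: awk first joins each record's sequence lines, then paste pairs header and sequence as above. The helper name fasta_to_tsv is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn a FASTA file (sequences possibly wrapped over
# several lines) into one "header sequence" line per record.
fasta_to_tsv() {
  local fasta="$1" out="$2"
  # Collapse each record to exactly two lines: the ">" header and one sequence line.
  awk '/^>/ { if (seq != "") print seq; print; seq = ""; next }
       { seq = seq $0 }
       END { if (seq != "") print seq }' "$fasta" |
    # Pair the two lines of each record, separated by a single space.
    paste -d' ' - - > "$out"
}

# Toy input: the second record's sequence is wrapped over two lines.
workdir=$(mktemp -d)
printf '>seq1\nACGTAC\n>seq2\nGGCC\nTTAA\n' > "$workdir/toy.fa"
fasta_to_tsv "$workdir/toy.fa" "$workdir/toy.tsv"
cat "$workdir/toy.tsv"
# >seq1 ACGTAC
# >seq2 GGCCTTAA
```

The resulting file can then be passed to embedH5.py in place of tmp.tsv.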

Model preparation

Modify the model function in the provided template to implement your network of choice. See here for examples of how to specify the hyper-parameters to tune.

Running the model

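python main.py -d DATA_TOPDIR -m MODEL_FILE_NAME ORDER

Here DATA_TOPDIR is the directory holding the HDF5 files, MODEL_FILE_NAME is the path to your model file, and ORDER is the set of flags selecting which stages to run. For example, the quick run below passes -y -t -e to perform hyper-parameter tuning, training and testing.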

Quick run on the toy data

We provide some toy data and a toy model here.

To perform a quick run, first run the following commands to convert the data to the required format and save it under "expt1" in the current folder.

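cd "$REPO_HOME"
mkdir -p expt1   # make sure the output directory exists
for dtype in 'train' 'valid' 'test'
do
    # Join each FASTA header line with the sequence line that follows it
    paste -d' ' - - < example/$dtype.fa > tmp.tsv
    # Convert this set and its targets to HDF5 under expt1
    python embedH5.py tmp.tsv example/$dtype.target expt1/$dtype.h5
done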

Then perform hyper-parameter tuning, training and testing with:

python main.py -d expt1 -m example/model.py -y -t -e

All intermediate output is written under "expt1". If everything works, you should get a test AUC of around 0.97.