Hughes-Genome-Group / deepHaem

Implementation of a deep convolutional neural network for predicting chromatin features from DNA sequence
GNU General Public License v3.0

deepHaem

Implementation of a deep convolutional neural network for predicting chromatin features from DNA sequence.

The repository contains a flexible TensorFlow implementation of a convolutional neural network with max pooling. The basic architecture is built on the principles described in DeepSEA (http://deepsea.princeton.edu/help/) and Basset (https://github.com/davek44/Basset). It comprises multiple layers of convolutional filters, each followed by ReLU and max pooling, and a fully connected layer at the end. An additional pre-final fully connected layer can be switched on as well. The number of convolutional layers is flexible and is set as a hyperparameter at the start of the training procedure. Batch normalization is optional.
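To make the layer stacking concrete, the sketch below traces how the sequence length shrinks through a stack of convolution, ReLU and max pooling layers, and how large the flattened input to the fully connected layer would be. It is a plain-Python illustration and not deepHaem's code: the function name `trace_conv_stack`, the 'same' padding for convolutions, the non-overlapping pooling and all hyperparameter values in the usage lines are assumptions.

```python
def trace_conv_stack(seq_length, num_filters, pool_widths):
    """Trace output shapes through stacked conv -> ReLU -> max-pool layers.

    Assumptions (not taken from deepHaem): convolutions use 'same' padding,
    so only pooling shortens the sequence; pooling is non-overlapping
    (stride == width) and drops any incomplete trailing window.
    """
    if len(num_filters) != len(pool_widths):
        raise ValueError("need one pool width per convolutional layer")
    shapes = []
    length = seq_length
    for filters, width in zip(num_filters, pool_widths):
        length = length // width  # max pooling with stride == width
        if length < 1:
            raise ValueError("sequence pooled away; use fewer layers or narrower pools")
        shapes.append((length, filters))
    # With no convolutional layers, the flattened input is the one-hot sequence (4 channels).
    flat_size = shapes[-1][0] * shapes[-1][1] if shapes else seq_length * 4
    return shapes, flat_size


# Hypothetical hyperparameters: 1000 bp input, three convolutional layers.
shapes, flat_size = trace_conv_stack(1000, [320, 480, 960], [4, 4, 4])
print(shapes)     # [(250, 320), (62, 480), (15, 960)]
print(flat_size)  # 14400
```

Adding a convolutional layer or widening a pooling step changes the size of the fully connected layer's input, which is one reason the number of layers has to be fixed before training starts.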

Contents

Requirements

Models

[./models] contains links to trained models.

Data

To train new models you will need chromatin feature data as peak calls. In addition to the cell types and assays of bespoke interest, we highly recommend training models with a large data compendium. Good references are the compendia used in DeepSEA or Basset. Users should combine such a compendium with their bespoke data into a single training dataset and train one model on the whole set.
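As an illustration of what peak calls can turn into for training, the sketch below gives each fixed-size sequence window a binary label per chromatin feature, set to 1 when the window overlaps a peak of that feature. This is an assumed, simplified labelling scheme in plain Python: the helper name `label_windows`, the any-overlap criterion and the toy coordinates are hypothetical, and the actual formatting steps are those described under ./tutorials.

```python
def label_windows(windows, peaks_by_feature):
    """Return the feature order and one binary label vector per window.

    windows: list of (chrom, start, end) half-open intervals.
    peaks_by_feature: dict mapping a feature name to a list of
        (chrom, start, end) peak calls, e.g. merged from a public
        compendium and bespoke data sets.
    Assumption: a window is positive for a feature if it overlaps any
    peak of that feature by at least one base.
    """
    features = sorted(peaks_by_feature)  # fixed column order
    labels = []
    for chrom, start, end in windows:
        row = []
        for feature in features:
            hit = any(
                p_chrom == chrom and p_start < end and start < p_end
                for p_chrom, p_start, p_end in peaks_by_feature[feature]
            )
            row.append(1 if hit else 0)
        labels.append(row)
    return features, labels


# Toy example with made-up coordinates and feature names.
windows = [("chr1", 0, 1000), ("chr1", 1000, 2000)]
peaks = {
    "compendium_DNase": [("chr1", 900, 1100)],
    "bespoke_CTCF": [("chr1", 1500, 1600), ("chr2", 0, 500)],
}
features, labels = label_windows(windows, peaks)
print(features)  # ['bespoke_CTCF', 'compendium_DNase']
print(labels)    # [[0, 1], [1, 1]]
```

Compendium and bespoke features end up as columns of the same label matrix, which matches the recommendation to train one model on the whole set.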

Workflow

Example workflows for formatting data, training a model and making predictions are outlined under ./tutorials.