Regulatory Element Locus Intersection (RELI) Analysis
GNU General Public License v3.0

RELI (Regulatory Element Locus Intersection) is an algorithm for discovering transcription factors (TFs) that bind a significant number of loci associated with a given disease or phenotype (e.g., loci identified through a genome-wide association study, or GWAS).

The major data components are:

  1. An input set of disease or phenotype-associated genetic variants (RS IDs)

  2. An internal “library” consisting of many ChIP-seq dataset peaks (in the form of .bed files)

  3. An internal file containing information on genetic variant allele frequencies, etc.

To assess the significance of the intersection between the input disease variants and a given TF ChIP-seq dataset, RELI performs simulations, generating a null distribution used for P-value calculations.
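As a generic illustration of how a simulated null distribution can be turned into significance figures, the sketch below summarizes a list of null overlap counts against an observed count. It is not taken from the RELI source: the function name `null_summary`, the toy counts, and the choice of a Z-score plus an empirical P-value are all assumptions made for this example. The statistic RELI actually reports is described in the accompanying publication.

```shell
#!/usr/bin/env bash
# null_summary OBSERVED  (hypothetical helper, not part of RELI)
# Reads one simulated overlap count per line on stdin and prints the null
# mean, the population standard deviation, a Z-score for OBSERVED, and an
# empirical P-value with the usual +1 correction.
null_summary() {
  awk -v obs="$1" '
    { n++; sum += $1; sumsq += $1 * $1; if ($1 >= obs) ge++ }
    END {
      mean = sum / n
      sd   = sqrt(sumsq / n - mean * mean)
      printf "mean=%.2f sd=%.2f z=%.2f empirical_p=%.4f\n",
             mean, sd, (obs - mean) / sd, (ge + 1) / (n + 1)
    }'
}

# Toy null distribution of 8 simulated overlap counts; observed overlap is 9
printf '%s\n' 2 4 4 4 5 5 7 9 | null_summary 9
# prints: mean=5.00 sd=2.00 z=2.00 empirical_p=0.2222
```

With only eight simulations the empirical P-value is very coarse; larger simulation counts give a finer-grained estimate.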

The output of RELI is a series of statistics based upon the significance of the overlap between the input genetic variants and the selected ChIP-seq dataset.

Additional details on RELI and the associated findings can be found in its accompanying publication.

Quick Start

If you have the Common Workflow Language's cwltool already installed and Docker available on your system, this is the most straightforward way to run RELI on some sample data:

# clone the public repository and check out the 'cwl-docker-workflow' branch
git clone https://github.com/WeirauchLab/RELI.git
cd RELI
git checkout cwl-docker-workflow

# retrieve sample data and use CWL to run with example input parameters
make fetchdata
cwltool workflow/reli-docker.cwl workflow/reli-example-eu-ancestry.yaml

This runs RELI on a small set of ChIP-seq data, using a European-ancestry set of lupus (SLE)-associated SNPs.

Installation

RELI requires a C++11 compiler (e.g., GCC 4.7 or higher) and the libgsl and libgslcblas libraries from the GNU Scientific Library (GSL).

You may download the latest release as a compressed archive from GitHub, or clone the repository with Git:

# GitHub
git clone https://github.com/WeirauchLab/RELI.git

# Weirauch Lab GitLab
git clone https://tfwebdev.research.cchmc.org/gitlab/ches2d/RELI_public.git

Installation with Docker

This is the recommended method if you have some familiarity with Docker, as it does not require you to download a compiler or any of the dependencies necessary to build RELI from source.

You will need to install the appropriate Docker client for your OS. Please see the official docs for help with that.

The most straightforward way to get started is to simply use the pre-built image available on Docker Hub:

docker run -it --rm weirauchlab/reli RELI --help

Or, if you've cloned the RELI_public repository (see above), you can build the CentOS 7-based Docker image locally, which compiles RELI from source, as follows:

cd /path/to/cloned/repo
docker build -t reli .

# test to see if it works
docker run -it --rm reli RELI --help

Installation on Linux

A GNU-style Makefile is provided in the repository. With GSL installed system-wide, you can build the RELI binary with just

make

then run ./RELI with no arguments to verify that you have a working binary (you should get a help screen).

In order to run a test analysis, you need to download the sample data either manually (see the next section) or just type

make test

which will download and validate the sample datasets automatically, then run example/example_run.sh, which invokes RELI on the sample data.

This test analysis requires around 10 GB of RAM to finish successfully; 16 GB is recommended.

Toolchain or libraries in non-standard locations

The included Makefile respects CFLAGS and LDFLAGS if they are set in the environment. For example, if you have a locally built GSL installed in a non-standard place (such as your home directory):

CFLAGS=-I/path/to/include LDFLAGS=-L/path/to/lib make

If g++ is not available in your PATH (or it has a different name), you will likely want to modify the Makefile directly, beginning around line 33 with the CC variable.

Other platforms

RELI has also been verified to build and run on the following platforms (in addition to GNU/Linux):

On Windows, make sure you run make (or the example/example_run.sh script) from within the Cygwin shell, not the Windows Command Prompt or PowerShell. You may need to lightly modify the CDT build toolchain settings if your installation of Cygwin is not at C:\Cygwin64.

Eclipse CDT project settings files are also included for both of the above toolchains. Just create a copy (or symlink) of the appropriate one called .cproject, then choose File > Import... > Existing Projects into Workspace and browse to where you cloned the repository.

Manually downloading sample data

If you have problems with make test (perhaps you don't have curl available), you can manually download and extract the sample datasets from

https://tf.cchmc.org/external/RELI/RELI_public_data.tar.bz2

such that the decompressed data is inside a data subdirectory, within the RELI_public repository you cloned above. A .zip-format archive is also provided, in case for some reason you don't have bzip2 available.

You can run the sample analysis by changing into the example directory and running example_run.sh in a terminal like so:

user@[/path/to/repo]$ cd example
user@[/path/to/repo/example]$ ./example_run.sh

Command-line options

Options not marked "(optional)" are required.

Option              Explanation
-snp FILE           Phenotype SNP file in 4-column BED format
-ld FILE            (optional) Phenotype linkage disequilibrium (LD) structure for SNPs, default: no LD file
-index FILE         ChIP-seq index file
-data DIR           Directory where ChIP-seq data are stored
-target STRING      Target label of the ChIP-seq experiment to be tested, from the index file
-build FILE         Genome build file
-null FILE          Null model file
-dbsnp FILE         dbSNP table file
-out DIR            Output directory name, under the current working folder
-match              (optional) Boolean switch to turn on minor allele frequency-based matching, default: off
-rep NUMBER         (optional) Number of permutations/simulations to perform, default: 2000
-corr NUMBER        (optional) Bonferroni correction multiplier for multiple testing, default: 1
-phenotype STRING   (optional) User-provided phenotype name, default: "."
-ancestry STRING    (optional) User-provided ancestry name, default: "."
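
To show how the options above fit together, here is a minimal sketch that assembles an invocation and prints it rather than running it. Every path and the target label are placeholders chosen for illustration, not files shipped with RELI; only the option names, the defaults for -rep and -corr, and the data/ChIPseq.index and data/ChIP-seq locations come from this document.

```shell
#!/usr/bin/env bash
# Placeholder values throughout: substitute your own files and target label.
reli_args=(
  -snp    /path/to/phenotype.snp        # 4-column BED file of phenotype SNPs
  -index  /path/to/data/ChIPseq.index   # ChIP-seq index file
  -data   /path/to/data/ChIP-seq        # directory holding the ChIP-seq peaks
  -target EXAMPLE_LABEL                 # hypothetical label from the index file
  -build  /path/to/genome_build_file
  -null   /path/to/null_model_file
  -dbsnp  /path/to/dbsnp_table_file
  -out    Output
  -rep    2000                          # documented default, spelled out
  -corr   1                             # documented default, spelled out
)

# Dry run: print the command line. Remove 'echo' to execute it.
echo ./RELI "${reli_args[@]}"
```

Keeping the arguments in a bash array makes it easy to add or drop optional switches such as -ld or -match without rewriting the whole command.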

Adding custom data sets

Adding ChIP-seq data

To add an additional ChIP-seq dataset, create an entry in the ChIP-seq index file (data/ChIPseq.index) with the following tab-delimited format:

label ⇥  source ⇥  Cell ⇥  TF ⇥  Cell label ⇥  PMID ⇥  Group ⇥  EBV Status ⇥  Species

where label corresponds to the filename of the dataset, which you should place in the data/ChIP-seq directory (in 4-column BED format).
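
A minimal sketch of appending such an entry from the shell follows. All nine field values are hypothetical, and a temporary file stands in for data/ChIPseq.index so the example can be run without touching real data.

```shell
#!/usr/bin/env bash
# Temporary stand-in for data/ChIPseq.index; point this at the real file when ready.
index_file="$(mktemp)"

# Field order: label, source, Cell, TF, Cell label, PMID, Group, EBV Status, Species
# (every value below is a made-up placeholder)
printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
  my_dataset_01 in-house ExampleCell ExampleTF ExampleCellLabel \
  00000000 ExampleGroup unknown "Homo sapiens" >> "$index_file"

# Sanity check: every line must have exactly 9 tab-separated fields
awk -F'\t' 'NF != 9 { bad = 1 } END { exit bad + 0 }' "$index_file" && echo "index OK"
```

Using printf with explicit \t separators avoids editors that silently convert tabs to spaces, which would break the tab-delimited format.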

Adding a new genome build or species

To use a different genome build, use the UCSC fetchChromSizes utility (usage information here) to download chromosome information for that build. You may wish to prune lines representing unmapped chromosome information (e.g., chrN_glXXXXXX_random and chrUn_glXXXXXX) from the downloaded data file.
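
The pruning step can be sketched with awk, assuming the genome build file uses the two-column layout that fetchChromSizes emits (sequence name, a tab, then length). The function name `prune_unplaced` and the toy sizes below are made up for illustration.

```shell
#!/usr/bin/env bash
# Typical download step (needs the UCSC utility and network access), left commented out:
#   fetchChromSizes hg19 > hg19.chrom.sizes

# prune_unplaced: hypothetical filter that drops *_random and chrUn_* sequences
prune_unplaced() {
  awk '$1 !~ /_random$/ && $1 !~ /^chrUn_/'
}

# Toy input in chrom.sizes layout; the sizes are illustrative, not real lengths
printf 'chr1\t1000\nchr1_gl000191_random\t200\nchrUn_gl000211\t300\nchrX\t900\n' \
  | prune_unplaced
# keeps only the chr1 and chrX lines
```

On a real file, the same filter would be applied as `prune_unplaced < hg19.chrom.sizes > hg19.pruned.txt`, where both filenames are placeholders.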

Be advised, however, that the null model included with the data was generated for Homo sapiens at build hg19; using a later "hg" build may invalidate this model.

If you need support for a different organism, please contact us via email for additional details (see "Feedback" section, below), or file an issue against the public GitHub repository.

How to cite

Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity.

Harley JB, Chen X, Pujato M, Miller D, Maddox A, Forney C, Magnusen AF, Lynch A, Chetal K, Yukawa M, Barski A, Salomonis N, Kaufman KM, Kottyan LC, Weirauch MT.

Nat Genet. 2018 Apr 16. doi: 10.1038/s41588-018-0102-3. Epub 2018 Apr

PMID: 29662164

License

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See LICENSE.txt for more details.

Feedback

Please report any issues with RELI (or feature suggestions) in our GitHub issue tracker.

For other questions, you may contact Dr. Chen (the primary author of RELI) or Dr. Weirauch via email.

Contributors

Name                Institution                      Remarks
Dr. Xiaoting Chen   Cincinnati Children's Hospital   primary author

Credits

Project avatar based on Wikimedia Commons Chromosome_18.svg