To access Juicer 1.6 (last stable release), please see the GitHub Release. If you clone the Juicer repository directly from GitHub, you will get Juicer 2, which is under active development. If you encounter any bugs, please let us know.
ENCODE's Hi-C uniform processing pipeline based on Juicer can be found here.
Juicer is a platform for analyzing kilobase-resolution Hi-C data. This distribution includes the pipeline for generating Hi-C maps from raw fastq files and command line tools for feature annotation on the Hi-C maps.
The beta release for Juicer version 1.6 can be accessed via the GitHub Release. The main repository on GitHub is now focused on the Juicer 2.0 release and is under active development. For general questions, please use the Google Group.
If you are interested in running Juicer in the cloud, you may want to check out the dockerized version of Juicer hosted by ENCODE.
If you have any difficulties using Juicer, please do not hesitate to contact us at aidenlab@bcm.edu.
If you use Juicer in your research, please cite: Neva C. Durand, Muhammad S. Shamim, Ido Machol, Suhas S. P. Rao, Miriam H. Huntley, Eric S. Lander, and Erez Lieberman Aiden. "Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments." Cell Systems 3(1), 2016.
Please see the wiki for extensive documentation.
For FAQs, or for asking new questions, please see our forum: aidenlab.org/forum.html.
In this repository, we include the scripts for running Juicer on AWS, LSF, Univa Grid Engine, SLURM, and a single CPU.
The SLURM and CPU scripts are the most up to date. For cloud computing, we recommend the ENCODE uniform processing pipeline based on Juicer.
/SLURM - scripts for running pipeline and postprocessing on SLURM
/CPU - scripts for running pipeline and postprocessing on a single CPU
/AWS - scripts for running pipeline and postprocessing on AWS (deprecated)
/UGER - scripts for running pipeline and postprocessing on UGER (Univa) (deprecated)
/LSF - scripts for running pipeline and postprocessing on LSF (deprecated)
/misc - miscellaneous helpful scripts
Juicer is a pipeline optimized for parallel computation on a cluster. Juicer consists of two parts: the pipeline that creates Hi-C files from raw data, and the post-processing command line tools.
Juicer requires the use of a cluster or the cloud, ideally with >= 4 cores (minimum 1 core) and >= 64 GB RAM (minimum 16 GB RAM).
Juicer currently works with SLURM, LSF, Univa Grid Engine (UGER), and a single CPU. For running in the cloud, we recommend ENCODE's Hi-C processing pipeline, which is based on Juicer; the AWS scripts are out of date.
The minimum software requirement to run Juicer is a working Java installation (version >= 1.8) on Windows, Linux, or Mac OS X. We recommend using the latest Java version available, but please do not use the Java beta version. Minimum system requirements for running Java can be found at https://java.com/en/download/help/sysreq.xml.
To download and install the latest Java Runtime Environment (JRE), please go to https://www.java.com/download
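Once a JRE is installed, a quick shell check can confirm that a suitable runtime is on your PATH (the exact version string varies by vendor):

```shell
# Verify a Java runtime is installed; Juicer needs version >= 1.8.
if command -v java >/dev/null 2>&1; then
    # 'java -version' prints to stderr on most JVMs, so redirect it to stdout.
    java -version 2>&1 | head -n 1
else
    echo "Java not found on PATH; install a JRE (>= 1.8) from https://www.java.com/download"
fi
```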
The latest version of GNU coreutils can be downloaded from https://www.gnu.org/software/coreutils/manual/
The latest version of BWA should be installed from http://bio-bwa.sourceforge.net/
You must have an NVIDIA GPU to install CUDA.
Instructions for installing the latest version of CUDA can be found on the NVIDIA Developer site.
The native libraries included with Juicer are compiled for CUDA 7 or CUDA 7.5. See the download page for Juicer Tools.
Other versions of CUDA can be used, but you will need to download the respective native libraries from JCuda.
For best performance, use a dedicated GPU. You may also be able to obtain access to GPU clusters through Amazon Web Services, Google cloud, or a local research institution.
If you cannot access a GPU, you can run the CPU version of HiCCUPS directly using the .hic file and Juicer Tools.
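A sketch of what a CPU HiCCUPS call might look like is below. The .hic path, resolutions, and output directory are placeholders, and the command is only echoed, not executed; check the hiccups help text for the flags available in your version.

```shell
# Build the CPU HiCCUPS command line (sketch only; echoed, not executed).
HIC=aligned/inter_30.hic      # .hic file produced by the pipeline (assumed path)
OUT=aligned/hiccups_results   # output directory for the loop annotations
CMD="juicer_tools hiccups --cpu -r 5000,10000 -k KR $HIC $OUT"
echo "$CMD"
```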
See the Juicebox documentation at https://github.com/theaidenlab/Juicebox for details on building new jars of juicer_tools.
Run the Juicer pipeline on your cluster of choice with "juicer.sh [options]"
Usage: juicer.sh [-g genomeID] [-d topDir] [-q queue] [-l long queue] [-s site]
[-a about] [-R end] [-S stage] [-p chrom.sizes path]
[-y restriction site file] [-z reference genome file]
[-C chunk size] [-D Juicer scripts directory]
[-Q queue time limit] [-L long queue time limit] [-e] [-h] [-x]
* [genomeID] must be defined in the script, e.g. "hg19" or "mm10" (default
"hg19"); alternatively, it can be defined using the -z command
* [topDir] is the top level directory (default
"/Users/nchernia/Downloads/neva-muck/UGER")
[topDir]/fastq must contain the fastq files
[topDir]/splits will be created to contain the temporary split files
[topDir]/aligned will be created for the final alignment
* [queue] is the queue for running alignments (default "short")
* [long queue] is the queue for running longer jobs such as the hic file
creation (default "long")
* [site] must be defined in the script, e.g. "HindIII" or "MboI"
(default "none")
* [about]: enter description of experiment, enclosed in single quotes
* [stage]: must be one of "chimeric", "merge", "dedup", "final", "postproc", or "early".
- Use "chimeric" when alignments are done but chimeric handling has not finished.
- Use "merge" when alignment has finished but the merged_sort file has not
  yet been created.
- Use "dedup" when the files have been merged into merged_sort but
  merged_nodups has not yet been created.
- Use "final" when the reads have been deduped into merged_nodups but the
  final stats and hic files have not yet been created.
- Use "postproc" when the hic files have been created and only
  postprocessing feature annotation remains to be completed.
- Use "early" for an early exit, before the final creation of the stats and
  hic files.
* [chrom.sizes path]: enter path for chrom.sizes file
* [restriction site file]: enter path for restriction site file (locations of
restriction sites in genome; can be generated with the script
(misc/generate_site_positions.py) )
* [reference genome file]: enter path for reference sequence file, BWA index
files must be in same directory
* [chunk size]: number of lines in split files; must be a multiple of 4
     (default 90000000, which equals 22.5 million reads)
* [Juicer scripts directory]: set the Juicer directory,
which should have scripts/ references/ and restriction_sites/ underneath it
(default /broad/aidenlab)
* [queue time limit]: time limit for queue, e.g. -W 12:00 is 12 hours
     (default 1200)
* [long queue time limit]: time limit for long queue, e.g. -W 168:00 is one week
     (default 3600)
* -f: include fragment-delimited maps from hic file creation
* -e: early exit
* -h: print this help and exit
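Putting the options together, a typical invocation might look like the following sketch. All paths, the genome ID, and the enzyme are placeholders, and the command is only echoed here rather than run:

```shell
# Assemble a typical juicer.sh command line (sketch only; echoed, not executed).
TOP=/path/to/experiment                  # must contain a fastq/ subdirectory
SITES=restriction_sites/hg19_MboI.txt    # e.g. from misc/generate_site_positions.py
# -C 90000000: 90 million lines per split file = 22.5 million reads (4 fastq lines/read)
CMD="juicer.sh -g hg19 -s MboI -d $TOP -y $SITES -C 90000000"
echo "$CMD"
```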
If the experiment does not use a restriction enzyme (e.g. DNase Hi-C), pass "none" as the site: juicer.sh -s none
To relaunch the pipeline at a later stage, run: juicer.sh [options] -S stage
where "stage" is one of chimeric, merge, dedup, final, postproc, or early. "chimeric" is for when alignments are done but chimeric handling hasn't finished; "merge" is for when alignment has finished but merged_sort hasn't been created; "dedup" is for when merged_sort is there but not merged_nodups (this will relaunch all dedup jobs); "final" is for when merged_nodups is there and you want the stats and hic files; "postproc" is for when you have the hic files and just want feature annotations; and "early" is for early exit, before hic file creation. If your jobs failed at the alignment stage, run relaunch_prep.sh and then run juicer.sh.

Detailed documentation about the command line tools can be found on the wiki.
To launch the command line tools, use the shell script "juicer_tools" on Unix/MacOS, or type:
java -jar juicer_tools.jar (command...) [flags...] <parameters...>
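For example, the dump command extracts a contact matrix from a .hic file. The sketch below only echoes the command; input.hic and the output file name are placeholders, and available flags may differ between versions:

```shell
# Extract the KR-normalized chr1 contact matrix at 10 kb resolution (sketch only).
CMD="java -jar juicer_tools.jar dump observed KR input.hic chr1 chr1 BP 10000 chr1_10kb.txt"
echo "$CMD"
```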
In the command line tools, there are several analysis functions:
* apa: for conducting aggregate peak analysis
* hiccups: for annotating loops
* motifs: for finding CTCF motifs
* arrowhead: for annotating contact domains
* eigenvector: for calculating the eigenvector (first PC) of the Pearson's
* pearsons: for calculating the Pearson's

The juicer_tools (Unix/MacOS) script can be used in place of the unwieldy
java -Djava.library.path=path/to/natives/ -jar juicer_tools.jar
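The wrapper amounts to a few lines of shell. A minimal sketch is below; the natives/ directory layout and jar location next to the script are assumptions, and the real script may also set JVM memory flags:

```shell
# Write a minimal juicer_tools wrapper (sketch; directory layout is an assumption).
cat > juicer_tools <<'EOF'
#!/bin/bash
# Resolve the directory this script lives in so the jar and natives are found
# regardless of the caller's working directory.
DIR=$(cd "$(dirname "$0")" && pwd)
exec java -Djava.library.path="$DIR/natives" -jar "$DIR/juicer_tools.jar" "$@"
EOF
chmod +x juicer_tools
```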