Hydra-Multi is a paired-end read structural variant (SV) discovery tool capable of integrating signals from hundreds of samples.
Below are the requirements and instructions for installation of Hydra-Multi.
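Hydra-Multi keeps many files open at once, so check the open-file limit before running it. The ulimit (ulimit -n) sets the maximum number of files a process may have open at the same time.
This limit must be larger than 4 × the number of possible chromosome-chromosome combinations in the reference genome.
For the human reference (hg19/b37), a ulimit of 16384 is recommended. To raise it for the current shell (going above the hard limit may require administrator privileges):
ulimit -n 16384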
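The requirement above is simple arithmetic, so it can be checked before a long run. The sketch below assumes that "chromosome-chromosome combinations" means unordered pairs that include each chromosome paired with itself, n(n+1)/2; that counting rule is an assumption, not something this README states. It reads sequence names from a samtools .fai index (the index path is up to you) and compares the requirement against the current soft limit using Python's standard resource module, which is available on Unix only.

```python
import resource  # POSIX-only standard-library module for process resource limits
from typing import List


def required_open_files(num_sequences: int) -> int:
    """Return 4 x the number of chromosome-chromosome combinations.

    Assumption: combinations are unordered pairs that include each
    chromosome paired with itself, i.e. n * (n + 1) / 2.
    """
    pairs = num_sequences * (num_sequences + 1) // 2
    return 4 * pairs


def sequences_from_fai(fai_path: str) -> List[str]:
    """Read sequence names from column 1 of a samtools .fai index."""
    with open(fai_path) as handle:
        return [line.split("\t", 1)[0] for line in handle if line.strip()]


def ulimit_is_sufficient(num_sequences: int) -> bool:
    """Compare the current soft open-file limit against the requirement."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return True
    # The README requires the limit to be strictly larger than the product.
    return soft > required_open_files(num_sequences)
```

For example, under this counting a reference with 25 sequences (1-22, X, Y, MT) needs a limit above 4 × 325 = 1300; contigs beyond the main chromosomes raise the number quickly.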
git clone https://github.com/arq5x/Hydra
cd Hydra
make
chmod +x scripts/*
sudo cp scripts/* /usr/local/bin
sudo cp bin/* /usr/local/bin
Finally, make the wrapper script executable and run its test:
chmod +x hydra-multi.sh
./hydra-multi.sh test
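Because later steps call some programs by bare name (for example hydra-router), it can help to confirm they are on PATH after the copy steps. This is a minimal sketch using the standard shutil.which; the list of names is taken from commands in this README and assumes that copying scripts/ and bin/ into /usr/local/bin put all of them on PATH. Adjust the list if your installation differs.

```python
import shutil
from typing import Iterable, List

# Names taken from the commands in this README. Assumption: the copies from
# scripts/ and bin/ into /usr/local/bin put all of them on PATH.
EXPECTED_EXECUTABLES = [
    "hydra-router",
    "make_hydra_config.py",
    "extract_discordants.py",
    "forceOneClusterPerPairMem.py",
    "frequency.py",
]


def missing_executables(names: Iterable[str]) -> List[str]:
    """Return the names that shutil.which cannot find as executables on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(EXPECTED_EXECUTABLES)
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All expected Hydra-Multi executables are on PATH.")
```

shutil.which only reports files that are executable, so a script that was copied without chmod +x also shows up as missing.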
A wrapper script (hydra-multi.sh) can run Hydra-Multi automatically, or each step may be performed manually. Both approaches begin by creating a stub file.
==========================
Start with a simple config file "stub" that lists each sample name and the full path to its BAM file, such as the one below:
$ cat config.stub.txt
sample1 /full/path/to/file/sample1.pos.bam
sample2 /full/path/to/file/sample2.pos.bam
sample3 /full/path/to/file/sample3.pos.bam
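With many samples, the stub can be generated instead of typed. The sketch below is hypothetical in two respects: it names each sample after its BAM file (dropping a trailing .pos.bam or .bam), and it separates the two columns with a tab, whereas the example above only shows whitespace. It always writes absolute paths because the stub uses full paths.

```python
import os
from typing import Iterable, Iterator


def stub_lines(bam_paths: Iterable[str]) -> Iterator[str]:
    """Yield one 'sample<TAB>/full/path.bam' line per BAM file.

    Assumptions (hypothetical naming scheme): the sample name is the file
    name minus a trailing '.pos.bam' or '.bam', and fields are tab-separated.
    """
    for path in bam_paths:
        full_path = os.path.abspath(path)  # the stub expects full paths
        name = os.path.basename(full_path)
        for suffix in (".pos.bam", ".bam"):
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                break
        yield f"{name}\t{full_path}"


def write_stub(bam_paths: Iterable[str], out_path: str = "config.stub.txt") -> None:
    """Write the stub lines to out_path, one sample per line."""
    with open(out_path, "w") as out:
        for line in stub_lines(bam_paths):
            out.write(line + "\n")
```

For example, write_stub(["/full/path/to/file/sample1.pos.bam"]) writes a single line mapping sample1 to that path.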
hydra-multi.sh can then be used to execute the subsequent steps:
./hydra-multi.sh run config.stub.txt
To list the options of the run command:
$ ./hydra-multi.sh run -h
usage: hydra-multi.sh run [options] <stub_file>
positional arguments:
stub file
the stub file to create the configuration file, example on https://github.com/arq5x/Hydra
options:
-t INT Number of threads to use. [Default: 2]
-p INT The punt parameter for assembly, the maximum read depth allowed. [Default: 10]
-o STR The stub for the output file names
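When the wrapper is driven from a pipeline, building the argument list explicitly keeps the options in the order the usage line shows (options first, stub file last). This sketch uses only the -t, -p and -o options listed above, with the documented defaults; the output stub name in the usage example is hypothetical.

```python
import subprocess
from typing import List, Optional


def hydra_multi_run_command(
    stub_file: str,
    threads: int = 2,
    punt: int = 10,
    output_stub: Optional[str] = None,
    script: str = "./hydra-multi.sh",
) -> List[str]:
    """Build the argument list for 'hydra-multi.sh run [options] <stub_file>'."""
    cmd = [script, "run", "-t", str(threads), "-p", str(punt)]
    if output_stub is not None:
        cmd += ["-o", output_stub]
    cmd.append(stub_file)  # the positional argument goes last
    return cmd


def run_hydra_multi(stub_file: str, **options) -> None:
    """Run the wrapper and raise CalledProcessError if it exits non-zero."""
    subprocess.run(hydra_multi_run_command(stub_file, **options), check=True)
```

For example, run_hydra_multi("config.stub.txt", threads=8, output_stub="cohort") runs ./hydra-multi.sh run -t 8 -p 10 -o cohort config.stub.txt. Passing the defaults explicitly makes the command line self-documenting in logs.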
==========================
Hydra-Multi needs a configuration file documenting each sample/library and the path to its BAM file, which is the input to the SV discovery process.
The make_hydra_config.py script inspects the alignments in each sample's BAM file and automatically creates a complete config file documenting the statistics of the fragment library:
python scripts/make_hydra_config.py -i config.stub.txt
sample1 /full/path/to/file/sample1.pos.bam 374.23 12 3
sample2 /full/path/to/file/sample2.pos.bam 398.19 20 3
sample3 /full/path/to/file/sample3.pos.bam 401.78 23 3
Just redirect the output to a new, complete config file and you should be ready to go:
python scripts/make_hydra_config.py -i config.stub.txt > config.hydra.txt
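Before extracting discordant alignments, a quick sanity check of config.hydra.txt can catch lines that are missing their statistics or point at BAM files that do not exist. This sketch assumes the five whitespace-separated columns shown above; it keeps the three numeric columns as unnamed values because this README does not say what each one means.

```python
import os
from typing import Iterable, List, NamedTuple, Tuple


class ConfigEntry(NamedTuple):
    sample: str
    bam_path: str
    stats: Tuple[float, ...]  # the three numeric columns; meanings not assumed here


def parse_hydra_config(lines: Iterable[str]) -> List[ConfigEntry]:
    """Parse whitespace-separated 'sample bam stat stat stat' lines."""
    entries = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 5:
            raise ValueError(f"line {number}: expected 5 fields, found {len(fields)}")
        sample, bam_path, *stats = fields
        entries.append(ConfigEntry(sample, bam_path, tuple(float(s) for s in stats)))
    return entries


def missing_bams(entries: Iterable[ConfigEntry]) -> List[str]:
    """List BAM paths in the config that do not exist on disk."""
    return [e.bam_path for e in entries if not os.path.exists(e.bam_path)]
```

Typical use is parse_hydra_config(open("config.hydra.txt")) followed by missing_bams on the result; a ValueError points at the line that make_hydra_config.py did not complete.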
=================================
Once you have created a configuration file for Hydra-Multi, you need to run the extract_discordants.py script to, you guessed it, extract the discordant alignments from your BAM files into BEDPE format for Hydra-Multi.
For each input BAM file listed in your configuration file, extract_discordants.py will create a BEDPE file of the discordant alignments in the same directory. For example, it will create a sample1.pos.bam.bedpe file for the sample1.pos.bam input file listed in the config file. Run it once per sample, passing the sample name with -d:
python scripts/extract_discordants.py -c config.hydra.txt -d <sample_name>
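Since -d selects one sample, a cohort needs one invocation per config line. This sketch builds those commands from the sample names in the first config column and predicts each output path by appending .bedpe to the BAM path, as described above. Whether several extractions can safely run in parallel is not stated here, so the usage note runs them one after another.

```python
from typing import List


def sample_names(config_lines: List[str]) -> List[str]:
    """The first column of each non-blank config line is the sample name."""
    return [line.split()[0] for line in config_lines if line.strip()]


def extract_commands(config_path: str, samples: List[str]) -> List[List[str]]:
    """One extract_discordants.py invocation per sample, selected with -d."""
    return [
        ["python", "scripts/extract_discordants.py", "-c", config_path, "-d", sample]
        for sample in samples
    ]


def expected_bedpe(bam_path: str) -> str:
    """The BEDPE output is written next to the BAM with '.bedpe' appended."""
    return bam_path + ".bedpe"
```

Each command can then be passed to subprocess.run(cmd, check=True) in turn, and expected_bedpe gives the file to look for once it finishes.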
=================================
This step routes all alignments with the same chromosome/orientation combination to the same file for assembly:
$ hydra-router -config config.hydra.txt -routedList routed-files.txt
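The routing idea can be illustrated in a few lines: each discordant pair gets a key made of its two chromosomes and two strands, ordered so that a chr1-chr2 pair and a chr2-chr1 pair land in the same group. This is a sketch of the concept only, assuming standard BEDPE column order (strands in columns 9 and 10); it is not how hydra-router itself is implemented, and it groups records in memory rather than writing files. With up to four strand combinations per chromosome pair, it also suggests why the open-file requirement scales with 4 × the number of chromosome combinations.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

RouteKey = Tuple[str, str, str, str]


def route_key(chrom1: str, strand1: str, chrom2: str, strand2: str) -> RouteKey:
    """Order the two ends so that an A-B pair and a B-A pair share one key."""
    end1, end2 = (chrom1, strand1), (chrom2, strand2)
    first, second = (end1, end2) if end1 <= end2 else (end2, end1)
    return first + second


def route(bedpe_lines: Iterable[str]) -> Dict[RouteKey, List[str]]:
    """Group BEDPE records by chromosome pair and strand orientation.

    Assumes the standard BEDPE column order: chrom1, start1, end1, chrom2,
    start2, end2, name, score, strand1, strand2.
    """
    groups: Dict[RouteKey, List[str]] = defaultdict(list)
    for line in bedpe_lines:
        fields = line.rstrip("\n").split("\t")
        key = route_key(fields[0], fields[8], fields[3], fields[9])
        groups[key].append(line)
    return dict(groups)
```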
==================================
Assemble each chromosome/orientation set:
$ sh scripts/assemble-routed-files.sh routed-files.txt config.hydra.txt 1
===============================================================
Combine all of the chromosome/orientation sets back into one file:
$ sh scripts/combine-assembled-files.sh /full/path/to/assembled/files/ all.assembled
===============================================================
$ scripts/forceOneClusterPerPairMem.py -i all.assembled -o all.sv-calls
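The script name suggests that this step lets each read pair support at most one SV cluster. The sketch below illustrates that idea with a simple greedy rule (larger clusters claim their pairs first); the rule, the cluster IDs and the input shape are all assumptions for illustration, not the algorithm forceOneClusterPerPairMem.py actually uses.

```python
from typing import Dict, Set


def one_cluster_per_pair(clusters: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Greedy illustration: larger clusters claim their read pairs first.

    clusters maps a (hypothetical) cluster ID to the IDs of the read pairs
    supporting it. Ties are broken by cluster ID so the result is deterministic.
    Clusters left with no unclaimed pairs are dropped.
    """
    claimed: Set[str] = set()
    result: Dict[str, Set[str]] = {}
    for cluster_id in sorted(clusters, key=lambda c: (-len(clusters[c]), c)):
        kept = clusters[cluster_id] - claimed
        if kept:
            result[cluster_id] = kept
            claimed |= kept
    return result
```

A greedy pass is easy to reason about but order-dependent; a real implementation might re-rank clusters after each assignment instead.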
=======================================================================
$ scripts/frequency.py -f all.sv-calls.final -d all.sv-calls.detail > all.sv-calls.freq
===============================================================
$ scripts/hydraToBreakpoint -i all.sv-calls.freq > all.sv-calls.bkpts