dariober / SICERpy

Python wrapper around the popular ChIP-Seq peak caller SICER
15 stars 3 forks source link

NB This project is most likely superseded by epic; use that instead

SICERpy: A friendly version of SICER peak caller

SICER is a popular peak caller particularly suitable for detection of broad ChIP-Seq marks. However, I found the original master script, SICER.sh, clunky to use.

SICER.py is a re-write of SICER.sh aimed at improving the following aspects:

Requirements and Installation

SICER.py requires python 2.6+ with scipy package as per the original version. In addition it requires the pysam package.

To install, first download the SICERpy directory and change permission of SICER.py to be executable: chmod 755 SICER.py. Then use sicer in one of the following ways:

ln -s /path/to/SICERpy/SICER.py ~/bin/

Usage examples

These input files can be found in the ex/ directory.

SICER.py -t ex/test.bam -c ex/control.bam -rt 0 > peaks.bed 2> sicer.log

Here, the table of candidate islands is sent to peaks.bed. The log is captured by stderr. Note that by setting the redundancy threshold to 0 we use directly the input files without further filtering. This is useful if the bam file have been already de-duplicated with external tools like picard/MarkDuplicates.

Apply some filtering to discard reads with low mapping quality and with certain bits set (see explain samflag):

SICER.py -t ex/test.bam -c ex/control.bam -F 3972 -q 5 > peaks.bed

Important if working with paired-end reads discard the second-in-pair to avoid double counting!

The output is in bed format with columns:

The output file peaks.bed can be easily parsed to get the enriched regions. For example, the get regions with FDR < 0.01:

awk '$8 < 0.01' peaks.bed > peaks.01.bed

Help on options

The one shown here might be out of date

SICER.py -h
iusage: SICER.py [-h] [--treatment TREATMENT] --control CONTROL
                [--effGenomeSize EFFGENOMESIZE] [--requiredFlag REQUIREDFLAG]
                [--filterFlag FILTERFLAG] [--mapq MAPQ]
                [--redThresh REDTHRESH] [--windowSize WINDOWSIZE]
                [--gapSize GAPSIZE] [--fragSize FRAGSIZE] [--keeptmp]
                [--version]

DESCRIPTION

Run the SICER pipeline using a control and an input file.

SEE ALSO:

https://github.com/dariober/SICERpy

optional arguments:
  -h, --help            show this help message and exit
  --treatment TREATMENT, -t TREATMENT
                        Treatment (pull-down) file in bam format

  --control CONTROL, -c CONTROL
                        Control (input) file in bam format

  --effGenomeSize EFFGENOMESIZE, -gs EFFGENOMESIZE
                        Effective Genome as fraction of the genome size. It
                        depends on read length. Default 0.74.

  --requiredFlag REQUIREDFLAG, -f REQUIREDFLAG
                        Keep reads with these bits set in flag. Same as
                        `samtools view -f`. Default 0

  --filterFlag FILTERFLAG, -F FILTERFLAG
                        Discard reads with these bits set in flag. Same as
                        `samtools view -F`. Default 4.  You probably want to
                        discard also second-in-pair reads, secondary and
                        supplementary alignments, reads failing QC.

  --mapq MAPQ, -q MAPQ  Discard reads with mapping quality lower than this. Default 5.

  --redThresh REDTHRESH, -rt REDTHRESH
                        Redundancy threshold to keep reads mapping to the same
                        position on the same strand. Default 0 (do not filter
                        for redundancy). 

  --windowSize WINDOWSIZE, -w WINDOWSIZE
                        Size of the windows to scan the genome. WINDOW_SIZE is
                        the smallest possible island. Default 200.

  --gapSize GAPSIZE, -g GAPSIZE
                        Multiple of window size used to determine the gap size.
                        Must be an integer. Default: 3.

  --fragSize FRAGSIZE, -fs FRAGSIZE
                        Size of the sequenced fragment. The center of the the
                        fragment will be taken as half the fragment size.
                        Default 150.

  --keeptmp             For debugging: Do not delete temp directory at the end of run.

  --version             show program's version number and exit

See also

Original papers describing SICER:

TODO