brentp / methylcode

Alignment and Tabulation of BiSulfite Treated Reads
Other
16 stars 7 forks source link

MethylCoder

.. note::

MethylCoder has been superseded by bwa-meth: https://github.com/brentp/bwa-meth/
That repo also contains an improved runner script for gsnap in compare/src/gsnap-meth.py

MethylCoder is a single program that takes of bisulfite-treated reads and outputs per-base methylation data. It also includes scripts for analysis and visualization. In addition to a binary output and a SAM alignment file, the direct output of methylcoder is a text file that looks like ::

#seqid  mt  bp  c   t
1   3   1354    0   1
1   3   1358    0   1
1   3   1393    0   1
1   3   1394    0   1
1   3   1402    0   1
1   3   1409    0   1

where columns are reference seqid methylation context (type) basepair location(bp) number of reads where a (c)ytosine was unconverted, number of reads where where a cytosine was converted to (t)hymine. Making methylation at every methylable basepair easily calculated as c / (c + t).

.. contents ::

About

This software is developed in the Fischer Lab_ . At UC Berkeley. Please report any requests, bugs, patches, problems, docs to bpederse@gmail.com

It is distributed under the New BSD License <http://github.com/brentp/methylcode/blob/master/LICENSE>_

Cite

::

MethylCoder: Software Pipeline for Bisulfite-Treated Sequences 
Brent Pedersen; Tzung-Fu Hsieh; Christian Ibarra; Robert L. Fischer
Bioinformatics 2011; doi: 10.1093/bioinformatics/btr394

Requirements

Python

Python 2.6 must be installed along with the following modules. all of these are available from pypi and as such are installable via ::

$ easy_install [module]

matplotlib is required to plot the per-chromosome methylation levels.

C

Installation

once the above python and c libraries are installed, download methylcoder from:

http://github.com/brentp/methylcode/tarball/master (tar ball)

and untar; or clone the repository via git::

git clone git://github.com/brentp/methylcode.git

Then, from the methylcode directory, it is still necessary to run ::

$ sudo python setup.py install

to install the package into your path. After that, the executable 'methylcoder' will be available on your path. Running with no arguments will print help.

Input

The input to the pipeline is:

Output

Pipeline

You must have:

1) input reference fasta file to which to align the reads. here: `thaliana_v9.fasta`
2) a reads file in fastq or fasta format. here: `reads.fastq`.
   if you have paired end reads, they must be specified in order 1, 2.
3) a directory containing the bowtie and bowtie-build executables.
   (or the path to the gmap/gsnap install directory the gsnap utilities

An example command to run the pipeline is::

$ methylcoder --bowtie /usr/local/src/bowtie/ \
              --extra-args "-m 1"
              --reference /path/to/thaliana_v9.fasta \
              /path/to/reads.fastq

or using the gsnap aligner on paired-end reads.::

$ methylcoder --gsnap /usr/local/bin/ \
              --reference /path/to/thaliana_v9.fasta \
              /path/to/reads_1.fastq /path/to/reads_2.fastq

Where you must adjust /path/to/reads.fastq to point to your BS-treated reads. This will create the files specified in Output_ above, sending the text to path/to/reads_methylcoder/methy-data-DATE.txt where DATE is the current date. The binary files will be sent to, that same directory as: thaliana_v9.fasta.[CHR].methyl.bin where [CHR] is substituted by each chromosome in the fasta file. Once bowtie is run once, its output is not deleted, and methylcoder.py will only re-run bowtie if its input has been modified since it was run last. NOTE if the methylcoder executable is called without any options, it will print help and available command-line arguments.

Additional args can be sent directly to the aligner as a string to methylcoder.py's --extra-args parameter. This would look like. ::

--extra-args "--solexa-quals -k 1 -m 1 --strata"

and that string will be passed directly to the bowtie invocation when it is called from methylcoder. Whenever 2 fastq files are sent, they are assumed to be paired-end reads.

For stranded reads, send "--mode=cmet-stranded" to gsnap via --extra-args.

Limitations

Colorspace

Don't do BS-Seq in colorspace.

Analysis/Visualization

See: http://github.com/brentp/methylcode/wikis/using-samtools-to-view-alignments

Reading

Notes

warning Methylcoder assumes that the Bisulfite converted reads are created using the Zilberman/Ecker method in which BS conversion occurs after conversion to solexa library--giving only 2 possibibilities. This is in contrast to the Jacobsen method which gives 4 possiblities. If you have a library generated using the Jacobsen method, you can use scripts/tagged_reads_prep.py to convert the reads to a format that MethylCoder can map.

.. cython: http://cython.org .. numpy: http://numpy.scipy.org .. pyfasta: http://pypi.python.org/pypi/pyfasta/ .. py-tcdb: http://pypi.python.org/pypi/py-tcdb .. h5py: http://pypi.python.org/pypi/h5py/ .. bowtie: http://bowtie-bio.sourceforge.net/index.shtml .. tokyocabinet: http://fallabs.com/tokyocabinet/ .. sam-tools: http://samtools.sourceforge.net/ .. Fischer Lab: http://epmb.berkeley.edu/facPage/dispFP.php?I=8 .. gsnap: http://research-pub.gene.com/gmap/ .. _bsolana: http://code.google.com/p/bsolana/