enasequence / cramtools

CRAM format specification and java API for read data.
http://www.ebi.ac.uk/ena/about/cram_toolkit
Apache License 2.0
58 stars 21 forks source link

CRAMTools is no longer supported. Please use Samtools: http://www.htslib.org/.

CRAMTools is a set of Java tools and APIs for efficient compression of sequence read data. Although this is intended as a stable version the code is released as early access. Parts of the CRAMTools are experimental and may not be supported in the future.

http://www.ebi.ac.uk/ena/about/cram_toolkit Version 3.0

Change log:

Input files:

  1. Reference sequence in fasta format <fasta file>
  2. Reference sequence index file <fasta file>.fai created using samtools (samtools faidx <fasta file>)
  3. Input BAM file <BAM file> sorted by reference coordinates
  4. BAM index file <BAM file>.bai created using samtools (samtools index <BAM file>)
  5. Download and run the program: download the prebuilt runnable jar file from https://github.com/enasequence/cramtools/blob/master/cramtools-3.0.jar?raw=true
  6. Execute the command line program: java -jar cramtools-3.0.jar. Usage is printed if no arguments were given

To convert a BAM file to CRAM:

java -jar cramtools-3.0.jar cram \
    --input-bam-file <bam file> \
    --reference-fasta-file <reference fasta file> \
    [--output-cram-file <output cram file>]

To convert a CRAM file to BAM:

java -jar cramtools-3.0.jar bam \
    --input-cram-file <input cram file> \
    --reference-fasta-file <reference fasta file> \
    --output-bam-file <output bam file>

To build the program from source:

  1. To check out the source code from github you will need git client: http://git-scm.com/
  2. Make sure you have java 1.7 or higher: http://openjdk.java.net/ or http://www.oracle.com/us/technologies/java/index.html
  3. Make sure you have ant version 1.7 or higher: http://ant.apache.org/
  4. Run the following commands
# Clone the repository to your local directory:
git clone https://github.com/enasequence/cramtools.git

# Change to the directory: 
cd cramtools

# Build a runnable jar file: 
ant -f build/build.xml runnable

# Run cramtools
java -jar cramtools-3.0.jar 

Picard integraion

Picard tools have been removed from cramtools because Picard supports CRAM via htsjdk.

Reference sequence discovery

cramtools supports the following reference discovery mechanism:

  1. check local file provided in the command line (-R or --reference-fasta-file option)
  2. download sequences from the ENA reference registry using MD5 checksums from the SAM header
  3. access local cache using REF_CACHE/REF_PATH environment variables. Please refer to http://www.htslib.org/doc/samtools.html for more details.

The following tools have been included into this release

The usage can be accessed by calling cramtools with the corresponding command as a single argument.

Lossy model

Bam2Cram allows to specify lossy model via a string which can be composed of one or more words separated by '-'.

Each word is read or base selector and quality score treatment, which can be binning (Illumina 8 bins) or full scale (40 values).

Here are some examples:

Selectors:

By default no quality scores will be preserved.

Illimuna 8-binning scheme:

0, 1, 6, 6, 6, 6, 6, 6, 6, 6, 15, 15, 15, 15, 15, 15, 15, 15, 15,
15, 22, 22, 22, 22, 22, 27, 27, 27, 27, 27, 33, 33, 33, 33, 33, 37,
37, 37, 37, 37, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40,
40, 40, 40, 40, 40, 40 

Check for more on our web site: http://www.ebi.ac.uk/ena/about/cram_toolkit