COMBINE-lab / quark

semi-reference-based short read compression
GNU General Public License v3.0

Quark


Assumption

The read files must be gzipped, e.g. 1.fastq.gz and 2.fastq.gz.
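If some of your reads are still uncompressed, a small helper like the one below can check and gzip them before running Quark. This is a minimal sketch, not part of Quark: the function name `ensure_gzipped` is hypothetical, and it relies only on standard `gzip` (`-t` tests a compressed file's integrity, `-c` writes to stdout).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Quark): make sure a FASTQ file is gzipped,
# since Quark expects *.fastq.gz input. Prints the path of the gzipped file.
ensure_gzipped() {
  local f="$1"
  if [[ "$f" == *.gz ]]; then
    # Already named *.gz: verify that it really is a valid gzip stream.
    if ! gzip -t -- "$f" 2>/dev/null; then
      echo "not a valid gzip file: $f" >&2
      return 1
    fi
    printf '%s\n' "$f"
  else
    # Compress to <file>.gz next to the original, leaving the original untouched.
    gzip -c -- "$f" > "$f.gz" || return 1
    printf '%s\n' "$f.gz"
  fi
}
```

For example, `m1=$(ensure_gzipped <mate1>)` yields a path that can be passed as `m1="$m1"` to the encode step.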

The software has been tested on paired-end and single-end data in a bash-compatible shell (redirection may not work in other shells such as fish). Single-end support will be added to the "quark.sh" script soon.

Dependency

Quark depends on plzip for downstream compression. More information about plzip, including an installation guide, can be found here.
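Before building, it can help to confirm that plzip, and the other tools this README invokes (cmake, make, snakemake), are on your PATH. A minimal sketch using the shell builtin `command -v`; `missing_tools` is a hypothetical name, not part of Quark.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Quark): print each named tool that is not on PATH.
# Returns 0 if every tool was found, 1 if at least one is missing.
missing_tools() {
  local missing=0 tool
  for tool in "$@"; do
    # command -v succeeds only if the name resolves to a command.
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `missing_tools plzip cmake make snakemake || echo "install the tools listed above first"`.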

Compile

$ git clone https://github.com/COMBINE-lab/quark.git
$ cd quark
$ mkdir build
$ cd build
$ cmake ..
$ make
$ cd ..

Running Quark

To see the available options:

$ ./quark.sh -h

To build the index with k-mer size k:

snakemake -s quark.snake make_index --config out="<output dir>" fasta="<fasta file>" kmer=<#k>
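The make_index command can be wrapped so that obviously bad arguments fail before Snakemake starts. This is a minimal sketch, not part of Quark: `quark_make_index` is a hypothetical name, and it only checks that the FASTA file is readable and that k is a positive integer; it does not know any further constraints Quark may place on k.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Quark) around the make_index rule.
# Usage: quark_make_index <output dir> <fasta file> <k>
quark_make_index() {
  if [ "$#" -ne 3 ]; then
    echo "usage: quark_make_index <output dir> <fasta file> <k>" >&2
    return 2
  fi
  local out="$1" fasta="$2" k="$3"
  # Fail early on an unreadable reference instead of inside the Snakemake run.
  if [ ! -r "$fasta" ]; then
    echo "cannot read FASTA file: $fasta" >&2
    return 1
  fi
  # Only checks that k is a positive integer; other limits on k are not checked here.
  case "$k" in
    '' | *[!0-9]*)
      echo "k must be a positive integer: $k" >&2
      return 1
      ;;
  esac
  if [ "$k" -le 0 ]; then
    echo "k must be a positive integer: $k" >&2
    return 1
  fi
  # Same invocation as in the README, with the placeholders filled in.
  snakemake -s quark.snake make_index --config out="$out" fasta="$fasta" kmer="$k"
}
```

Usage: `quark_make_index "<output dir>" "<fasta file>" <#k>`. Adding Snakemake's `-n` (dry-run) flag to the underlying command lists the jobs that would run without executing them.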

To Encode

Single End

snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" r="<mate>" p=<#threads> lib="single" quality=0

Paired End

snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" m1="<mate1>" m2="<mate2>" p=<#threads> lib="paired" quality=0
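The two encode commands differ only in how the reads are passed (`r=` with `lib="single"` versus `m1=`/`m2=` with `lib="paired"`), so a single wrapper can pick the right form from the number of read files given. A minimal sketch, not part of Quark: `quark_encode` is a hypothetical name, and `quality=0` is kept exactly as in the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Quark) around the encode rule.
# One read file  -> lib="single" (r=...)
# Two read files -> lib="paired" (m1=..., m2=...)
# Usage: quark_encode <index dir> <output dir> <threads> <mate1> [<mate2>]
quark_encode() {
  if [ "$#" -lt 4 ] || [ "$#" -gt 5 ]; then
    echo "usage: quark_encode <index dir> <output dir> <threads> <mate1> [<mate2>]" >&2
    return 2
  fi
  local index="$1" out="$2" threads="$3"
  shift 3
  if [ "$#" -eq 1 ]; then
    snakemake -s quark.snake encode --config out="$out" index="$index" \
      r="$1" p="$threads" lib="single" quality=0
  else
    snakemake -s quark.snake encode --config out="$out" index="$index" \
      m1="$1" m2="$2" p="$threads" lib="paired" quality=0
  fi
}
```

Usage: `quark_encode "<index dir>" "<output dir>" <#threads> "<mate1>" "<mate2>"`, or pass a single read file for single-end data.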

To Decode

snakemake -s quark.snake decode --config in="<in dir>" out="<out dir>" lib="<paired|single>" quality=0
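A similar wrapper for decoding can reject a misspelled `lib` value or a missing encoded directory up front. This is a minimal sketch, not part of Quark: `quark_decode` is a hypothetical name, and it assumes that `lib` should name the same library type that was used when encoding.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of Quark) around the decode rule.
# Usage: quark_decode <encoded dir> <output dir> <single|paired>
quark_decode() {
  if [ "$#" -ne 3 ]; then
    echo "usage: quark_decode <encoded dir> <output dir> <single|paired>" >&2
    return 2
  fi
  local in="$1" out="$2" lib="$3"
  # Assumed to match the library type given at encode time.
  case "$lib" in
    single | paired) ;;
    *)
      echo "lib must be 'single' or 'paired': $lib" >&2
      return 1
      ;;
  esac
  if [ ! -d "$in" ]; then
    echo "encoded directory not found: $in" >&2
    return 1
  fi
  snakemake -s quark.snake decode --config in="$in" out="$out" lib="$lib" quality=0
}
```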

To check that the decoded sequences are identical to the original ones (the compression is lossless):

$ ./check_pair.sh <original left end> <original right end> <quark left end> <quark right end>
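`check_pair.sh` takes the four files of a paired-end run. For a single-end run, a comparison along the following lines may serve as a rough substitute. It is a sketch under stated assumptions: both files are FASTQ with exactly four lines per record (gzipped or plain), and read names and qualities are ignored. Sequences are compared as sorted multisets, so the check does not depend on whether read order is preserved. `same_sequences` is a hypothetical name, not a Quark script.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Quark): check that two FASTQ files contain the
# same multiset of read sequences, ignoring read names, qualities and read order.
# Assumes 4-line FASTQ records; `gzip -cdf` decompresses .gz input and passes
# plain files through unchanged.

# Print the sequence line (2nd of every 4) of each record, sorted bytewise.
fastq_seqs_sorted() {
  gzip -cdf -- "$1" | awk 'NR % 4 == 2' | LC_ALL=C sort
}

same_sequences() {
  # Process substitution (bash) feeds both sorted streams to cmp.
  if cmp -s <(fastq_seqs_sorted "$1") <(fastq_seqs_sorted "$2"); then
    echo "identical sequences"
  else
    echo "sequences differ" >&2
    return 1
  fi
}
```

Usage: `same_sequences <original reads> <quark reads>`.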

Link to the preprint

Quark enables semi-reference-based compression of RNA-seq data, by Hirak Sarkar and Rob Patro