semi-reference-based short read compression
Read files must be gzipped, e.g. 1.fastq.gz and 2.fastq.gz.
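Because the input is expected to be gzipped, it can help to verify the read files before encoding. The following is a minimal sketch and not part of Quark: the function name `check_gz_reads` is hypothetical, and it relies only on `gzip -t`, which tests the integrity of a compressed file.

```shell
# Hypothetical helper (not part of Quark): succeed only if every
# argument is an existing file that passes gzip's integrity test.
check_gz_reads() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing file: $f" >&2
            return 1
        fi
        # gzip -t reads the whole file and fails on non-gzip or corrupt data
        if ! gzip -t "$f" 2>/dev/null; then
            echo "not a valid gzip file: $f" >&2
            return 1
        fi
    done
    return 0
}
```

For example, `check_gz_reads 1.fastq.gz 2.fastq.gz` returns a non-zero status if either file is missing or not gzipped.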
The software has been tested on paired-end and single-end data in a bash-compatible shell (redirection might not work in other shells such as fish). Single-end support will be added to the "quark.sh" script soon.
Quark depends on plzip for downstream compression. More information about plzip and an installation guide can be found here.
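Since a missing plzip would otherwise only show up during compression, a quick check up front can save a failed run. This is a minimal sketch under the assumption that the tools are expected on `PATH`; `require_tools` is a hypothetical name and not something Quark provides.

```shell
# Hypothetical helper (not part of Quark): report every named tool that
# is not available on PATH; returns non-zero if any is missing.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "required tool not found on PATH: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `require_tools plzip cmake make snakemake` could be run before building and encoding.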
$git clone https://github.com/COMBINE-lab/quark.git
$cd quark
$mkdir build
$cd build
$cmake ..
$make
$cd ..
To see the available options:
$./quark.sh -h
To build the index:
snakemake -s quark.snake make_index --config out="<output dir>" fasta="<fasta file>" kmer=<#k>
To encode single-end reads:
snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" r="<mate>" p=<#threads> lib="single" quality=0
To encode paired-end reads:
snakemake -s quark.snake encode --config out="<output dir>" index="<index dir>" m1="<mate1>" m2="<mate2>" p=<#threads> lib="paired" quality=0
To decode (set lib to either "paired" or "single"):
snakemake -s quark.snake decode --config in="<in dir>" out="<out dir>" lib="paired/single" quality=0
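The single-end and paired-end encode commands differ only in how the read files are passed (`r` versus `m1`/`m2`) and in the `lib` value. The sketch below shows one way to assemble the right command from the number of read files given. `quark_encode_cmd` is a hypothetical helper, not part of Quark; it uses only the config keys shown above and prints the command instead of running it.

```shell
# Hypothetical helper (not part of Quark): print the encode command for
# single-end (one read file) or paired-end (two read files) input.
# Arguments: <output dir> <index dir> <#threads> <reads> [<mate2>]
quark_encode_cmd() {
    if [ "$#" -lt 4 ]; then
        echo "usage: quark_encode_cmd <out> <index> <threads> <reads> [<mate2>]" >&2
        return 1
    fi
    out=$1; index=$2; threads=$3
    shift 3
    case $# in
        1) printf 'snakemake -s quark.snake encode --config out="%s" index="%s" r="%s" p=%s lib="single" quality=0\n' \
               "$out" "$index" "$1" "$threads" ;;
        2) printf 'snakemake -s quark.snake encode --config out="%s" index="%s" m1="%s" m2="%s" p=%s lib="paired" quality=0\n' \
               "$out" "$index" "$1" "$2" "$threads" ;;
        *) echo "expected one or two read files" >&2; return 1 ;;
    esac
}
```

Printing the command makes it easy to inspect before running it, for example with `quark_encode_cmd out idx 4 1.fastq.gz 2.fastq.gz`.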
To check decoded paired-end reads against the originals:
$./check_pair.sh <original left end> <original right end> <quark left end> <quark right end>
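As an independent sanity check alongside `check_pair.sh`, one could compare the read sequences of an original file with those of a decoded file. The sketch below does not reproduce what `check_pair.sh` does internally. It assumes both files are gzipped FASTQ with four lines per record and that decoded reads keep the same orientation as the originals; it ignores read order, headers, and quality values. The names `seq_digest` and `same_sequences` are hypothetical.

```shell
# Hypothetical helper (not part of Quark): read four-line FASTQ records
# on stdin and print a checksum of the sorted sequence lines, so the
# same reads in a different order give the same value.
seq_digest() {
    awk 'NR % 4 == 2' | LC_ALL=C sort | cksum
}

# Compare two gzipped FASTQ files by their sequence content only.
same_sequences() {
    # Fail early on missing files so two empty streams never "match".
    [ -f "$1" ] && [ -f "$2" ] || return 1
    a=$(gzip -dc "$1" | seq_digest) || return 1
    b=$(gzip -dc "$2" | seq_digest) || return 1
    [ "$a" = "$b" ]
}
```

If the decoded output is not gzipped, the same digest can be computed by piping the plain file into `seq_digest` directly.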
Citation: Hirak Sarkar and Rob Patro, "Quark enables semi-reference-based compression of RNA-seq data."