dstreett / Super-Deduper

An application to remove PCR duplicates from high throughput sequencing runs.

gzip compression level? #16

Closed (sklages closed 8 years ago)

sklages commented 9 years ago

Hi,

I wondered why the deduped (gzip'ed) files are so much larger than the original gzip'ed fastqs (e.g. 14G to 34G in my most extreme example). After decompressing the files, it turned out that the deduped files are indeed smaller than the input files, so that part is fine.

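Our fastqs are compressed with "max speed", i.e. the fastest and least thorough gzip setting, and they are still far smaller than the deduped output. So it seems super_deduper does not apply any compression when writing gzip'ed output?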
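To illustrate the suspicion, here is a minimal Python sketch of how the gzip level affects output size. It is not super_deduper's code; the `fake_fastq` and `gzip_sizes` helpers and the synthetic reads are made up for illustration. Only `gzip.compress` from the standard library is used. At level 0 the data is stored uncompressed inside the gzip container, so the result is slightly larger than the raw input, which would match what I see.

```python
import gzip
import random


def fake_fastq(n_reads=2000, read_len=100, seed=0):
    """Build synthetic FASTQ-like bytes (hypothetical data, not real reads)."""
    rng = random.Random(seed)
    records = []
    for i in range(n_reads):
        seq = "".join(rng.choice("ACGT") for _ in range(read_len))
        qual = "".join(rng.choice("FGHIJ") for _ in range(read_len))
        records.append(f"@read{i}\n{seq}\n+\n{qual}\n")
    return "".join(records).encode("ascii")


def gzip_sizes(data, levels=(0, 1, 6, 9)):
    """Return {level: compressed size in bytes} for each gzip level."""
    return {lvl: len(gzip.compress(data, compresslevel=lvl)) for lvl in levels}


raw = fake_fastq()
sizes = gzip_sizes(raw)
for lvl, size in sizes.items():
    # Level 0 only wraps the data; levels 1 and above actually compress it.
    print(f"level {lvl}: {size / len(raw):.2f} of raw size")
```

Real reads will compress differently from this random data, so the exact ratios are not meaningful, only the gap between level 0 and the other levels.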

compressed (input/output):

  14G 2015.11.30 17:04:03 athCun_PE1000.raw.il.fq.gz
  27G 2015.12.01 17:30:33 athCun_PE1000.raw.il_nodup_PE1.fastq.gz

uncompressed (input/output):

  36G 2015.12.02 08:32:23 athCun_PE1000.raw.il.fq
  28G 2015.12.02 10:16:15 athCun_PE1000.raw.il_nodup_PE1.fastq
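As a quick check, the ratios implied by the two listings above can be worked out like this. The sizes are the rounded values from the listings, so the result is only approximate, and the `ratio` helper name is just for illustration.

```python
# Sizes in GB, rounded as shown in the listings above: (compressed, uncompressed).
sizes_gb = {
    "input": (14, 36),
    "output": (27, 28),
}


def ratio(compressed, uncompressed):
    """Compressed size as a fraction of uncompressed size."""
    return compressed / uncompressed


ratios = {name: ratio(c, u) for name, (c, u) in sizes_gb.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.2f}")
# input: 0.39
# output: 0.96
```

So the input shrinks to roughly 39% of its raw size, while the deduped output stays at roughly 96%, i.e. it is barely compressed at all.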

Did I get this right? Just curious ;-)

best, Sven