Genetalks / gtz

A high performance and compression ratio compressor for genomic data, powered by GTXLab of Genetalks.

weird gtz behavior #2

Closed (jchenpku closed this issue 6 years ago)

jchenpku commented 7 years ago

I'm trying to compress a FASTQ file with gtz. However, I got three completely different outcomes just by modifying the output.

The first one was a success!

cat DMS_273.2_1.fastq | gtz -o ./G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
compressing ...
id: 375442529 / 3183320019
base: 819648004 / 8518710584
quality: 2158646143 / 8518710584
() source/compressed : 20468858971/3353746175. ratio : 16.385%
The cost time of compressing () is 00:06:11 (hh::mm:ss)

Compress finished, the total cost time is 00:06:11 (hh:mm:ss)

real 6m14.742s
user 134m18.192s
sys 1m8.160s

####################
This one caused an immediate error

cat DMS_273.2_1.fastq | gtz -o G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
gtz: line 8: 47524 Segmentation fault (core dumped) $basepath/_gtz $@

real 0m0.198s
user 0m0.004s
sys 0m0.000s

#####################
This one was aborted

cat DMS_273.2_1.fastq | gtz -c > G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
compressing ...
id: 375442529 / 3183320019
base: 819648212 / 8518710584
quality: 2158646143 / 8518710584
() source/compressed : 20468858971/3353746396. ratio : 16.385%
The cost time of compressing () is 00:07:01 (hh::mm:ss)

Compress finished, the total cost time is 00:07:02 (hh:mm:ss)
terminate called without an active exception
gtz: line 8: 23992 Aborted (core dumped) $basepath/_gtz $@

real 7m6.006s
user 136m5.360s
sys 1m6.200s

#############
This is the status by pigz:

cat DMS_273.2_1.fastq | pigz -p 4 -c > G20481.DMS_273.2_1.fastq.gz

real 6m4.838s
user 23m46.496s
sys 0m29.420s

#############
This is the final file:

3.2G Apr 4 19:55 G20481.DMS_273.2_1.fastq.gtz
5.1G Apr 4 19:36 G20481.DMS_273.2_1.fastq.gz

The compression ratio is very good.
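For anyone who wants to double-check the "ratio : 16.385%" figure in the logs, here is a minimal sketch that recomputes it from the source/compressed byte counts gtz printed. The helper name `ratio_percent` is made up for illustration and is not part of gtz; it only assumes a POSIX `awk` is available.

```shell
# Hypothetical helper (not part of gtz): print the compressed size as a
# percentage of the source size, rounded to three decimal places.
ratio_percent() {
    # $1 = source size in bytes, $2 = compressed size in bytes
    LC_ALL=C awk -v src="$1" -v cmp="$2" 'BEGIN { printf "%.3f\n", cmp / src * 100 }'
}

# Byte counts copied from the "source/compressed" line of the first run.
ratio_percent 20468858971 3353746175   # prints 16.385
```

The third run reports 3353746396 compressed bytes, only 221 bytes more than the first, so the same call yields the same rounded percentage.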

superligen commented 7 years ago

cat DMS_273.2_1.fastq | gtz -c > G20481.DMS_273.2_1.fastq.gtz

"-c" can only be used together with "-d". In gtz's design, "-c" writes decompressed data to the stdout stream. We will improve the command-line parameter checking in the next release.
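Until that check lands in gtz itself, the rule stated here ("-c" only together with "-d") could be enforced on the caller's side. The following is only a sketch under that single assumption: `gtz_args_ok` is a hypothetical shell function, not something shipped with gtz, and it knows nothing about gtz's other options.

```shell
# Hypothetical pre-flight check (not part of gtz): returns 0 when the
# arguments respect the rule that "-c" is only valid together with "-d",
# and returns 1 with a message on stderr otherwise.
gtz_args_ok() {
    has_c=0
    has_d=0
    for arg in "$@"; do
        case "$arg" in
            -c) has_c=1 ;;
            -d) has_d=1 ;;
        esac
    done
    if [ "$has_c" -eq 1 ] && [ "$has_d" -eq 0 ]; then
        echo "error: -c writes decompressed data to stdout and requires -d" >&2
        return 1
    fi
    return 0
}
```

A wrapper script could then run `gtz_args_ok "$@" && gtz "$@"`, so that an invocation like the third one in the report is rejected before the compressor starts instead of aborting after seven minutes. For compression, the `-o` form used in the first run remains the way to name the output file.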

superligen commented 7 years ago

cat DMS_273.2_1.fastq | gtz -o G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
gtz: line 8: 47524 Segmentation fault (core dumped) $basepath/_gtz $@

We have found a similar problem in gtz when dealing with some quality numbers. Please try the 0.2.2b version: https://github.com/Genetalks/gtz/archive/0.2.2b_tech_preview.tar.gz

Feel free to tell us whether the problem has been solved.