circulosmeos / gztool

extract random-positioned data from gzip files with no penalty, including gzip tailing like with 'tail -f' !
https://circulosmeos.wordpress.com/2019/08/11/continuous-tailing-of-a-gzip-file-efficiently/

Segmentation fault -z -b 0 #15

Closed jnorthrup closed 1 year ago

jnorthrup commented 1 year ago

Hey, I'm reporting a crash. I want the block index, but I am building my own line index with -b 0, so -z looks like the right option.

Then this happens:

$ gztool -z ~/work/fumes/data/galaxy_1day.json.gz -I FOOIdx.gzi -b 0 >/dev/null
ACTION: Extract from byte = 0

Processing '/home/jim/work/fumes/data/galaxy_1day.json.gz' ...
Processing index to 'FOOIdx.gzi'...
Segmentation fault


The OS is a recent Gentoo, with a world build about 2 months old compiled by gcc at -O2.

The gztool binary was built with: clang-15 -l{z,m} -O3 -flto -ogztool gztool.c

It was also built with plain old gcc before, with the same result.

I think I could get a cheap build with zig to test different libcs, though that would still be clang-generated code. I'd need to RTFM first and then follow up on that.

jnorthrup commented 1 year ago

With zig we get a slightly different failure mode:

jim@gentoo ~/work/gztool $ zig cc gztool.c -I /usr/include/ -l{z,m} -o gztool
jim@gentoo ~/work/gztool $ ./gztool -z ~/work/fumes/data/galaxy_1day.json.gz -I FOOIdx.gzi -b 0 >/dev/null
ACTION: Extract from byte = 0

Processing '/home/jim/work/fumes/data/galaxy_1day.json.gz' ...
Processing index to 'FOOIdx.gzi'...
Illegal instruction

Edit: the same result happens without -z.

I can settle for an index with line information, and -I works on my system with -b 0, so this is not a showstopper for me, but I am certainly interested in a minimal index.

circulosmeos commented 1 year ago

Hi @jnorthrup, thanks for the report!

There's a bug with -z 😮 so I'll try to release a new version soon. Until then you can manually apply this patch and recompile:


diff --git a/gztool.c b/gztool.c
index 421da54..e7fd21d 100644
--- a/gztool.c
+++ b/gztool.c
@@ -2347,14 +2347,15 @@ local struct returned_output decompress_and_build_index(
         totlines = 1;
         index = NULL;               /* will be allocated on first addpoint() */

+        index = create_empty_index();
+        if ( index == NULL ) { // Oops!?
+            ret.error = Z_MEM_ERROR;
+            goto decompress_and_build_index_error;
+        }
+
         if ( extend_index_with_lines > 0 ) {
             // mark index as index_version = 1 to store line numbers when serialize();
             // in order to do this, index must be created now (empty)
-            index = create_empty_index();
-            if ( index == NULL ) { // Oops!?
-                ret.error = Z_MEM_ERROR;
-                goto decompress_and_build_index_error;
-            }
             index->index_version = 1;
             // here extend_index_with_lines can be 3 (implicit `-x`)
             if ( extend_index_with_lines == 2 )
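
The patch moves the create_empty_index() call out of the `extend_index_with_lines > 0` branch, so the index exists even when no line information is requested. The sketch below models that control flow in isolation. The names toy_index, toy_create_empty_index and toy_prepare_index are hypothetical stand-ins, not gztool's real types or functions, and the reading that the crash came from using a still-NULL index is an inference from the diff rather than something stated in this thread.

```c
#include <stdlib.h>

/* Hypothetical, stripped-down stand-in for gztool's index structure. */
struct toy_index {
    int index_version;   /* 0 = no line info, 1 = line numbers stored */
    size_t have;         /* number of access points currently stored */
};

/* Hypothetical stand-in for create_empty_index():
   returns a zero-initialised index, or NULL if allocation fails. */
struct toy_index *toy_create_empty_index(void) {
    return calloc(1, sizeof(struct toy_index));
}

/* Models the patched order of operations:
   1. always allocate the index,
   2. bail out if allocation failed,
   3. only then, optionally, mark it as carrying line numbers.
   Before the patch, step 1 happened only inside the `> 0` branch,
   leaving the pointer NULL when no line info was requested. */
struct toy_index *toy_prepare_index(int extend_index_with_lines) {
    struct toy_index *index = toy_create_empty_index();
    if (index == NULL) {
        return NULL;     /* caller should treat this as a memory error */
    }
    if (extend_index_with_lines > 0) {
        index->index_version = 1;
    }
    return index;
}
```

With this ordering, toy_prepare_index(0) returns an allocated index whose index_version is 0, and toy_prepare_index(1) returns one whose index_version is 1. In both cases the caller can safely dereference the result after a NULL check, and must free() it when done.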

circulosmeos commented 1 year ago

I've released v1.5.2, which patches this bug.

It can also be downloaded from launchpad.net:

sudo add-apt-repository ppa:roberto.s.galende/gztool
sudo apt update

Let me know if you run into any problems 👍