atomicobject / heatshrink

data compression library for embedded/real-time systems
ISC License
1.31k stars 176 forks source link

heatshrink with -w 15 and HEATSHRINK_USE_INDEX=1 loops infinitely for >= 32768 byte file #55

Open ecm-pushbx opened 4 years ago

ecm-pushbx commented 4 years ago

This is with revision v0.4.1-1-g7d419e1.

$ make clean
[...]
$ make OPTIMIZE="-O0"
[...]
$ ii=32768; while ((ii--)); do printf "\x90"; done > 90.bin
$ ./heatshrink -w 15 90.bin tmp
^C

Using -w 14 or lower, or building with HEATSHRINK_USE_INDEX set to zero, works around this bug.

It seems to be that in the function find_longest_match the while loop loops forever, because pos = 0, match_maxlen = 0, pospoint[match_maxlen] = pospoint[0] = 0, needlepoint[match_maxlen] = needlepoint[0] = 0x90 (presumably a byte (first one?) from the input file), hsi->index[pos] = hsi->index[0] = 0. So the first if in that while always is evaluated as true, pos gets reset to hsi->index[pos] (both are zero), and the loop continues looping.

Here's a gdb session showing the behaviour:

$ gdb --args ./heatshrink -w 15 90.bin tmp
GNU gdb (Debian 8.3.1-1) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./heatshrink...
(gdb) break heatshrink_encoder.c:466
Breakpoint 1 at 0x2dee: file heatshrink_encoder.c, line 466.
(gdb) run
Starting program: /media/ssd-data/coding/proj/heatshrink/broken/heatshrink -w 15 90.bin tmp

Breakpoint 1, find_longest_match (hse=0x55555557c300, start=0, end=32768, 
    maxlen=16, match_length=0x7fffffffcbdc) at heatshrink_encoder.c:466
466     while (pos - (int16_t)start >= 0) {
(gdb) step
467         uint8_t * const pospoint = &buf[pos];
(gdb) 
468         len = 0;
(gdb) 
473         if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
(gdb) print/x pospoint[match_maxlen]
$1 = 0x0
(gdb) print/x needlepoint[match_maxlen]
$2 = 0x90
(gdb) step
474             pos = hsi->index[pos];
(gdb) 
475             continue;
(gdb) print/x pos
$3 = 0x0
(gdb) display/x pos
1: /x pos = 0x0
(gdb) step
466     while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
(gdb) 
467         uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
(gdb) 
468         len = 0;
1: /x pos = 0x0
(gdb) 
473         if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
(gdb) print/x pospoint[match_maxlen]
$4 = 0x0
(gdb) print/x needlepoint[match_maxlen]
$5 = 0x90
(gdb) step
474             pos = hsi->index[pos];
1: /x pos = 0x0
(gdb) display/x hsi->index[pos]
2: /x hsi->index[pos] = 0x0
(gdb) step
475             continue;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
466     while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
467         uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
468         len = 0;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
473         if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
474             pos = hsi->index[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
475             continue;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
466     while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
467         uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
468         len = 0;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
473         if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) display/x match_maxlen
3: /x match_maxlen = 0x0
(gdb) 
ecm-pushbx commented 4 years ago

This patch seems to fix it, but I'm not sure whether it is correct. It just fixes the infinite looping by checking that it won't loop with the same pos value.

$ git diff
diff --git a/heatshrink_encoder.c b/heatshrink_encoder.c
index edf4abe..be216ee 100644
--- a/heatshrink_encoder.c
+++ b/heatshrink_encoder.c
@@ -471,8 +471,10 @@ static uint16_t find_longest_match(heatshrink_encoder *hse, uint16_t start,
          * This is redundant with the index if match_maxlen is 0, but the
          * added branch overhead to check if it == 0 seems to be worse. */
         if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
-            pos = hsi->index[pos];
-            continue;
+            if (pos != hsi->index[pos]) {
+                pos = hsi->index[pos];
+                continue;
+            }
         }

         for (len = 1; len < maxlen; len++) {
@@ -484,6 +486,9 @@ static uint16_t find_longest_match(heatshrink_encoder *hse, uint16_t start,
             match_index = pos;
             if (len == maxlen) { break; } /* won't find better */
         }
+        if (pos == hsi->index[pos]) {
+            break;
+        }
         pos = hsi->index[pos];
     }
 #else   
ecm-pushbx commented 4 years ago

Curiously, -w 15 almost always ends up with negative compression for me (ie, expanding the output file to be larger than the input file). So for now I just exclude -w 15 from my scripts that determine the best compression parameters.

silentbicycle commented 4 years ago

Thanks for your detailed bug report! This will be fixed in the next release.

I wouldn't expect -w 15 to compress effectively, except in very contrived circumstances.

ecm-pushbx commented 1 year ago

Do you still intend to create a new revision of the library? I ask because I just recommended it to someone else on the freedos-devel mailing list and I stumbled across the allowed parameter ranges being larger than my script used to try. (And a -w parameter below 10 often wins out for the lDebug online help pages, as these are all below 3 KiB each.) That led me to rediscovering the -w 15 bug on large files.

silentbicycle commented 12 months ago

Yes. It will probably be a few weeks before I get to it because I'm currently busy with moving, but I currently plan to tie up some loose ends and post another release by the end of the year.