fragglet / lhasa

Free Software LHA implementation
http://fragglet.github.io/lhasa/
ISC License
80 stars 15 forks source link

sanitizer error at bit_stream_reader.c:93:43 #49

Closed sezero closed 1 year ago

sezero commented 1 year ago

Ran into this in one github actions run while trying to integrate liblhasa into libxmp. The data files leading to error are zipped and attached here: data_files.zip

The error is this:

bit_stream_reader.c:93:43: runtime error: shift exponent 4294967293 is too large for 32-bit type 'uint32_t' (aka 'unsigned int')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior bit_stream_reader.c:93:43 in

or:

bit_stream_reader.c:93:43: runtime error: shift exponent 4294967295 is too large for 32-bit type 'uint32_t' (aka 'unsigned int')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior bit_stream_reader.c:93:43 in

This is on x86_64-linux, clang-12, with lhasa tree compiled like this:

export CFLAGS="-fsanitize=address,undefined -fno-sanitize-recover=all -fno-sanitize=shift-base -fno-omit-frame-pointer -g"
export LDFLAGS="$CFLAGS"
export CC=clang
./configure && make
fragglet commented 1 year ago

Thanks! I will take a look

sezero commented 1 year ago

Ran into this in one github actions run while trying to integrate liblhasa into libxmp.

FWIW, the way I used liblhasa in libxmp is here: https://github.com/libxmp/libxmp/pull/664 But the error reproduces exactly the same in unmodified lhasa git tree.

fragglet commented 1 year ago

After taking a look, it looks like the problem is with how the peek_bits() function is implemented. We read up to four bytes and then shift the byte values into reader->bit_buffer which contains the bits waiting to be read from input:

// Maximum number of bytes we can fill?

fill_bytes = (32 - reader->bits) / 8;

// Read from input and fill bit_buffer.

memset(buf, 0, sizeof(buf));
bytes = reader->callback(buf, fill_bytes,
                         reader->callback_data);

// End of file?

if (bytes == 0) {
        return -1;
}

reader->bit_buffer |= (uint32_t) buf[0] << (24 - reader->bits);
reader->bit_buffer |= (uint32_t) buf[1] << (16 - reader->bits); // <- can be negative shift
reader->bit_buffer |= (uint32_t) buf[2] << (8 - reader->bits); // this too
reader->bit_buffer |= (uint32_t) buf[3];

reader->bits += bytes * 8;

When we read less than four bytes, the remaining bytes in buf are zero thanks to the memset() call. We bitwise-or in all the bytes regardless, but the remainder bytes get a nonsensical shift value because reader->bits > 16. Since the byte is always zero anyway it doesn't affect the result, but it is still undefined behavior regardless and ought to be fixed. I also don't like how there's too much "magic" happening in this function - it reaches the right result but uses some clever tricks to get there which aren't documented in the comments.

sezero commented 1 year ago

So, a lazy solution would be like this ? (Can even lose the memset I think)

diff -up bit_stream_reader.c~ bit_stream_reader.c
--- bit_stream_reader.c~
+++ bit_stream_reader.c
@@ -88,11 +88,15 @@ static int peek_bits(BitStreamReader *reader,
            return -1;
        }

+       fill_bytes = bytes;
        reader->bit_buffer |= (uint32) buf[0] << (24 - reader->bits);
+       if (! --fill_bytes) goto loc_0;
        reader->bit_buffer |= (uint32) buf[1] << (16 - reader->bits);
+       if (! --fill_bytes) goto loc_0;
        reader->bit_buffer |= (uint32) buf[2] << (8 - reader->bits);
+       if (! --fill_bytes) goto loc_0;
        reader->bit_buffer |= (uint32) buf[3];
-
+   loc_0:
        reader->bits += bytes * 8;
    }