MathisRosenhauer / libaec

libaec - Adaptive Entropy Coding library
https://gitlab.dkrz.de/k202009/libaec
BSD 2-Clause "Simplified" License
12 stars 9 forks source link

preprocess_signed UndefinedBehaviorSanitizer: signed-integer-overflow third_party/libaec/src/encode.c:297 #6

Closed schwehr closed 6 years ago

schwehr commented 6 years ago
third_party/libaec/src/encode.c:297:38: runtime error: signed integer overflow: 2139750481 - -1073741824 cannot be represented in type 'int'
    #0 0x55edf610998b in preprocess_signed third_party/libaec/src/encode.c:297:38
    #1 0x55edf610b9cd in m_get_rsi_resumable third_party/libaec/src/encode.c:693:9
    #2 0x55edf610a6d5 in aec_encode third_party/libaec/src/encode.c:910:12
    #3 0x55edf610aa9a in aec_buffer_encode third_party/libaec/src/encode.c:942:14
    #4 0x55edf60fea12 in LLVMFuzzerTestOneInput third_party/libaec/fuzzing/fuzz_target.cc:29:9

SUMMARY: UndefinedBehaviorSanitizer: signed-integer-overflow third_party/libaec/src/encode.c:297:38 in 
MS: 0 ; base unit: 0000000000000000000000000000000000000000
0xff,0xff,0xff,0xff,0xff,0xff,0xff,0x8a,0x0,0x51,
\xff\xff\xff\xff\xff\xff\xff\x8a\x00Q
artifact_prefix='./'; Test unit written to ./crash-5bcc933627366e53c794add895cf16bede1c33da
Base64: /////////4oAUQ==

Happens in here:

static void preprocess_signed(struct aec_stream *strm)
{
    /**
       Preprocess RSI of signed samples.
    */

    uint32_t D;
    struct internal_state *state = strm->state;
    int32_t *restrict x = (int32_t *)state->data_raw;
    uint32_t *restrict d = state->data_pp;
    int32_t xmax = (int32_t)state->xmax;
    int32_t xmin = (int32_t)state->xmin;
    uint32_t rsi = strm->rsi * strm->block_size - 1;
    uint32_t m = UINT64_C(1) << (strm->bits_per_sample - 1);

    state->ref = 1;
    state->ref_sample = x[0];
    d[0] = 0;
    x[0] = (x[0] ^ m) - m;

    for (size_t i = 0; i < rsi; i++) {
        x[i + 1] = (x[i + 1] ^ m) - m;
        if (x[i + 1] < x[i]) {
            D = (uint32_t)(x[i] - x[i + 1]); // signed int overflow: 2139750481 - -1073741824
            if (D <= (uint32_t)(xmax - x[i]))
                d[i + 1] = 2 * D - 1;
            else
                d[i + 1] = xmax - x[i + 1];
        } else {
            D = (uint32_t)(x[i + 1] - x[i]);  // likely also to be trouble
            if (D <= (uint32_t)(x[i] - xmin))
                d[i + 1] = 2 * D;
            else
                d[i + 1] = x[i + 1] - xmin;
        }
    }
    state->uncomp_len = (strm->block_size - 1) * strm->bits_per_sample;
}

I'm not sure how an overflow situation should be handled. I'm guessing that there was an assumption at a higher level that was violated. If not, INT32_MAX and INT32_MIN are useful to create safe math functions that protect against signed overflow (which is very definitely undefined behavior in C).

https://en.cppreference.com/w/c/types/integer

MathisRosenhauer commented 6 years ago

I have seen this as well when running ubsan. A similar thing happens when decoding signed ints. I will see how to fix this without taking too big of a speed hit.

schwehr commented 6 years ago

Can you describe in prose what that function is doing? Is there a precondition check that can be done? Or Maybe there is an alternative approach that isn't susceptible to overflow? Can it be converted to unsigned math or can it switch to 64 bit ints? Often 64 bit math is just as fast. This begs the question of setting up some microbenchmarks to make it easier to check patches for large performance regressions. But, from my perspective, avoiding UB wins out over performance if there isn't an alternative that is fast.

MathisRosenhauer commented 6 years ago

It is the implementation of preprocessing as described in section 4 of the standard. It maps the difference to the previous sample to an unsigned int. Doing everything in 64 bit helps but had a very noticeable speed impact last time I checked. This is easily the hottest function for encoding. make bench will show that but you have to switch to signed encoding (aec -s ... in benc.sh) for this case. That being said, I totally agree with you that UB should be avoided. Even though make test will fail if the UB is not what we expect. I will have a look at it next week :smiley: