atomicobject / heatshrink

data compression library for embedded/real-time systems
ISC License
1.31k stars 176 forks source link

Fix encoder bug when finishing #87

Open sameer opened 3 months ago

sameer commented 3 months ago

Via some fuzzing, I found a finish scenario that can result in incorrect encoding. This PR addresses the bug by adding a check for finishing && input size == 0 in the search step to transition to flush bits.

Explanation:

At finish time, if the input_size is 0 then there is no more data to encode and the remaining bits should be flushed. However, because the search step does input_size - 1 this results in an overflow and the search step transitions to yielding a tag bit instead. I'm not really sure what happens from this point on, but I suspect it may loop infinitely.

BenBE commented 3 months ago

This kinda sounds like a bug I encountered a few months back, that was triggered, when the following set of conditions were met:

  1. The output buffer was full
  2. The output of the last literal was mid-way (in my case first bit of last literal value written, 7 more for the literal to go that didn't fit)
  3. No further bytes in the input buffer
  4. The encoder trying to finish up

Due to some API inconsistency, the return value was EMPTY instead of OUTPUT_MORE, thus my driver code for compression ended up inadvertently dropping the last byte of the output. It didn't cause an infinite loop, but rather an early termination of the encoder. The example implementation of the CLI utility is not affected; and also was, what in the end allowed me to locate my encoder bug.

sameer commented 3 months ago

It didn't cause an infinite loop, but rather an early termination of the encoder.

That's the behavior I noticed in a Rust port I'm writing fix ref. But I wasn't totally convinced there was no loop in the C implementation here.

silentbicycle commented 3 months ago

Thanks. I'll make sure this is fixed in the next release.

trueb2 commented 1 month ago

I also encountered this during some fuzzing on a Rust port. My st_step_search impl looks like this now

fn st_step_search(&mut self) -> HSEState {
    let window_length = self.input_buffer_size;
    let lookahead_sz = self.lookahead_size;
    let msi = self.match_scan_index;

    if msi > self.input_size.saturating_sub(lookahead_sz) {
        return HSEState::SaveBacklog;
    } else if unlikely(self.is_finishing()) && msi >= self.input_size {
        return HSEState::FlushBits;
    }

    let input_offset = self.get_input_offset();
    let end = input_offset + msi;
    let start = end - window_length;

...