atomicobject / heatshrink

data compression library for embedded/real-time systems
ISC License

Restart decompression #85

Closed: karadokk closed this issue 4 months ago

karadokk commented 5 months ago

Hello,

I am using heatshrink to decompress a file on the fly, and I am trying to implement a progressive download. However, if the device reboots (it is an embedded device), I lose the decompression buffer. I did some tests and found that, to restart where I left off, I need to store `head_index`, `state`, `current_byte`, `bit_index` and `buffers` (from the `heatshrink_decoder` structure).

However, I don't really understand how `buffers` is built, and I would like to know if there is a way to rebuild it instead of having to store it.

If that is not possible, is there a sync point where I can restart the decompression? I am using a window size of 9 and a lookahead size of 3.

silentbicycle commented 5 months ago

If you're running into problems with a long decompression getting interrupted, you may be able to break your input into multiple chunks (compressed separately) and restart from the beginning of the last one that didn't complete. It's common to need some kind of sync point/resume behavior, but the best approach varies a lot from project to project and with the kind of data being processed, so heatshrink doesn't commit to any particular method.
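A minimal sketch of the chunked approach, under assumptions that are not part of heatshrink: each chunk is compressed independently and framed with a hypothetical 2-byte little-endian length header, and a hypothetical `progress_t` record is committed to non-volatile storage only at chunk boundaries. `decode_chunk` is a stand-in that just copies bytes; a real version would reset the decoder (e.g. with `heatshrink_decoder_reset`) and run the sink/poll/finish loop over the chunk.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical progress record, committed to non-volatile storage only after
 * a whole chunk has been decoded. Chunk boundaries are the only sync points. */
typedef struct {
    uint32_t chunks_done;   /* chunks fully decoded so far */
    uint32_t input_offset;  /* where the next chunk header starts in the download */
    uint32_t output_offset; /* output bytes known to be complete */
} progress_t;

/* Stand-in for decoding one independently compressed chunk. A real version
 * would reset the decoder (e.g. heatshrink_decoder_reset) and run the
 * sink/poll/finish loop; this one copies the payload so the resume logic
 * can be exercised on its own. Returns the number of output bytes. */
static size_t decode_chunk(const uint8_t *in, size_t in_len, uint8_t *out) {
    memcpy(out, in, in_len);
    return in_len;
}

/* Decode every chunk that has been fully downloaded, starting from the saved
 * progress. Chunks use a hypothetical 2-byte little-endian length header.
 * Stops at the first incomplete chunk; call again when more data arrives or
 * after a reboot with the reloaded progress record. */
static void resume(const uint8_t *stream, size_t stream_len,
                   uint8_t *out, progress_t *p) {
    while ((size_t)p->input_offset + 2 <= stream_len) {
        size_t len = (size_t)stream[p->input_offset]
                   | ((size_t)stream[p->input_offset + 1] << 8);
        if ((size_t)p->input_offset + 2 + len > stream_len)
            break; /* chunk not fully downloaded yet */
        size_t produced = decode_chunk(stream + p->input_offset + 2, len,
                                       out + p->output_offset);
        /* Commit progress only once the whole chunk has succeeded
         * (a real device would write *p to flash here). */
        p->input_offset += (uint32_t)(2 + len);
        p->output_offset += (uint32_t)produced;
        p->chunks_done++;
    }
}
```

After a reboot the device reloads the progress record and calls `resume` again; a chunk that was only partly downloaded or decoded is redone from its header, overwriting any partial output past `output_offset`. The trade-off is compression ratio: each chunk starts with an empty window, so smaller chunks give more frequent sync points but compress less well.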

`buffers` is filled in by the decompression process. Essentially, it's storing a fixed-size window of recent decompressed output, and the compressed input contains either data to output as-is or (offset, length) pairs pointing back into the recent output: either "[some literal bytes]" or "and then repeat the 10 bytes the output had starting 50 bytes ago". As more output is produced it overwrites the buffer, so it can always point back into recent output for repeated patterns, and the back-references take less space than the literal bytes when those patterns are at least a few bytes long.
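A simplified model of that window, using the thread's window size of 9 (2^9 = 512 bytes). This is an illustrative sketch, not heatshrink's code, and `window_t`, `emit`, `backref` and `rebuild_window` are hypothetical names. Because the window only ever holds the last 512 bytes of output, one possible answer to the original question is to rebuild it from output that was already written to storage instead of saving it. That assumes the real decoder indexes its window the same way (a running head counter masked to the window size; worth checking in `heatshrink_decoder.c`), and that the position in the compressed input and any unconsumed input are saved separately.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WINDOW_BITS 9                     /* the thread's window size of 9 */
#define WINDOW_SIZE (1u << WINDOW_BITS)   /* 2^9 = 512 bytes */
#define WINDOW_MASK (WINDOW_SIZE - 1u)

/* Hypothetical model of the window: a ring buffer of recent output. */
typedef struct {
    uint8_t buf[WINDOW_SIZE];
    uint16_t head; /* output bytes produced so far, mod 2^16 */
} window_t;

/* Literal step: append one byte to the output and to the window. */
static void emit(window_t *w, uint8_t *out, size_t *out_len, uint8_t c) {
    w->buf[w->head++ & WINDOW_MASK] = c;
    out[(*out_len)++] = c;
}

/* Back-reference step: "repeat `count` bytes starting `offset` bytes ago"
 * (offset must be 1..WINDOW_SIZE). Copying one byte at a time makes
 * overlapping references (count > offset) repeat the pattern. */
static void backref(window_t *w, uint8_t *out, size_t *out_len,
                    uint16_t offset, uint16_t count) {
    for (uint16_t i = 0; i < count; i++) {
        uint8_t c = w->buf[(uint16_t)(w->head - offset) & WINDOW_MASK];
        emit(w, out, out_len, c);
    }
}

/* Rebuild the window after a reboot from output that was already persisted:
 * it only ever holds the last WINDOW_SIZE bytes, at position (index & mask).
 * Unwritten slots are zero, as in a window that starts zero-filled. */
static void rebuild_window(window_t *w, const uint8_t *out, size_t out_len) {
    memset(w->buf, 0, sizeof w->buf);
    size_t start = out_len > WINDOW_SIZE ? out_len - WINDOW_SIZE : 0;
    for (size_t i = start; i < out_len; i++)
        w->buf[i & WINDOW_MASK] = out[i];
    w->head = (uint16_t)out_len;
}
```

For example, after emitting the literals "abc", `backref(&w, out, &n, 3, 6)` produces "abcabc", and `rebuild_window` applied to the nine output bytes yields the same window contents and head as the live decoder.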

heatshrink is an implementation of LZSS, largely as described on the Wikipedia page. Most of the complexity in the implementation exists to support suspending and resuming work in small steps, interleaving (de)compression with other IO.
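To illustrate the suspend/resume point, here is a toy incremental LZSS decoder. Its bit format (a 1 tag bit followed by an 8-bit literal, or a 0 tag bit followed by a 9-bit offset-minus-one and a 3-bit count-minus-one) is an assumption loosely modeled on a window size of 9 and a lookahead of 3; it is not guaranteed to match heatshrink's actual bitstream, and `toy_decoder`, `toy_init` and `toy_feed` are hypothetical names. What it shows is that all progress lives in one plain struct (state machine node, partially read field, window, head index), which mirrors the fields the reporter found necessary to save. This toy consumes whole input bytes per call, so it never pauses mid-byte; a decoder that can stop mid-byte (for example when its output buffer fills) also has to keep the current byte and bit index.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define W_BITS 9   /* window: 2^9 = 512 bytes */
#define L_BITS 3   /* back-references of up to 2^3 = 8 bytes */
#define W_MASK ((1u << W_BITS) - 1u)

typedef enum { ST_TAG, ST_LITERAL, ST_OFFSET, ST_COUNT } st_t;

/* Everything needed to resume mid-stream. It is plain data, so it could be
 * copied to non-volatile storage between calls and restored later. */
typedef struct {
    st_t state;       /* which field is being read */
    uint16_t acc;     /* bits of the current field read so far */
    uint8_t need;     /* bits still missing from the current field */
    uint16_t offset;  /* decoded back-reference offset */
    uint16_t head;    /* output bytes produced so far, mod 2^16 */
    uint8_t window[1u << W_BITS];
} toy_decoder;

static void toy_init(toy_decoder *d) {
    memset(d, 0, sizeof *d);
    d->state = ST_TAG;
    d->need = 1;
}

/* Append one byte to the output and to the ring-buffer window. */
static void put(toy_decoder *d, uint8_t *out, size_t *n, uint8_t c) {
    d->window[d->head++ & W_MASK] = c;
    out[(*n)++] = c;
}

/* Feed one compressed byte, bits MSB first. The input can be split between
 * calls at any byte boundary, since all progress is kept in *d. */
static void toy_feed(toy_decoder *d, uint8_t byte, uint8_t *out, size_t *n) {
    for (int bit = 7; bit >= 0; bit--) {
        d->acc = (uint16_t)((d->acc << 1) | ((byte >> bit) & 1u));
        if (--d->need > 0)
            continue; /* current field not complete yet */
        uint16_t v = d->acc;
        d->acc = 0;
        switch (d->state) {
        case ST_TAG:
            d->state = v ? ST_LITERAL : ST_OFFSET;
            d->need = v ? 8 : W_BITS;
            break;
        case ST_LITERAL:
            put(d, out, n, (uint8_t)v);
            d->state = ST_TAG;
            d->need = 1;
            break;
        case ST_OFFSET:
            d->offset = (uint16_t)(v + 1); /* stored as offset - 1 */
            d->state = ST_COUNT;
            d->need = L_BITS;
            break;
        case ST_COUNT:
            for (uint16_t i = 0; i <= v; i++) /* stored as count - 1 */
                put(d, out, n,
                    d->window[(uint16_t)(d->head - d->offset) & W_MASK]);
            d->state = ST_TAG;
            d->need = 1;
            break;
        }
    }
}
```

In this toy format, the five bytes `B0 D8 AC 60 14` encode the literals "abc" followed by a back-reference of offset 3 and count 5, which decodes to "abcabcab". Feeding the first two bytes, copying the struct out and back (as a reboot with persisted state would), then feeding the rest gives the same output as an uninterrupted run.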

karadokk commented 4 months ago

Hello, thank you for your answer. I added sync points myself during the download. Sorry for the delay in my response.