atomicobject / heatshrink

data compression library for embedded/real-time systems
ISC License

Compression parameters not stored in compressed data?? #56

Status: open. Opened by tobermory 4 years ago.

tobermory commented 4 years ago

I built an application that uses heatshrink as a library to compress data on an embedded board, with the following build parameters:

DYNAMIC_ALLOC 0, WINDOW_BITS 8, LOOKAHEAD_BITS 4, USE_INDEX 0

I compress, say, the output of 'seq 10000' (the Unix seq command), which is 48894 bytes, and get 27797 bytes.

I then take that data over to x86 and decode it using the heatshrink binary provided in the distro. Instead of 48894 bytes I get 60k+. With the -v option, I can see the decoder is using -w 11 -l 4.

Aren't the parameters used to compress the data stored in the compressed data itself? It appears not. Does the decoding side really have to know the encoder's parameters?

Hopefully I made some glaring error in my workflow...

silentbicycle commented 4 years ago

Your understanding is correct: by design, it does not include the configuration parameters in the compressed bitstream. Because it was written with embedded projects in mind, it takes great pains to avoid making unnecessary implementation decisions. I expect some projects will have the config hardcoded in ROM (as part of the static configuration), others will negotiate it as part of some application-layer wire protocol, and so on. If you need it included in-band, it's really not that hard to write out a couple of bytes before the compressed bitstream.
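A minimal sketch of that approach, assuming a made-up two-byte prefix with one byte per parameter. The layout and the names frame_write and frame_read are hypothetical and not part of heatshrink; the only requirement is that encoder and decoder agree on them. The decoding side reads the two bytes first and only then sets up a decoder with matching window and lookahead sizes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing (not a heatshrink format):
 *   byte 0 = window bits    (the CLI's -w)
 *   byte 1 = lookahead bits (the CLI's -l)
 *   bytes 2.. = the unmodified heatshrink bitstream */
#define FRAME_HDR_LEN 2

/* Writes header + compressed payload into out.
 * Returns the total frame length, or 0 if out is too small. */
static size_t frame_write(uint8_t *out, size_t out_cap,
                          uint8_t window_bits, uint8_t lookahead_bits,
                          const uint8_t *payload, size_t payload_len) {
    if (out_cap < FRAME_HDR_LEN || out_cap - FRAME_HDR_LEN < payload_len) {
        return 0;
    }
    out[0] = window_bits;
    out[1] = lookahead_bits;
    memcpy(out + FRAME_HDR_LEN, payload, payload_len);
    return FRAME_HDR_LEN + payload_len;
}

/* Splits a frame: stores the parameters, points *payload at the
 * compressed bytes and sets *payload_len. Returns false if the frame
 * is too short to contain a header. */
static bool frame_read(const uint8_t *frame, size_t frame_len,
                       uint8_t *window_bits, uint8_t *lookahead_bits,
                       const uint8_t **payload, size_t *payload_len) {
    if (frame_len < FRAME_HDR_LEN) {
        return false;
    }
    *window_bits = frame[0];
    *lookahead_bits = frame[1];
    *payload = frame + FRAME_HDR_LEN;
    *payload_len = frame_len - FRAME_HDR_LEN;
    return true;
}
```

In this thread's case, the two header bytes would carry 8 and 4, telling the x86 side to decode with -w 8 -l 4 rather than the CLI defaults.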

The command-line tool pretty much just exists to experiment with different compression parameters on representative data and to pack/unpack data on a host with an assumed config. I suppose it could have a flag to write those parameters out before the data in some standardized way, but for the projects I've used it on, that's never been a priority. Still, that should be a fairly small change. I'm going to be cutting a new release for a couple of miscellaneous issues in the next few weeks, so I'll probably add that.

silentbicycle commented 3 years ago

I've decided not to add the flag to write the parameters out before the data, but I will call attention to this in the docs.

BenBE commented 3 years ago

What about having two functions: one for reading the config from the bitstream and another for writing it into the bitstream? Neither would be called by default by the library itself; the library's user would call them as part of the setup process.

silentbicycle commented 3 years ago

I'm not opposed. Both could fit in a single byte. As an API change, it'd have to wait until 0.5.0.
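The two-function idea combined with the single-byte observation might look something like the minimal sketch below. The layout (window bits in the high nibble, lookahead bits in the low nibble), the stream_config type, and the cfg_byte_write/cfg_byte_read functions are all hypothetical; heatshrink does not provide them. The range checks are meant to mirror the limits heatshrink applies when allocating an encoder or decoder (window bits 4 to 15, lookahead bits at least 3 and smaller than the window bits), but confirm them against the headers of the version you build.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed parameter limits, intended to match heatshrink's own checks. */
#define CFG_MIN_WINDOW_BITS 4
#define CFG_MAX_WINDOW_BITS 15
#define CFG_MIN_LOOKAHEAD_BITS 3

/* Hypothetical type, not part of heatshrink. */
typedef struct {
    uint8_t window_bits;    /* -w */
    uint8_t lookahead_bits; /* -l */
} stream_config;

static bool cfg_valid(uint8_t w, uint8_t l) {
    return w >= CFG_MIN_WINDOW_BITS && w <= CFG_MAX_WINDOW_BITS &&
           l >= CFG_MIN_LOOKAHEAD_BITS && l < w;
}

/* Packs the config into one byte: high nibble = window bits,
 * low nibble = lookahead bits. Both fit in 4 bits because
 * window bits <= 15 and lookahead bits < window bits.
 * Returns false for out-of-range values. */
static bool cfg_byte_write(const stream_config *cfg, uint8_t *out) {
    if (!cfg_valid(cfg->window_bits, cfg->lookahead_bits)) {
        return false;
    }
    *out = (uint8_t)((cfg->window_bits << 4) | cfg->lookahead_bits);
    return true;
}

/* Unpacks one header byte; returns false (leaving *cfg untouched)
 * if the byte cannot encode a valid config. */
static bool cfg_byte_read(uint8_t in, stream_config *cfg) {
    uint8_t w = (uint8_t)(in >> 4);
    uint8_t l = (uint8_t)(in & 0x0F);
    if (!cfg_valid(w, l)) {
        return false;
    }
    cfg->window_bits = w;
    cfg->lookahead_bits = l;
    return true;
}
```

With the board's settings from this thread (-w 8 -l 4), the header byte would be 0x84; the CLI defaults (-w 11 -l 4) would give 0xB4. A decoder built with DYNAMIC_ALLOC 0 has its window and lookahead sizes fixed at compile time, so on such a target the read function would mainly serve to reject a stream whose header does not match the built-in values rather than to reconfigure the decoder.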