burrito-elixir / burrito

Wrap your application in a BEAM Burrito!
MIT License

ZSTD instead of XZ? #130

Open M-Gonzalo opened 6 months ago

M-Gonzalo commented 6 months ago

Hi! This is a great, great project!

I noticed Burrito is using xz to compress its payload (lzma2 I'm guessing). It's certainly a good enough algorithm/format, with good compression and nice decompression speed, and it can be extracted basically anywhere.

There has been a shift in recent years, though, towards zstd for the same use cases where one might otherwise use xz. There are many examples, such as the distribution format of Arch Linux packages, or squashfs images on live ISOs.

The main reason is that zstd allows for ~10x decompression speed while maintaining a competitive compression ratio. And, depending on the original uncompressed size, it can deliver much better compression using the --long option (built-in long-distance matching), which enlarges the match window so the compressor can "see" orders of magnitude more data and reference repeats that sit far apart.
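To make the window-size argument concrete, here is a minimal sketch. It does not use zstd itself: as a stand-in it uses LZMA2 from Python's standard-library `lzma` module, only because the match window (`dict_size`) is easy to vary there. The payload, the two window sizes, and the helper names `make_payload` and `compressed_size` are all assumptions made for illustration. The payload is a random 1 MiB block repeated twice, so the only redundancy is a repeat 1 MiB away.

```python
import lzma
import os


def make_payload(block_size: int = 1 << 20) -> bytes:
    """Two identical copies of an incompressible random block, back to back."""
    block = os.urandom(block_size)
    return block + block


def compressed_size(data: bytes, dict_size: int) -> int:
    """Compress with LZMA2 using the given match window (dictionary) size in bytes."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    blob = lzma.compress(data, format=lzma.FORMAT_XZ, filters=filters)
    assert lzma.decompress(blob) == data  # sanity check: the round trip is lossless
    return len(blob)


if __name__ == "__main__":
    payload = make_payload()
    # 64 KiB window: the second copy starts 1 MiB after the first, out of reach.
    small = compressed_size(payload, 64 * 1024)
    # 4 MiB window: the whole first copy is still visible when the second arrives.
    large = compressed_size(payload, 4 * 1024 * 1024)
    print(f"input:        {len(payload)} bytes")
    print(f"64 KiB window: {small} bytes")
    print(f"4 MiB window:  {large} bytes")
```

With the small window the output should stay about as large as the input, while the large window should bring it down to roughly the size of one block. In zstd's CLI the analogous knob is `--long`, which enables long-distance matching with a 128 MiB window by default. Whether that helps a real Burrito payload depends on how much far-apart duplication it contains.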

The possible downside is that the decompressor is slightly less widely available (especially on very old, outdated systems).

In summary, zstd will speed up the start time of a Burrito app, and probably reduce its size considerably, at the cost of a (possibly) smaller set of supported targets.

mmower commented 6 months ago

Certainly sounds like it would make a good option in cases where distribution to older systems isn't a concern.

doawoo commented 5 months ago

It looks like zig HAS a zstd decompressor already built in! (https://github.com/ziglang/zig/pull/14394)

Seems like they don't have a compressor yet oddly enough. But this is something we could work with

M-Gonzalo commented 5 months ago

> It looks like zig HAS a zstd decompressor already built in! (ziglang/zig#14394)
>
> Seems like they don't have a compressor yet oddly enough. But this is something we could work with

The compression algorithm is pretty complex, and the reference implementation has all sorts of clever optimizations, so everyone will probably just use that instead of rewriting it. This could be an opportunity to create an integration with Elixir though, IDK if via NIFs, ports, or something else. I'm just starting with the language, so it's a little over my head for now, but I'll look into it anyway. Maybe it's not that difficult to pull off.