emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

lz4 compressed wasm with ``-s SINGLE_FILE=1`` #16056

Open ekpyron opened 2 years ago

ekpyron commented 2 years ago

It would be cool if emscripten provided an option to compress the wasm binary with LZ4 before base64-encoding and embedding it via -s SINGLE_FILE=1.

Context: The solidity compiler https://github.com/ethereum/solidity has been publishing js/wasm binaries built with -s SINGLE_FILE=1 for a while now. However, the binaries have grown to over 20MB in size, which costs a lot of bandwidth and makes for long loading times. We realized that lz4-compressing the wasm binary before base64 encoding reduces the size to ~8MB, and loading times actually become faster, since base64 decoding is slow in comparison. So we're working on doing this manually, but we were wondering whether this might be generally useful and a nice feature for emscripten itself. (Especially since we realized that you already support something very similar with LZ4=1, just not applied to the embedded wasm binary.)
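The compress-then-encode order described above can be sketched as follows. This is only an illustration, not emscripten's or solidity's implementation: Python's standard library has no LZ4 codec, so `zlib` stands in for it here, and the function names `embed_payload` and `load_payload` as well as the fake binary are hypothetical.

```python
import base64
import zlib


def embed_payload(wasm_bytes: bytes) -> str:
    """Compress the binary first, then base64-encode it for embedding in JS."""
    # zlib is a stand-in for LZ4 (assumption: any byte-level codec fits here).
    compressed = zlib.compress(wasm_bytes, level=9)
    return base64.b64encode(compressed).decode("ascii")


def load_payload(embedded: str) -> bytes:
    """Reverse the steps at load time: base64-decode, then decompress."""
    return zlib.decompress(base64.b64decode(embedded))


if __name__ == "__main__":
    # Hypothetical stand-in for a wasm binary: magic header plus repetitive filler.
    fake_wasm = b"\x00asm\x01\x00\x00\x00" + b"\x41\x00\x1a" * 100_000
    plain = base64.b64encode(fake_wasm).decode("ascii")
    packed = embed_payload(fake_wasm)
    assert load_payload(packed) == fake_wasm  # round trip is lossless
    print(f"base64 only: {len(plain)} chars, compressed + base64: {len(packed)} chars")
```

Because the base64 step runs over the already-compressed bytes, the loader also has fewer characters to decode, which fits the observation that loading gets faster.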

sbc100 commented 2 years ago

Sounds like a great idea.

Out of interest, why can't/don't you ship the wasm as a separate file?

ekpyron commented 2 years ago

Mainly because we had been shipping asm.js binaries for years and have downstream tooling that expects a single file, as well as some infrastructure that assumes a single file... so since staying with a single file pretty much made it a drop-in replacement, we went for that at first and just stayed with it. We might move to separate files eventually, but there were never really strong reasons for it. Binary size (which is of course always inflated by the base64 encoding) used to be less of an issue, but then we started linking against z3, which increased it by an order of magnitude.

Ideally, we would probably indeed use a separate wasm file and have it compressed in transit, but transparent HTTP compression is apparently often restricted to <10MB by server infrastructure, and we're above that... So if we do nothing, we have ~27MB to transfer; if we split into separate files, we still have ~20MB; if we don't split and lz4-compress, we have ~8MB, which is below 10MB and can be further compressed in transit to ~6MB :-). And since lz4 compression actually improves loading time even for an already downloaded local copy (nominally at least, there's no huge difference), and staying with a single file means we don't require any downstream changes, that seems like a good option :-).
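The gap between the ~27MB and ~20MB figures matches the usual base64 overhead of 4 output characters per 3 input bytes. A rough back-of-the-envelope check, assuming decimal megabytes and ignoring the JS glue code (the helper names are hypothetical):

```python
def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of raw_bytes bytes."""
    return 4 * ((raw_bytes + 2) // 3)  # every 3 input bytes become 4 characters


def max_raw_size(encoded_chars: int) -> int:
    """Largest payload whose padded base64 form fits in encoded_chars."""
    return 3 * (encoded_chars // 4)


MB = 1_000_000  # assumption: sizes in the thread are decimal megabytes

if __name__ == "__main__":
    # A ~20MB wasm file grows by about a third once base64-encoded.
    print(f"20MB raw -> {base64_size(20 * MB) / MB:.1f}MB as base64")
    # An ~8MB embedded payload corresponds to roughly this much compressed data.
    print(f"8MB base64 -> {max_raw_size(8 * MB) / MB:.1f}MB of compressed bytes")
```

The second number is only an upper bound on the compressed wasm size, since the ~8MB quoted above presumably also includes the surrounding JavaScript.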

sbc100 commented 2 years ago

There are a couple of reasons why you might still want to consider using a separate wasm file. Firstly, you get streaming compilation, which can speed up your load time because the module is being compiled as it comes over the wire. Secondly, you can potentially get caching of the compiled wasm code.

For these reasons we tend to recommend the more standard usage over the SINGLE_FILE solution... and I'm somewhat hesitant to invest more in SINGLE_FILE. Having said that, you do make some good points. wdyt @kripken?

kripken commented 2 years ago

I don't know enough about this use case in particular, but in general I know that some places do just want a single file, either for policy reasons or for other reasons. I've heard this numerous times, so I think it's a useful option.

I do agree that improving code size in this case is very low priority, but if someone wants to write a PR, that would be OK by me.