bytecodealliance / componentize-py

Apache License 2.0

Feature request: Smaller wasm modules / binaries. #98

Open JamesParrott opened 1 month ago

JamesParrott commented 1 month ago

Great library everyone - I'm really impressed. I reproduced the example just fine on Windows 11.

Anyway, maybe this is just the nature of Wasm Components, or it's necessary to support every possibility of a dynamic language like Python. And maybe there are settings I've not discovered yet.

But is it possible to produce a Hello World example that's much smaller than 35MB (and to reduce the size of the wasmtime host side bindings from 30MB)?

dicej commented 1 month ago

Hi @JamesParrott. Yeah, the size is annoying, I agree. You can reduce it somewhat using e.g. wasm-tools strip --all (after installing https://github.com/bytecodealliance/wasm-tools/).
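To see how much stripping actually saves, the sizes before and after can be compared with a short script. This is only a sketch: the helper names (`strip_command`, `percent_saved`, `strip_component`) are hypothetical, and it assumes `wasm-tools` is on PATH and accepts `-o` for the output file (check `wasm-tools strip --help` for your version).

```python
import shutil
import subprocess
from pathlib import Path


def strip_command(src, dst):
    """Build the argv for `wasm-tools strip --all`, writing to a new file.

    Assumption: the installed wasm-tools accepts `-o` for the output path.
    """
    return ["wasm-tools", "strip", "--all", str(src), "-o", str(dst)]


def percent_saved(before, after):
    """Return the size reduction as a percentage of the original size."""
    if before <= 0:
        raise ValueError("original size must be positive")
    return round(100.0 * (before - after) / before, 1)


def strip_component(src, dst):
    """Strip `src` into `dst` and return (bytes_before, bytes_after)."""
    if shutil.which("wasm-tools") is None:
        raise RuntimeError("wasm-tools was not found on PATH")
    subprocess.run(strip_command(src, dst), check=True)
    return Path(src).stat().st_size, Path(dst).stat().st_size
```

For example, `before, after = strip_component("app.wasm", "app.stripped.wasm")` followed by `print(percent_saved(before, after))` would report the saving for the component built in the example. The original file is left untouched so the two can be compared.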

Otherwise, I don't know of any great options for significantly reducing the component size while still keeping it self-contained. Neither Python (as a language) nor CPython (as an implementation of the language) was designed to optimize for small, self-contained binaries.

However, if you drop the "self-contained" requirement, you have a lot more flexibility. That's what https://github.com/bytecodealliance/componentize-py/issues/28 is about. The idea is that we could upload libpython.so and the standard library to a registry which the host has access to. Then componentize-py could optionally generate components which import those things instead of bundling them, leaving only the application code (and any dependencies not available in the registry) to be bundled as part of the component.

I've been waiting for projects like warg to mature so that we actually have a registry to point to, and I know there has been recent progress on that front. I'm not sure I'll have time soon to work on #28 myself, but I'd be happy to mentor anyone who wants to tackle it.

Regarding the host side of things, are you referring to the size of the wasmtime-py package? If so, you might want to open a separate issue on that repo. I'm not sure I understand why the size of the host side bindings would be an issue, so perhaps you could elaborate on your use case?

JamesParrott commented 1 month ago

Thanks Joel. I'll give wasm-tools a try.

I've got no firm requirement. Just a dream of creating light web apps from statically typed Python code, without an O(10) MB download for Pyodide etc. I don't need all the bells and whistles, so perhaps I need more of a transpiler. Or just another language. The thing I'm currently working on is not unportable to JS.

The shared object approach is interesting - thanks. It looks similar to the repackaging of Python that distros do. This is all very new to me, but I've done a little bit with .so binaries before. I'll have a look.

Re: the host, it's not the wheel or sdist size. When running python3 -m wasmtime.bindgen app.wasm --out-dir hello_host, the resulting hello_host contains a Python package, and 40 .wasm files. root.core0.wasm is 10MB, and root.core32.wasm is 22MB.

I prefer web app bundles to be smaller than that, but others might not mind, and it's intended for different purposes. Overall componentize-py is super impressive. It's a compiler for a language that's notoriously tricky to compile.

dicej commented 1 month ago

> Re: the host, it's not the wheel or sdist size. When running python3 -m wasmtime.bindgen app.wasm --out-dir hello_host, the resulting hello_host contains a Python package, and 40 .wasm files. root.core0.wasm is 10MB, and root.core32.wasm is 22MB.

Yeah, that's just the result of wasmtime-py unpacking the component into its constituent modules, so we can expect the result to be roughly the same size as the original component. I.e. if we can make the component smaller, the unpacked version will also be smaller.
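One way to check this is to add up the sizes of the unpacked modules and compare the total with the size of the original component. The following is a minimal sketch using only the standard library; the function names `wasm_sizes` and `largest` are hypothetical, and the directory name `hello_host` is taken from the command quoted above.

```python
from pathlib import Path


def wasm_sizes(directory):
    """Map each .wasm file under `directory` (recursively) to its size in bytes."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): p.stat().st_size
        for p in root.rglob("*.wasm")
    }


def largest(sizes, n=5):
    """Return the `n` biggest (name, bytes) pairs, largest first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:n]
```

Calling `sum(wasm_sizes("hello_host").values())` and comparing it with `Path("app.wasm").stat().st_size` should show whether the two are roughly equal, and `largest(...)` points at the modules that dominate the total.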

> I prefer web app bundles to be smaller than that, but others might not mind, and it's intended for different purposes. Overall componentize-py is super impressive. It's a compiler for a language that's notoriously tricky to compile.

Don't be too impressed -- componentize-py doesn't compile anything. It just bundles your Python code, the CPython interpreter, and any imported dependencies into a single component. There are experimental projects out there which use partial evaluation and inline caches to AOT compile (parts of) Python code, but componentize-py isn't using them yet.

JamesParrott commented 1 month ago

Oh I see - thanks for the explanation. I thought the two sizes were suspiciously similar - the original component's functionality was duplicated.

host.py and the hello_host can be moved elsewhere on the file system, and python host.py still works as long as wasmtime is installed. That's nice and portable, and about the same size as Pyodide.

componentize-py sounds more similar to Pyodide than I realised, from your description, but without Emscripten :) .