Open jeroenvervaeke opened 1 year ago
Thanks for the report!
This looks like it's due to a few things in play here. Fundamentally though this is an issue with wit-bindgen, LLVM, and/or Rust. If you take wit-component
out of the picture the core wasm module produced by LLVM is the one that has too many locals and is causing the validation failure. The failure here is cropping up a bit late because nothing ended up validating the input prior.
On one hand this is wit-bindgen
's problem. It's generating a function so large that rustfmt
is choking on it and that's one of the issues here. The generated output should be broken up a bit to avoid blowing limits in wasm/llvm/etc. (it'd also make it much faster to compile)
On the other hand this is a Rust/LLVM issue (moreso LLVM than Rust). It's known that LLVM's wasm backend is pretty good but not optimal in a lot of regards. It's likely that this same code could be translated with far fewer locals. For example compiling with optimizations does fix things but it's still not great.
Overall, though, I think this is the right place for the issue. I don't think that wit-bindgen
should generate a 14kloc file from this input WIT file that's pretty small, and resolving that would fix the issue here.
Unfortuantely I think this is going to be a pretty difficult issue to solve. That being said it definitely needs a solution in wit-bindgen
. The problem here stems from a foundational building block of how wit-bindgen
works which is that the lowering/lifting code generated is done by walking over the "shape" of the input type. In this case it's a very large type as there's lots of recurisve fields each of which have their own recursive fields. This causes wit-bindgen
to generate an exponential amount of code, causing the locals to be exceeded.
The "fix" here I believe is to move lifting/lowering into helper functions, enabling sharing and reuse. For example loading a record from memory would be a simple delegation of loading each of its fields from memory and then placing them all in a struct
. This should generate a linear amount of code with respect to the input types rather than an exponential amount of code. Current use cases today shouldn't be hurt due to inlining and cases such as this should work naturally as a result.
The recursive nature of wit-bindgen
is currently pretty deeply baked into the lifting/lowering, however. That will need to be extracted from the wit-parser
crate most likely and refactored to support this sort of load/store pattern.
In theory there's four pieces of functionality: lifting from a flat repr, lifting from memory, lowering to a flat repr, and lowering to memory. I think the flat representations can stay as they are today though and continue to be inlined at function sites. Flat representations are only used if things "fit" which means that the real complexity comes with types stored in memory, so having helper functions for loads/stores I think is all that's necessary.
For example one change could be for wit-parser
to assume that all types have a load/store function associated with them. One of the primitive instructions would then become "load this type" or "store this type" and then that helper function would get generated on the side as necessary and would internally be defined in terms of "lift this type" or "store this type"
I'm trying to convert a component but I run into an error when running the following command:
The error is as follows:
The error is caused thrown at this line https://github.com/bytecodealliance/wasm-tools/blob/188de0fd412dd01dd41bf91c897935934282c13d/crates/wasmparser/src/validator/operators.rs#L287 and is caused by the
MAX_WASM_FUNCTION_LOCALS
which is set to50000
.When I increase the
MAX_WASM_FUNCTION_LOCALS
constant the problem goes away, but it doesn't seem right that I have more than50000
local functions.The wasm file is generated based on this wit definition: https://github.com/jeroenvervaeke/wit-issue-reproducer/blob/master/wit/fundaments_analyzer.wit
The full reproducer can be found here: https://github.com/jeroenvervaeke/wit-issue-reproducer These are the commands to run to if you want to reproduce the issue locally: https://github.com/jeroenvervaeke/wit-issue-reproducer/blob/master/build-component.sh
Do you think this is an issue with the generated WIT code, or is it expected that this many local functions are generated? If it's expected that so many local functions are generated, should
MAX_WASM_FUNCTION_LOCALS
be increased?