Open alexcrichton opened 3 months ago
Yes we were aware of this slightly strange fallout of enabled reference-types.
For example we had (within Google) reports from folks using wamr that led to https://github.com/bytecodealliance/wasm-micro-runtime/pull/3726.
Its a rather odd side effect of including multi-table as part of reference-types.
Using a single byte relocation for the call_indirect immediate would be a fairly big change since it would be the first relocation that was anything but 5 bytes. Are you sure its worth this effort? I don't think the goal of llvm or wasm-ld should be to produce MVP output unless --cpu=mvp
is used.
Isn't this just a one time cost for all tools to update their defaults to match llvm? At least for binaryen and emscripten we try to keep all our defaults in line with llvm's defaults so everyone is on the same page. I imagine each time llvm changes its defaults there will be such one-time costs for all downstream tools, no?
Also, just in case you were not aware, the linker has --compress-relocations
flag that can be used to compress all these 5-byte LEBS.
Also, any optimized build that goes through a tool like wasm-opt will also no longer have these non-canonical LEBs. Just in case that helps anyone who might come accross this issue.
+cc @sunfishcode Yeah, generally speaking I'm sympathetic to this... for example when this change landed we had to do an update in our CI because the version of node we were using didn't support retypes out of the box. I think it's actually worth separating the question of enabling features by default generally from reference types specifically.
For features generally, I don't actually agree that we want to only generate MVP modules unless newer features are requested. Reference types is maybe a special case (since most C/Rust style code doesn't benefit from it) but a better example is nontrapping float-to-int conversion. Enabling this by default will make basically nontrivial program (at least C programs, not sure about Rust) a bit smaller and maybe faster, and there's really no downside to enabling it. Given that many users will not want to go through the list of all features and try to reason out which ones they explicitly enable (and therefore many will just use the default plus whatever is actually necessary) I think we should, as a general practice, enable most features by default once support has reached some threshold. Exactly what the criteria should be (for support levels in the wild, benefits of the feature, etc) could be debatable, but I think the "default" practice should be to enable things unless there's a reason not to.
Reference types specifically could maybe be a special case. It's true that C/Rust style code normally doesn't need it. You could imagine a solution like the one you mentioned; this seems sort of analogous to the small vs large code models supported on some architectures, where there are different size limitations on the resulting code in exchange for more efficient code generation in some cases. 127 tables does seem like enough for most purposes. Probably we just didn't think about this when we made the original design (maybe @pmatos remembers?)
FWIW, I don't think we expected the call_indirect issue before we made the change, but I don't think we would have done much differently if we had anticipated it (except perhaps communicate more broadly ahead of the change). We certainly would have wanted to make the change eventually for the reasons @dschuff gave, and reference types have been standard for long enough now that I don't think waiting longer would have materially changed the outcome.
I'm nervous about the idea of capping the number of supported tables at 127. That's certainly sufficient today, but we've had all kinds of ideas about using separate tables for separate function signatures for various purposes and I wouldn't want to create a new problem for those ideas to run into in the future.
Also it's possible that we may eventually want to enable exception handling by default for C++ (or even for Rust panics?), and that will require reference types.
That's a good point @dschuff and I should clarify that I do not want to shackle LLVM or anyone else to the wasm MVP, I think it's very good to enable new proposals by default as things move forward. I can perhaps clarify my stance though in that if a selected operation is MVP compatible I think it's beneficial to have it encoded in a way that's MVP compatible. For example a nontrapping float-to-int conversion is not MVP compatible so there's nothing that can be done about that. For call_indirect
in this case though there is something that can be done which is to encode the leb differently. Put another way I do think it's a nice property that the "AST of a wasm module" (or fragment) is always encoded in the oldest supported encoding for better compatibility.
Isn't this just a one time cost for all tools to update their defaults to match llvm?
The main exception to this I can think of is build configuration. We support building Wasmtime without a gc
feature turned on (a Rust-level Cargo feature, not to be confused with the wasm feature name) where if you disable Wasmtime's gc
comiple-time Cargo feature it compiles out support for reference-types, function-references, and gc. This is done to create more minially-sized builds of Wasmtime. Previously that would accept modules from LLVM compiled from C/Rust but that no longer works due to this issue. The reference-types
proposal must be enabled to accept these modules which means it must be enabled at compile time, bringing in more runtime dependencies.
Now this itself can be split into a number of different possible solutions:
reference-types
. Then Wasmtime in non-gc mode would still reject modules with externref
but would support modules from LLVM with overlong lebs in call_indirect
reference-types
-the-proposal but reject instances of externref
.None of these seem like the obvious solution per-se, but I also suspect that Wasmtime isn't the only runtime that may want to indefinitely provide a build mode where GC and/or externref
isn't supported.
For the leb/relocation limits, my rough idea is that a new 1-byte relocation would be added for table indices. I realize that would be quite different from what's implemented today, though. I forgot though that LLD supported --compress-relocations
. If that's supported then does that mean LLD could support growing relocations as well? For example could a relocation start as a single byte but if needed LLD could grow it up to 5 bytes? That would solve the issue of not actually limiting to 127 tables while still benefiting the most common use case of there's only a single table.
Also it's possible that we may eventually want to enable exception handling by default for C++ (or even for Rust panics?), and that will require reference types.
Good point! (and agreed that Rust will want to enable it by default one day too)
IIRC though the exceptions proposal doesn't require at least the GC/externref
part of reference-types
, so that sort of also points towards the split I was mentioning above of supporting a mode in Wasmtime of reference-types
-but-no-externref
or something like that.
I agree with the sentiment that this is a specific case, and shouldn't generalize to every time any proposal is enabled by default in LLVM.
I think what makes this case more complicated is that the user's source and LLVM's emitted modules aren't actually leveraging any new features unlocked by the newly-default-enabled proposal. The only results in this particular case are worse code size and less portability.
If LLVM were taking advantage of newly-unlocked features and using externref
s in the emitted module, the calculus would be very different. (For example, enabling C++ exceptions and Rust unwinding by default; but also I'd also expect that if I built a plain C program that didn't use setjmp/longjmp then I wouldn't have anything in the wasm binary that depended on exceptions).
Similarly, I would expect that in the far future when the custom-page-sizes proposal has been finished and merged into the core spec, wasm-ld
wouldn't add the binary-encoding equivalent of (pagesize 64KiB)
to every module it links together that uses the default page size. I'd expect it to simply rely on the defaults when possible, and only emit the annotation when a non-default page size is used.
wasm-ld wouldn't add the binary-encoding equivalent of (pagesize 64KiB) to every module it links together that uses the default page size
I agree with this, but in this case it is not the linker that is encoding the call_indirect immediate but the compiler. When reference types are enabled the compiler doesn't know if some other translation unit will add additional tables so it has to do the conservative thing and add a relocation. All the linker does is apply the relocations that the compiler generates.
I don't think the code size argument applies here since optimized binary should already be running though something like wasm-opt
or built with --compress-relocations
. Binaries that don't do this are already bloated with non-canonical LEBs.
I think adding a new single-byte relocation would be easier than trying to do something like compress-relocations but in reverse. --compress-relocations
breaks debug info, so it only works on release builds.
The single-byte relocation type might not be so bad. If anyone tries to use more than 127 tables the linker could emit a meaningful error such as "obj.o does not support more than 127 tables. Please recompile with with -msomething`.
We could also use the same default for multi-memory where the number of memory relocations will likely be larger by an order of magnitude and the number of memories likely always very small.
In order words, I'm coming around to the single-byte relocation idea..
Oh I didn't realize --compress-relocations
would break debuginfo. That's a strong reason to not switch everything to 1-byte-but-can-expand relocations. I think it would be reasonable to have a -msomething
flag for "use a bigger relocation here" and I think that could perhaps be turned on by default for a hypothetical future with a table-per-function-type too.
I assume the idea would be to backport this change to the LLVM 19 branch?
If it's reasonable enough to do so that'd be great, but if it's large enough and/or risky enough I don't think it's the end of the world to update this in LLVM 20 instead
In general, it would also be a good idea for engines to support encoding changes introduced by new proposals, even if they are configured not to support the new features and semantics of those proposals.
In general, it would also be a good idea for engines to support encoding changes introduced by new proposals, even if they are configured not to support the new features and semantics of those proposals.
I was wondering about that.. this would be an nice solution, but can you explain why you think its a good idea?
I suppose this would also involve updating the core spec tests to match pending/new proposals in these respects?
To be clear, I'm only talking about proposals that have become part of the standard. The core spec tests need to be updated for such proposals anyway. Accepting new encodings even without accepting new features 1) prevents problems like this one, and 2) simplifies decoders that no longer need to have different behavior in different configurations.
Well updating wasmtime to unconditionally accept LEB encoding for this immediate certainly seems like the simplest solution then.
Currently both wabt and wamr gate the encoding change on reftypes being enabled, but if it makes since to make this unconditional that is cleaner and more compatible.
IIRC, there have been assert_malformed
tests in the main test suite that will fail if you made this sort of change because a proposal widened what encodings were acceptable.
If we wait to make this sort of change until a proposal is merged into the main spec, that is one way to work around the issue.
Or we could adopt a policy in the CG to avoid writing such tests and change/fix them where they do exist.
I doubt we would ever consider turning on a feature by default in LLVM before it was fully merged in the main spec repo. I think we can expect a substantial lag between stage 4 approval and LLVM enabling a feature by default.
What Nick brings up though isn't the only problem with newer proposals. One issue is that while a proposal is being developed, as he mentioned, there are often contradictory tests where the main test suite says "this must be invalid" where a proposal then says "those same bytes must be valid". This means that at the binary decoding layer we have to have some sort of conditional, so that's how proposals are all developed and everything starts out as these sort of conditional branches in the decoder.
Later on after the feature has been merged into the spec then the question is whether the feature flag should be removed. So far Wasmtime/wasm-tools haven't removed any feature flags and support disabling features to go back to the MVP for example. This is not necessarily generally useful to everyone but certain embedders have expressed a desire to be able to do this indefinitely. For example some embedders want to be able to always disable the simd proposal. Wasmtime's build configuration requires being able to disable at least the extenref
type in the reference-types
proposal. I vaguely recall there being other motivations along the way but I can't find their links at this time.
Basically, at least for Wasmtime/wasm-tools, it so far hasn't been a safe assumption that phase 4+ proposals are unconditionally enabled and can't be disabled. If that were the case this would be less of an issue for Wasmtime, but there's still motivation to turn off different features of WebAssembly.
Another possible very-abstract solution would be the "profiles" option for wasm where the GC profile for wasm would probably include externref
but probably wouldn't toggle the multi-table aspect (and thus the leb-encoded index in call_indirect
). That would be a more natural target for Wasmtime embedders disabling gc, and then users who specifically want to emulate older versions of the specification might be required to use older versions of wasm-tools or understand that some binary encodings might be accepted when they were historically rejected. That's all sort of hand-wavy though and I realize that profiles are not exactly just around the corner.
I'm not suggesting that feature flags be removed, but only that the decoder (and no other part of the codebase) behave as though standardized features are always enabled. It's a good point that the decoder would still need feature flags to avoid breaking spec tests while supporting proposals that are not yet standard, though. In that case, I would be happy to update the upstream spec tests to remove the contradiction. Alternatively, the feature flag guards in the decoder could be removed once the proposal reaches phase 4.
In wasm-tools it actually used to do that but was relatively recently modified. If spec tests were updated so they never contradicted between proposals I agree that would remove the need for such a change though. That being said I might have less confidence than you that such a change to how spec tests are written would be widely agreed upon, but I'd definitely be willing to voice my opinion in favor of such a change!
If it helps, these are the issues that arise when binary decoding assumes all features are active:
This issue is then replicated across the extended-const, function-references, gc, and memory64 repositories which have the same test in their globals.wast
.
I was actually surprised that this was the only issue. Historically multi-memory and reference-types caused a lot of similar issues. The reference-types proposal has long since merged, but I would have thought that multi-memory would still be causing issues, but apparently not!
FWIW, we're about to start importing the upstream spec tests into Binaryen as a submodule, so we'll start running into the same issues.
Another way you can minimize pain here is by ignoring the error message text in the assertions; that text matches the errors from the spec interpreter, but there's no requirement that other engines produce the same error messages.
Yeah, the encoding thing is an interesting problem. Generally speaking I do like the idea of engines expanding the allowable encodings even if a feature itself is disabled. It doesn't necessarily solve all the problems (e.g. there could be engines that just don't support a new feature at all; tools might still want a way to disable use of a feature), but would solve a class of issues once a feature becomes standard). And for a tool like wabt or wasm-tools, there is pretty limited value in emulating a mode where a feature is unsupported (compared to the value of just letting a decoder decode all the features it knows about).
WRT how it's speced, part of this problem does go away when an extension gets merged into the spec, because the spec assumes that both the new encoding and the new functionality will be unconditionally valid, and currently doesn't have profiles or a similar way to reason about features being enabled or disabled. So IIUC there couldn't be an assert_malformed
test to guarantee that a 5-byte encoding for a table in a call_indirect
instruction in the current spec because ref types are just unconditionally "enabled". So it falls to tools and embedders to figure out what to do.
I don't necessarily expect profiles to solve this entirely either. Some in the CG have strong opinions that there should be very few profiles (fewer than the variety of embedders in the world will inevitably want, I suspect). So I suspect that there will be feature combinations that will exist outside of the specified set of profiles, and it will just fall to us (with our software-maintainer hats on) to decide what to do just as it does today. And there may always be case-by-case issues. But this seems a pretty reasonable principle at least.
Just wanna note that the spec requires all implementations to support this, and has for years now, full stop.
Either we call out implementations that don't follow this as buggy and make it their problem. Or we don't, but then the only way of doing so would be by officially introducing suitable profiles.
If including reference-types feature in the build increases code size in wasm-tools, I agree w/ @dschuff that it can be a good idea that wasm-tools unconditionally decodes the call_indirect
index as a LEB. Can you check the feature flags to see if reference-types is enabled without including reference-types feature in the build? If that condition check costs time, I think there is little to lose if you just decode it as LEB unconditionally, even in MVP. The downside of that would be we can't signal a validation error when an MVP module incorrectly encodes the index as an LEB, but... What would the odds of that be? Also, as @sbc100 said, this is not hurting code size; every LEB is encoded as 5 bytes initially and you need to run some optimization to compress that in release mode anyway.
I'm little reserved about the idea of making the relocation 1-byte from LLVM side and limit it to 127 tables based on the similar reasons @tlively suggested. Also as @rossberg said, eventually, the engines need to support this at some point. I don't think we are going to enable the new EH proposal by default in LLVM soon; given that it was standardized very recently we would likely wait for at least a few years before doing that. But it is possible that Rust or other toolchains would like to opt in to the feature before that because it gives better code size and performance, in which case the engines needs to support the LEB index anyway.
Doesn't the linker already support mixing object files without relocations for call_indirect
and those with them?
In other words, what would be the cost of hard-coding the indirect function table to have index 0 always, in both the linker and the compiler?
Edit: fwiw, this would be a real size regression in the default configuration (which enables debug info) in our toolchain.
Doesn't the linker already support mixing object files without relocations for
call_indirect
and those with them?In other words, what would be the cost of hard-coding the indirect function table to have index 0 always, in both the linker and the compiler?
Edit: fwiw, this would be a real size regression in the default configuration (which enables debug info) in our toolchain.
I had hoped this would be possible, but sadly the indirect function table index cannot be assumed to be at index zero. The reason is that imported tables always come before defined tables in the index space. So any program that imports a table for any kind will cause the (defined) function index table to have a non-zero index itself.
In other words, you can't both import a table and have a defined table at index zero.
The reason is that imported tables always come before defined tables in the index space
Hmm, of course (obvious in retrospect :)).
This also means that all object files built without reference types enabled (or ~= without table index relocations) are incompatible with links that have them enabled. It seems like quite a big breaking change, ameliorated perhaps somewhat by the rarity of imported tables.
Hi @aheejin
I am working on Near blockchain sdk. Which uses WebAssembly, compiled from Rust to execute smart contracts. reference-types feature breaks the smart contract execution due to deserialization error. So the developers can't compile their smart contracts with new version of Rust compiler.
One of the possible solutions is to disable reference-types. However, as it requires to build standard library as well, it forces to use nightly compiler. Which is not an option as developers usually build with stable compilers.
I am wondering, is it possible to build standard library as well on stable compiler? Which solution would you suggest?
@PolyProgrammist you might be interested in this post which explains more about this change and has a section about disabling features with respect to recompiling the standard library. Recompiling the standard library is not possible on stable Rust but you might be interested in this proposal for a new Rust target where the standard library is built with minimal features.
@PolyProgrammist I'm a little unclear about what the issue is with the stable vs nightly compiler. Are there LLVM changes post v19 that fix a problem? If they are small enough, merging them to the 19 branch might be an option.
@PolyProgrammist I'm a little unclear about what the issue is with the stable vs nightly compiler. Are there LLVM changes post v19 that fix a problem? If they are small enough, merging them to the 19 branch might be an option.
The problem is that I don't see a way to disable reference-types feature on stable rust compiler starting from 1.82.
I am working on Near blockchain sdk. Which uses WebAssembly, compiled from Rust to execute smart contracts. reference-types feature breaks the smart contract execution due to deserialization error. So the developers can't compile their smart contracts with new version of Rust compiler.
Have you looked into other possible solutions here:
(Obviously being able to disable reference types is important, in general, but these two should also both address your issue).
The problem turns out to be bigger than just the runtime mentioned here. Major JavaScript bundlers such as webpack can no longer handle anything produced by LLVM 19 (be it Rust, C, C++, ...).
@CryZe posted more detail in another thread
Some feedback: This broke the majority of the JavaScript bundlers (webpack and co.) which are commonly built on top of webassemblyjs which is very unmaintained and does not support reference-types. I'm not sure, but it may be worth reverting.
My response was:
Can you say more about what the bundlers are doing, and how they are doing it? I don't want to be breaking people, but also we surely can't freeze LLVM forever just because some other tool is unmaintained. We are actually planning to enable bulk memory and nontrapping-fptoint soon as well.
Interestingly, it looks like the decoder in webassemblyjs already decodes the table index as an LEB.
Continuing from the LLVM PR:
From what I've seen the webassemblyjs
project (that backs webpack) is actually very agnostic to the actual instructions. I'm not sure how it does that, but I've successfully been able to use webpack with basically every wasm feature that LLVM implements other than reference-types
. I know that it can't handle the actual "externref" types, but I don't yet know what problem in particular the LLVM 19 bump caused here (if it's not the table index).
Though I think this may probably be an easy enough PR that I can look into it (as opposed to implementing the full proposal right away).
Yeah as far as I know, the only use of anything from the reference types proposal that you get by default with LLVM builds is the long encoding of the table index as described upthread. The link from my previous post goes (I think) to the part of the decoder that would have to be updated but it appears that it would already handle the LEB encoding (however, that was based on 5 minutes of looking, so I could well be wrong about that). So it would be good to check for more details. Also, could you maybe try it out with bulk-memory and nontrapping-fptoint enabled (-mbulk-memory
and -mnontrapping-fptoint
flags respectively)
Also, could you maybe try it out with bulk-memory and nontrapping-fptoint enabled (-mbulk-memory and -mnontrapping-fptoint flags respectively)
I already have all the following enabled and they work perfectly fine: +bulk-memory,+mutable-globals,+nontrapping-fptoint,+sign-ext,+simd128,+extended-const,+multivalue,+relaxed-simd,+tail-call
And even the experimental-mv
ABI.
(Despite some of them having miscompilation issues in LLVM, but I fortunately have not encountered any miscompilations in my code for those)
Turns out it's an unfortunate interaction where wasm-bindgen
(Rust's JS shim generator) looks at the feature section of the module and now assumes that you want full reference types support and rewrites the wasm module to fully make use of them... at which point the webassemblyjs
dependency of webpack
fails to handle it. I think here we can likely just fix this quickly in wasm-bindgen
in the short term and do a proper PR to implement the full proposal in webassemblyjs
longer term.
Interesting. AFAIK the feature section is really only useful for the linker (at least, I'm not aware of other uses of it). Maybe we should strip it out after linking? Or is this an object file that is being processed by webassemblyjs?
Interesting. AFAIK the feature section is really only useful for the linker (at least, I'm not aware of other uses of it). Maybe we should strip it out after linking? Or is this an object file that is being processed by webassemblyjs?
I think it is also designed for post-link tools such as wasm-opt and wasm-bindgen to use, no?
I guess so, if wasm-bindgen has a whole separate mode for generating bindings based on the features :) It seems like we may want some tool to be a little smarter, to distinguish the case where we actually reference types or multiple tables in the final binary from the one where we just had relocatable call_indirect encodings. Maybe if there are no ref types in the type section and only one table after linking, we could drop the feature. (and of course as mentioned upthread, running wasm-opt over the binary to compress the encodings is also a workaround).
Turns out it's an unfortunate interaction where
wasm-bindgen
(Rust's JS shim generator) looks at the feature section of the module and now assumes that you want full reference types support and rewrites the wasm module to fully make use of them... at which point thewebassemblyjs
dependency ofwebpack
fails to handle it. I think here we can likely just fix this quickly inwasm-bindgen
in the short term and do a proper PR to implement the full proposal inwebassemblyjs
longer term.
This sounds like it's WAI with respect to the features section. The features section has always been intended (by me, at least) to encode the features supported by the target engine rather than the features actually used by the module, so wasm-bindgen
assuming it can use reference-types sounds fine.
A short-term workaround followed by a real fix in webassemblyjs
sounds like a good plan.
In LLVM 19 the
reference-types
feature is being enabled by default in https://github.com/llvm/llvm-project/pull/96584. This is having consequences in Rust - https://github.com/rust-lang/rust/issues/128475 - and is requiring documentation to be written - https://github.com/rust-lang/rust/pull/128511 - for how to disable features (docs that should be written anyway, but this update is becoming a strong forcing function).The main consequence of enabling
reference-types
is that the table index immediate in thecall_indirect
instruction is now being encoded as an overlong uleb which is 5 bytes long as80 80 80 80 00
. This overlong encoding of 0 has no semantic difference from whenreference-types
were disabled but validators and parsers which don't support reference types are rejecting these modules.I wanted to raise this issue here for awareness to confirm that this is expected fallout. I understand that enabling
refernece-types
by default is expected, but I wanted to additionally confirm that the consequences of actually emitting a breaking change into all modules usingcall_indirect
relative to parsers that don't supportreference-types
is expected. This seems like surprising behavior to me where modules that don't usereference-types
at all are now required to be executed in runtimes that support thereference-types
proposal.Or, alternatively, if there's an easy-ish way to shrink the leb encoding here (e.g. by assuming there's <= 127 tables and continuing to use a single byte) I think that'd personally be best to generate MVP modules as much as possible and only emit newer features when explicitly requested.
cc @sbc100, @aheejin, @tlively