WebAssembly / tool-conventions

Conventions supporting interoperability between tools working with WebAssembly.

LLVM 19 and enabling `reference-types` by default #233

Open alexcrichton opened 4 weeks ago

alexcrichton commented 4 weeks ago

In LLVM 19 the reference-types feature is being enabled by default in https://github.com/llvm/llvm-project/pull/96584. This is having consequences in Rust - https://github.com/rust-lang/rust/issues/128475 - and is requiring documentation to be written - https://github.com/rust-lang/rust/pull/128511 - for how to disable features (docs that should be written anyway, but this update is becoming a strong forcing function).

The main consequence of enabling reference-types is that the table index immediate in the call_indirect instruction is now encoded as an overlong 5-byte uleb: 80 80 80 80 00. This overlong encoding of 0 has no semantic difference from the encoding emitted when reference-types was disabled, but validators and parsers which don't support reference-types reject these modules.
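For concreteness, here's a minimal self-contained sketch in Rust (not code from LLVM or any particular runtime) of a uleb decoder. It shows that the single-byte and the padded 5-byte encodings both decode to table index 0; the breakage comes purely from decoders that require the MVP's single 0x00 byte for this immediate.

```rust
// Minimal ULEB128 decoder for a u32; returns the value and how many bytes it consumed.
fn decode_uleb128_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    let mut shift = 0;
    for (i, &byte) in bytes.iter().enumerate() {
        result |= ((byte & 0x7f) as u32) << shift;
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
        shift += 7;
        if shift >= 35 {
            return None; // more continuation bytes than a u32 can hold
        }
    }
    None
}

fn main() {
    // Canonical MVP-era encoding of table index 0: a single 0x00 byte.
    assert_eq!(decode_uleb128_u32(&[0x00]), Some((0, 1)));
    // The padded 5-byte form LLVM now emits so the linker can patch the immediate
    // in place via a relocation; it still decodes to 0, but strict pre-reference-types
    // decoders that expect exactly one 0x00 byte reject it.
    assert_eq!(
        decode_uleb128_u32(&[0x80, 0x80, 0x80, 0x80, 0x00]),
        Some((0, 5))
    );
}
```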

I wanted to raise this issue here for awareness to confirm that this is expected fallout. I understand that enabling reference-types by default is expected, but I wanted to additionally confirm that the consequences are also expected: this effectively emits a breaking change into all modules using call_indirect relative to parsers that don't support reference-types. This seems like surprising behavior to me, where modules that don't use reference-types at all are now required to run in runtimes that support the reference-types proposal.

Or, alternatively, if there's an easy-ish way to shrink the leb encoding here (e.g. by assuming there are <= 127 tables and continuing to use a single byte), I personally think it'd be best to generate MVP modules as much as possible and only emit newer features when explicitly requested.

cc @sbc100, @aheejin, @tlively

sbc100 commented 4 weeks ago

Yes, we were aware of this slightly strange fallout of enabling reference-types.

For example we had (within Google) reports from folks using wamr that led to https://github.com/bytecodealliance/wasm-micro-runtime/pull/3726.

It's a rather odd side effect of including multi-table as part of reference-types.

Using a single-byte relocation for the call_indirect immediate would be a fairly big change since it would be the first relocation that was anything but 5 bytes. Are you sure it's worth this effort? I don't think the goal of llvm or wasm-ld should be to produce MVP output unless -mcpu=mvp is used.

Isn't this just a one-time cost for all tools to update their defaults to match llvm? At least for binaryen and emscripten we try to keep all our defaults in line with llvm's so everyone is on the same page. I imagine each time llvm changes its defaults there will be such one-time costs for all downstream tools, no?

sbc100 commented 4 weeks ago

Also, just in case you were not aware, the linker has a --compress-relocations flag that can be used to compress all these 5-byte LEBs.

Also, any optimized build that goes through a tool like wasm-opt will no longer have these non-canonical LEBs. Just in case that helps anyone who might come across this issue.

dschuff commented 4 weeks ago

+cc @sunfishcode Yeah, generally speaking I'm sympathetic to this... for example, when this change landed we had to do an update in our CI because the version of node we were using didn't support reference types out of the box. I think it's actually worth separating the question of enabling features by default generally from reference types specifically.

For features generally, I don't actually agree that we want to only generate MVP modules unless newer features are requested. Reference types is maybe a special case (since most C/Rust-style code doesn't benefit from it), but a better example is nontrapping float-to-int conversion. Enabling this by default will make basically any nontrivial program (at least C programs, not sure about Rust) a bit smaller and maybe faster, and there's really no downside to enabling it. Given that many users will not want to go through the list of all features and try to reason out which ones to explicitly enable (and therefore many will just use the default plus whatever is actually necessary), I think we should, as a general practice, enable most features by default once support has reached some threshold. Exactly what the criteria should be (support levels in the wild, benefits of the feature, etc.) could be debatable, but I think the "default" practice should be to enable things unless there's a reason not to.

Reference types specifically could maybe be a special case. It's true that C/Rust style code normally doesn't need it. You could imagine a solution like the one you mentioned; this seems sort of analogous to the small vs large code models supported on some architectures, where there are different size limitations on the resulting code in exchange for more efficient code generation in some cases. 127 tables does seem like enough for most purposes. Probably we just didn't think about this when we made the original design (maybe @pmatos remembers?)

tlively commented 4 weeks ago

FWIW, I don't think we expected the call_indirect issue before we made the change, but I don't think we would have done much differently if we had anticipated it (except perhaps communicate more broadly ahead of the change). We certainly would have wanted to make the change eventually for the reasons @dschuff gave, and reference types have been standard for long enough now that I don't think waiting longer would have materially changed the outcome.

I'm nervous about the idea of capping the number of supported tables at 127. That's certainly sufficient today, but we've had all kinds of ideas about using separate tables for separate function signatures for various purposes and I wouldn't want to create a new problem for those ideas to run into in the future.

dschuff commented 4 weeks ago

Also it's possible that we may eventually want to enable exception handling by default for C++ (or even for Rust panics?), and that will require reference types.

alexcrichton commented 4 weeks ago

That's a good point @dschuff, and I should clarify that I do not want to shackle LLVM or anyone else to the wasm MVP; I think it's very good to enable new proposals by default as things move forward. I can perhaps clarify my stance, though: if a selected operation is MVP-compatible, I think it's beneficial to have it encoded in a way that's MVP-compatible. For example, a nontrapping float-to-int conversion is not MVP-compatible, so there's nothing that can be done about that. For call_indirect in this case, though, there is something that can be done, which is to encode the leb differently. Put another way, I do think it's a nice property that the "AST of a wasm module" (or fragment) is always encoded in the oldest supported encoding for better compatibility.

Isn't this just a one-time cost for all tools to update their defaults to match llvm?

The main exception to this I can think of is build configuration. We support building Wasmtime with its gc feature turned off (a Rust-level Cargo feature, not to be confused with the wasm feature name); disabling Wasmtime's gc compile-time Cargo feature compiles out support for reference-types, function-references, and gc. This is done to create more minimally-sized builds of Wasmtime. Previously such a build would accept modules compiled from C/Rust by LLVM, but that no longer works due to this issue. The reference-types proposal must be enabled to accept these modules, which means it must be enabled at compile time, bringing in more runtime dependencies.
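As a rough illustration of the embedder side of this, here's a sketch using the wasmtime crate's runtime feature knob (API details are from memory and may vary by version); an embedding configured this way now rejects LLVM-19-default output even though the module never uses externref or multiple tables:

```rust
use wasmtime::{Config, Engine, Module};

fn load_module(wasm: &[u8]) -> anyhow::Result<Module> {
    let mut config = Config::new();
    // Emulate a build/configuration without reference-types support.
    config.wasm_reference_types(false);
    let engine = Engine::new(&config)?;
    // With the proposal disabled, the decoder expects the MVP single-byte 0x00
    // table index in call_indirect, so modules using the padded 5-byte encoding
    // fail to load here.
    Module::new(&engine, wasm)
}
```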

Now this itself can be split into a number of different possible solutions:

None of these seem like the obvious solution per se, but I also suspect that Wasmtime isn't the only runtime that may want to indefinitely provide a build mode where GC and/or externref isn't supported.


For the leb/relocation limits, my rough idea is that a new 1-byte relocation would be added for table indices. I realize that would be quite different from what's implemented today, though. I forgot, though, that LLD supported --compress-relocations. If that's supported, does that mean LLD could support growing relocations as well? For example, could a relocation start as a single byte and, if needed, be grown by LLD up to 5 bytes? That would solve the issue of not actually limiting to 127 tables while still benefiting the most common use case where there's only a single table.


Also it's possible that we may eventually want to enable exception handling by default for C++ (or even for Rust panics?), and that will require reference types.

Good point! (and agreed that Rust will want to enable it by default one day too)

IIRC, though, the exceptions proposal doesn't require at least the GC/externref part of reference-types, so that sort of also points towards the split I was mentioning above of supporting a mode in Wasmtime of reference-types-but-no-externref or something like that.

fitzgen commented 4 weeks ago

I agree with the sentiment that this is a specific case, and shouldn't generalize to every time any proposal is enabled by default in LLVM.

I think what makes this case more complicated is that the user's source and LLVM's emitted modules aren't actually leveraging any new features unlocked by the newly-default-enabled proposal. The only results in this particular case are worse code size and less portability.

If LLVM were taking advantage of newly-unlocked features and using externrefs in the emitted module, the calculus would be very different. (For example, enabling C++ exceptions and Rust unwinding by default; but I'd also expect that if I built a plain C program that didn't use setjmp/longjmp then I wouldn't have anything in the wasm binary that depended on exceptions.)

Similarly, I would expect that in the far future when the custom-page-sizes proposal has been finished and merged into the core spec, wasm-ld wouldn't add the binary-encoding equivalent of (pagesize 64KiB) to every module it links together that uses the default page size. I'd expect it to simply rely on the defaults when possible, and only emit the annotation when a non-default page size is used.

sbc100 commented 3 weeks ago

wasm-ld wouldn't add the binary-encoding equivalent of (pagesize 64KiB) to every module it links together that uses the default page size

I agree with this, but in this case it is not the linker that is encoding the call_indirect immediate but the compiler. When reference types are enabled the compiler doesn't know if some other translation unit will add additional tables so it has to do the conservative thing and add a relocation. All the linker does is apply the relocations that the compiler generates.

sbc100 commented 3 weeks ago

I don't think the code size argument applies here since optimized binaries should already be run through something like wasm-opt or built with --compress-relocations. Binaries that don't do this are already bloated with non-canonical LEBs.

sbc100 commented 3 weeks ago

I think adding a new single-byte relocation would be easier than trying to do something like compress-relocations but in reverse. --compress-relocations breaks debug info, so it only works on release builds.
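To spell out why that is (a toy illustration of the assumed behavior, not lld's actual implementation): compressing a padded relocation site amounts to re-encoding the value at its minimal LEB width, so the code section shrinks, every later byte offset shifts, and offset-based debug info pointing into the code section goes stale.

```rust
// Re-encode a u32 as a minimal-width ULEB128.
fn encode_uleb128_min(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return;
        }
        out.push(byte | 0x80);
    }
}

fn main() {
    let mut minimal = Vec::new();
    encode_uleb128_min(0, &mut minimal); // table index 0
    // 1 byte instead of the padded 5: everything after this immediate moves by 4 bytes.
    assert_eq!(minimal, [0x00u8]);
}
```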

The single-byte relocation type might not be so bad. If anyone tries to use more than 127 tables the linker could emit a meaningful error such as "obj.o does not support more than 127 tables. Please recompile with -msomething".

We could also use the same default for multi-memory, where the number of memory relocations will likely be an order of magnitude larger and the number of memories will likely always be very small.

In other words, I'm coming around to the single-byte relocation idea.

alexcrichton commented 3 weeks ago

Oh I didn't realize --compress-relocations would break debuginfo. That's a strong reason to not switch everything to 1-byte-but-can-expand relocations. I think it would be reasonable to have a -msomething flag for "use a bigger relocation here" and I think that could perhaps be turned on by default for a hypothetical future with a table-per-function-type too.

sbc100 commented 3 weeks ago

I assume the idea would be to backport this change to the LLVM 19 branch?

alexcrichton commented 3 weeks ago

If it's reasonable enough to do so that'd be great, but if it's large enough and/or risky enough I don't think it's the end of the world to update this in LLVM 20 instead

tlively commented 3 weeks ago

In general, it would also be a good idea for engines to support encoding changes introduced by new proposals, even if they are configured not to support the new features and semantics of those proposals.

sbc100 commented 3 weeks ago

In general, it would also be a good idea for engines to support encoding changes introduced by new proposals, even if they are configured not to support the new features and semantics of those proposals.

I was wondering about that... this would be a nice solution, but can you explain why you think it's a good idea?

I suppose this would also involve updating the core spec tests to match pending/new proposals in these respects?

tlively commented 3 weeks ago

To be clear, I'm only talking about proposals that have become part of the standard. The core spec tests need to be updated for such proposals anyway. Accepting new encodings even without accepting new features 1) prevents problems like this one, and 2) simplifies decoders that no longer need to have different behavior in different configurations.
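As a rough illustration of that policy (a hypothetical decoder sketch, not wasm-tools' actual code): the post-reference-types encoding is always accepted at the decoding layer, and the feature restriction is enforced as a validation rule instead.

```rust
struct Features {
    reference_types: bool,
}

// Always read the call_indirect table index as a full LEB (including the padded
// 5-byte form of 0); only the *semantic* restriction depends on the feature flag.
fn read_call_indirect_table_index(bytes: &[u8], features: &Features) -> Result<u32, String> {
    let mut index: u32 = 0;
    let mut shift = 0;
    for &byte in bytes {
        index |= ((byte & 0x7f) as u32) << shift;
        if byte & 0x80 == 0 {
            // The restriction lives in validation, not decoding.
            if !features.reference_types && index != 0 {
                return Err("non-zero table index requires reference-types".into());
            }
            return Ok(index);
        }
        shift += 7;
        if shift >= 35 {
            return Err("LEB too long for u32".into());
        }
    }
    Err("unexpected end of input".into())
}
```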

sbc100 commented 3 weeks ago

Well, updating wasmtime to unconditionally accept the LEB encoding for this immediate certainly seems like the simplest solution then.

Currently both wabt and wamr gate the encoding change on reftypes being enabled, but if it makes sense to make this unconditional, that is cleaner and more compatible.

fitzgen commented 3 weeks ago

IIRC, there have been assert_malformed tests in the main test suite that will fail if you make this sort of change, because a proposal widened what encodings were acceptable.

If we wait to make this sort of change until a proposal is merged into the main spec, that is one way to work around the issue.

Or we could adopt a policy in the CG to avoid writing such tests and change/fix them where they do exist.

sbc100 commented 3 weeks ago

I doubt we would ever consider turning on a feature by default in LLVM before it was fully merged in the main spec repo. I think we can expect a substantial lag between stage 4 approval and LLVM enabling a feature by default.

alexcrichton commented 3 weeks ago

What Nick brings up, though, isn't the only problem with newer proposals. One issue is that while a proposal is being developed, as he mentioned, there are often contradictory tests where the main test suite says "this must be invalid" and the proposal then says "those same bytes must be valid". This means that at the binary decoding layer we have to have some sort of conditional, so that's how proposals are all developed: everything starts out as these sorts of conditional branches in the decoder.

Later on, after the feature has been merged into the spec, the question is whether the feature flag should be removed. So far Wasmtime/wasm-tools haven't removed any feature flags and support disabling features to go back to the MVP, for example. This is not necessarily generally useful to everyone, but certain embedders have expressed a desire to be able to do this indefinitely. For example some embedders want to be able to always disable the simd proposal. Wasmtime's build configuration requires being able to disable at least the externref type in the reference-types proposal. I vaguely recall there being other motivations along the way but I can't find the links at this time.

Basically, at least for Wasmtime/wasm-tools, it so far hasn't been a safe assumption that phase 4+ proposals are unconditionally enabled and can't be disabled. If that were the case this would be less of an issue for Wasmtime, but there's still motivation to turn off different features of WebAssembly.

Another possible very-abstract solution would be the "profiles" option for wasm where the GC profile for wasm would probably include externref but probably wouldn't toggle the multi-table aspect (and thus the leb-encoded index in call_indirect). That would be a more natural target for Wasmtime embedders disabling gc, and then users who specifically want to emulate older versions of the specification might be required to use older versions of wasm-tools or understand that some binary encodings might be accepted when they were historically rejected. That's all sort of hand-wavy though and I realize that profiles are not exactly just around the corner.

tlively commented 3 weeks ago

I'm not suggesting that feature flags be removed, but only that the decoder (and no other part of the codebase) behave as though standardized features are always enabled. It's a good point that the decoder would still need feature flags to avoid breaking spec tests while supporting proposals that are not yet standard, though. In that case, I would be happy to update the upstream spec tests to remove the contradiction. Alternatively, the feature flag guards in the decoder could be removed once the proposal reaches phase 4.

alexcrichton commented 3 weeks ago

wasm-tools actually used to do that, but it was relatively recently modified. If spec tests were updated so they never contradicted between proposals, I agree that would remove the need for such a change. That being said, I might have less confidence than you that such a change to how spec tests are written would be widely agreed upon, but I'd definitely be willing to voice my opinion in favor of such a change!

alexcrichton commented 3 weeks ago

If it helps, these are the issues that arise when binary decoding assumes all features are active:

This issue is then replicated across the extended-const, function-references, gc, and memory64 repositories which have the same test in their globals.wast.

I was actually surprised that this was the only issue. Historically multi-memory and reference-types caused a lot of similar issues. The reference-types proposal has long since merged, and I would have thought that multi-memory would still be causing issues, but apparently not!

tlively commented 3 weeks ago

FWIW, we're about to start importing the upstream spec tests into Binaryen as a submodule, so we'll start running into the same issues.

Another way you can minimize pain here is by ignoring the error message text in the assertions; that text matches the errors from the spec interpreter, but there's no requirement that other engines produce the same error messages.

dschuff commented 3 weeks ago

Yeah, the encoding thing is an interesting problem. Generally speaking I do like the idea of engines expanding the allowable encodings even if a feature itself is disabled. It doesn't necessarily solve all the problems (e.g. there could be engines that just don't support a new feature at all; tools might still want a way to disable use of a feature), but it would solve a class of issues once a feature becomes standard. And for a tool like wabt or wasm-tools, there is pretty limited value in emulating a mode where a feature is unsupported (compared to the value of just letting a decoder decode all the features it knows about).

WRT how it's specced, part of this problem does go away when an extension gets merged into the spec, because the spec assumes that both the new encoding and the new functionality are unconditionally valid, and currently doesn't have profiles or a similar way to reason about features being enabled or disabled. So IIUC there couldn't be an assert_malformed test in the current spec guaranteeing that a 5-byte encoding of the table index in a call_indirect instruction is rejected, because ref types are just unconditionally "enabled". So it falls to tools and embedders to figure out what to do. I don't necessarily expect profiles to solve this entirely either. Some in the CG have strong opinions that there should be very few profiles (fewer than the variety of embedders in the world will inevitably want, I suspect). So I suspect that there will be feature combinations that exist outside of the specified set of profiles, and it will just fall to us (with our software-maintainer hats on) to decide what to do, just as it does today. And there may always be case-by-case issues. But this seems a pretty reasonable principle at least.

rossberg commented 3 weeks ago

Just wanna note that the spec requires all implementations to support this, and has for years now, full stop.

Either we call out implementations that don't follow this as buggy and make it their problem. Or we don't, but then the only way of doing so would be by officially introducing suitable profiles.

aheejin commented 3 weeks ago

If including the reference-types feature in the build increases code size in wasm-tools, I agree w/ @dschuff that it can be a good idea for wasm-tools to unconditionally decode the call_indirect index as a LEB. Can you check the feature flags to see whether reference-types is enabled without including the reference-types feature in the build? If that condition check costs time, I think there is little to lose if you just decode it as a LEB unconditionally, even in MVP mode. The downside of that would be that we can't signal a validation error when an MVP module incorrectly encodes the index as an overlong LEB, but... what would the odds of that be? Also, as @sbc100 said, this is not hurting code size; every LEB is initially encoded as 5 bytes and you need to run some optimization to compress that in release mode anyway.

I'm a little reserved about the idea of making the relocation 1 byte on the LLVM side and limiting it to 127 tables, for reasons similar to those @tlively suggested. Also, as @rossberg said, the engines eventually need to support this at some point. I don't think we are going to enable the new EH proposal by default in LLVM soon; given that it was standardized very recently, we would likely wait at least a few years before doing that. But it is possible that Rust or other toolchains would like to opt in to the feature before then because it gives better code size and performance, in which case the engines need to support the LEB index anyway.

SingleAccretion commented 2 weeks ago

Doesn't the linker already support mixing object files without relocations for call_indirect and those with them?

In other words, what would be the cost of hard-coding the indirect function table to have index 0 always, in both the linker and the compiler?

Edit: fwiw, this would be a real size regression in the default configuration (which enables debug info) in our toolchain.

sbc100 commented 1 week ago

Doesn't the linker already support mixing object files without relocations for call_indirect and those with them?

In other words, what would be the cost of hard-coding the indirect function table to have index 0 always, in both the linker and the compiler?

Edit: fwiw, this would be a real size regression in the default configuration (which enables debug info) in our toolchain.

I had hoped this would be possible, but sadly the indirect function table cannot be assumed to be at index zero. The reason is that imported tables always come before defined tables in the index space, so any program that imports a table of any kind will cause the (defined) indirect function table to have a non-zero index.

In other words, you can't both import a table and have a defined table at index zero.
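To make the index-space rule concrete, here's a toy model (not real linker code, and assuming the defined indirect function table is placed immediately after any imported tables):

```rust
// Imports occupy the low indices of the table index space, so the defined
// indirect function table's index equals the number of imported tables.
fn indirect_function_table_index(num_imported_tables: u32) -> u32 {
    num_imported_tables
}

fn main() {
    assert_eq!(indirect_function_table_index(0), 0); // no imported tables: a hard-coded 0 works
    assert_eq!(indirect_function_table_index(1), 1); // one imported table: hard-coding 0 breaks
}
```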

SingleAccretion commented 1 week ago

The reason is that imported tables always come before defined tables in the index space

Hmm, of course (obvious in retrospect :)).

This also means that all object files built without reference types enabled (or ~= without table index relocations) are incompatible with links that have them enabled. It seems like quite a big breaking change, ameliorated perhaps somewhat by the rarity of imported tables.