Pass small structs in parameters instead of memory

WebAssembly / tool-conventions

Conventions supporting interoperatibility between tools working with WebAssembly.

Artistic License 2.0

302 stars 67 forks source link

Pass small structs in parameters instead of memory #88

Open dschuff opened 5 years ago

dschuff commented 5 years ago

Currently clang will pass any structure with more than a single element indirectly (as LLVM byval). Many other ABIs pass small structures in registers, potentially improving performance. We could do the same thing, but choosing the upper limit on the size (and therefore the number of registers to use) is a little more interesting, since we don't know how many physical registers the underlying machine will have (let alone have available for passing arguments). But it's also true that linear memory references may be more expensive if there is bounds checking. It seems worthwhile to at least use 2 params since most calling conventions are likely to be able to handle that in most situations, and pairs are fairly common.

If we do this it will make the ABI a lot more complex to specify, since there will be classification rules about when this would apply (both AMD64 SysV and AAPCS are fairly complex in this regard). But it seems worth it. We would probably need slightly different rules for promotion than those architectures since we have both 32-bit and 64-bit argument types.

I'll think a little more about the details, but does anyone have strong opinions about whether this is likely to be a good idea?

tlively commented 5 years ago

I think it's a good idea in general, but perhaps we can simplify things by punting on it until we specify an ABI with full multivalue support.

alexcrichton commented 5 years ago

This tangentially came up here when @tlively landed an upgrade for the Rust emscripten target to use the LLVM wasm backend instead of the old fastcomp backend. Rust's naive encoding of ABIs on the non-emscripten wasm32-unknown-unknown target did not attempt to force aggregate structures to be passed-by-pointer which meant that prior to rust-lang/rust#63649 passing something like struct Foo(usize, usize) would show up as two i32 parameters in the wasm file.

We ended up avoiding changing the ABI for the Rust wasm32-unknown-unknown target in rust-lang/rust#63649 because projects like wasm-bindgen relied on the fact that fn foo(a: Foo) would take two i32 parameters (since it's generating JS glue that expects this) and we didn't want to break it immediately. The Rust wasm32-unknown-unknown target, however, differs from Clang in real and detectable ways so we still want to get this sorted out!

I wanted to comment here to basically say that I personally think this would be the right way forward. This matches the behavior of ABIs on other systems (C/Linux example for reference) and generally feels like it jives much better with wasm as a target since there's not really any limit to params/results and by passing aggregates through a return pointer it feels like it's making a decision best left up to the actual compiler consuming the wasm.

I'd be remiss if I didn't also mention factor that updating wasm-bindgen will be a pretty significant investment to get this working as well. We have a possible workaround but it's a breaking change for wasm-bindgen which is pretty significant. This, coupled with the fact that I personally think that "exploding" aggregates it the right strategy, motivates me to try to pursue this.

In terms of specification, would something like this be that complicated with respect to existing ABIs? I would imagine that the specification would be something like "if an aggregate is passed by value then each of its components is passed by value as a separate argument, recursively" or something like that. I would suspect that native ABIs have to do all sorts of weirdness for limits of registers and such, but wasm has "infinite registers" so it seems like it would be less of an issue.

sbc100 commented 5 years ago

I think also need to figure out how we do ABI versioning in the object format before we can do something like this. I guess we could bump the version of linking metadata section which would make old and new object incompatible. Or we could do it for real and fix #54.

tlively commented 5 years ago

I am currently working on toolchain multivalue support, and I am using Rust's test suite to measure my progress since clang does not yet produce IR that returns structs by value. Once multivalue is stable in the tools, I plan to conduct science on a new C ABI. Not only is multivalue return on the table, but also other arbitrary ABI changes like passing structs in registers.

Since I expect a new ABI to depend on multivalue, it would make sense to use the new ABI as the default C ABI if and only if multivalue is supported. When multivalue is supported, the old C API should be available in an opt-in basis for compatibility with MVP library binaries. In the LLVM ecosystem both target features and ABIs are properties of individual functions, so a single object file could use both the new and old ABIs. An object-wide ABI version or target feature policy controlling the ABI therefore doesn't make much sense, but per-object ABI metadata might be useful for producing more helpful linker errors. Alternatively, we could construct the new ABI such that ABI mismatches always produce type mismatches that the linker already knows how to report.

According to that plan, we would keep the current ABI around as long as we cared about MVP compatibility, which is potentially forever. @alexcrichton does this sound reasonable to you?

alexcrichton commented 5 years ago

I think it definitely makes sense to reconsider the default ABI with the multi-value feature enabled, yeah. I would indeed want to make sure that this change (passing small structs via "registers") would make it in as part of the ABI for that.

As to what to do about the current ABI, I think it sort of depends on how quickly multi-value is adopted in engines. If it shows up quickly I think most wasm producers could reasonably default to multi-value, but we've still got a potentially year-long transitionary period in the meantime.

Basically the crux of the issue was that the ABI that wasm-bindgen wants disagrees with the already-there ABI in Clang, and that's something we want to fix. Long-term with multi-value it's highly likely to be fixed, but we still have a potentially long transitionary period where the ABIs would disagree.

I don't want the difficulty of work in wasm-bindgen to hold anything back, but I also don't want "whatever clang happened to implement" to cause a bunch of work for everyone else. In the meantime the use case for mixing raw clang code (not emscripten, just bare clang) and Rust is relatively limited and doesn't come up all that often. The major target for that, wasi, is probably one where we should update rustc to match clang's ABI today since there's not a preexisting tool like wasm-bindgen which targets that.

sbc100 commented 5 years ago

I don't know if calling the current ABI "whatever clang happened to implement" is fair. IIUC the current ABI was indeed developed alongside the clang/llvm work but the decision making process there was far from arbitrary or ad-hoc. I'd like to see this change happen one day, but it is a breaking change and I imagine non-trivial to change in llvm(?). I'm curious what @sunfishcode thinks? Is this worth doing now or delaying until other changes such as mutli-value require an ABI change anyway?

Our of interest, why do clang and wasm-bindgen need to agree here? Or rather, why to C and rust need to agree? Whats is the benefit we get? Isn't there already rust<->C ABI FFI layer in rust that would allow for this? Or would this change just allow for such as layer to be more efficient?

alexcrichton commented 5 years ago

Ah sorry, I didn't mean to mischaracterize! I wasn't aware of the process of creating the current ABI.

The need for interop is from when you're liking Rust/C into the same wasm blob. The Rust signature will declare what C does (and vice versa) and for it to work right we'd need those two encoded the same way. This basically is the FFI layer of how Rust/C communicate which is why the problem comes up. The nitty-gritty is that @tlively's recently added wasm32.rs defines the ABI for emscripten (notably making all non-trivial aggregates pass-by-reference), whereas the now-older wasm32_bindgen_compat.rs simply leaves structs by-value, and LLVM apparently interprets by-value structs as "pass everything in registers".

The differences there mean that when mixing C/Rust inside of wasm-bindgen a C file that does:

struct Foo { int a; int b; };

void foo(Foo foo) { /* ... */ }

is incompatible with the analagous Rust import:

#[repr(C)]
struct Foo { a: i32, b: i32 }

extern "C" {
    fn foo(foo: Foo);
}

(because Rust passes in registers but C expects a pointer)

joshtriplett commented 4 years ago

So, I do care about the ability to mix C and Rust in wasm, using "bare" wasm rather than wasi.

If this is an ABI incompatibility, would it potentially make sense to incorporate ABI versioning into the target name? We could have a wasm32-abiv2 target that defines the new ABI, and when we move to a multi-value ABI we could call that wasm32-abiv3. That would allow us to move to a clang-compatible ABI without breaking wasm-bindgen, and would make it clear which ABI or ABIs a tool supports.

(The downside would be teaching other tools about those ABI versions.)

tlively commented 4 years ago

... using "bare" wasm rather than wasi.

Currently the only wasm target using the correct C ABI is wasm32-unknown-emscripten. WASI uses the wasm-bindgen-compatible C ABI.

I haven't experimented with this or had enough design discussions about it yet, but here's my current thinking about how we should introduce a new ABI. I don't want end users to have to think about ABIs in the common case, so I don't want to introduce mandatory ABI specifiers in the target strings. Optional ABI specifiers would be fine. A key feature of the new ABI will be that it requires multivalue, so I was thinking a reasonable default would be to use the new ABI when multivalue is enabled and the old ABI when multivalue is not enabled.

We can also play games in the target features section in object files to have the linker emit a helpful error if you try to link objects with incompatible ABIs accidentally.

This is actually going to be much harder in C/C++ than in Rust because Rust can store ABI in its unstable interface metadata, allowing seamless calls between old- and new-ABI functions. But C/C++ will need a way to inject that same ABI information into header files, possibly with a pragma or something.

alexcrichton commented 4 years ago

I'd also want to clarify that I would hate for wasm-bindgen to be the reason that we invent a scheme to have users pick an ABI, I agree with @tlively that a choice of an ABI should largely be forgotten by developers and something they don't have to worry about.

This issue is the only concrete reason why we haven't switched Rust's wasm32-unknown-unknown ABI (AFAIK). The wasm-bindgen tool relies that struct Foo(u32, u32) in Rust is passed as two parameters, not by-pointer. There is a proposed solution to this which would be a very large change, and possibly breaking, for wasm-bindgen, and I don't personally have time to implement it right now.

Another possible solution might be to add a -f or -m flag of some kind to clang for compiling C/C++ code that specifically wants to interoperate with the Rust target? We could add this in Rust's side to the cc crate so users could largely not have to worry about it still.

tlively commented 4 years ago

I'd also want to clarify that I would hate for wasm-bindgen to be the reason that we invent a scheme to have users pick an ABI

Agree. Thankfully we were thinking about how to introduce a new ABI independently from the wasm-bindgen issue. The wasm-bindgen ABI mismatch certainly makes the ABI work more important and useful, but doesn't change the design goals AFAICT.

Another possible solution might be to add a -f or -m flag of some kind to clang for compiling C/C++ code that specifically wants to interoperate with the Rust target? We could add this in Rust's side to the cc crate so users could largely not have to worry about it still.

I appreciate the convenience of this solution, but I don't think this is something we would want to upstream. I would like to avoid having more C ABIs than absolutely necessary, and I want the ABIs we do have to be deliberately designed for performance and code size rather than added for convenience.