WebAssembly / WASI

WebAssembly System Interface
Other
4.81k stars 249 forks source link

Take another step towards interface types #395

Closed alexcrichton closed 3 years ago

alexcrichton commented 3 years ago

This commit is another large refactoring on the tail of #391 which is intended to help march one step closer to using pure interface types for specifying WASI and using *.witx files. Contained here is a large amount of refactoring of the *.witx files, the internals of the witx crate, and how code generators are supposed to work.

At the *.witx level, some notable changes have been made:

All existing *.witx files in this repository have been updated for all phases with these new forms. To reiterate, though, no details of the ABI have changed (or at least not intentionally). Everything should still be ABI-compatible with before.

Under the hood for code generators this is a very large breaking change. The intention of this commit is to start moving code generators relying on witx into the direction that we'll end up with interface types. Namely the coretypes module is entirely replaced with a new abi module with the intention of handling lots more high-level details of translation than the previous module did.

The InterfaceFunc type now sports two new methods: call_wasm and call_interface. These two methods represent the two halves of lifting/lowering operations performed by code generators. The call_wasm function takes interface types values (or whatever they are represented as in the source language) and converts them to wasm types to call a wasm import. The wasm import's return values are then "deserialized" back into the language's interface types values too. The call_interface function is the opposite, it assumes that you're coming from wasm values and calling, e.g., a host function with interface values.

The abi module is refactored with an Instruction enum which lists out what are the current set of "adapter instructions". These adapter instructions are intended to somewhat model the eventual idea of adapter functions and their instructions, but for now they're pretty specific to WASI as-is today. All instructions currently model what WASI currently does, sometimes in very specific manners. It's expected that the Instruction type will be in flux as we tweak the ABI of WASI over time, but the current set of instructions will likely only be expanded because of maintaining compatibility with the current snapshot.

The expected usage of Instruction is that code generators will implement how to generate code for each Instruction. This should be a very low-level operation (generating code per instruction) and should be in theory quite easy to implement. Not all code generators need to implement all instructions depending on their use case. Additionally WASI as-is today doesn't always exercise all types of instructions.

The next steps after a PR like this include starting to define a new ABI for WASI which supports all types in all positions rather than the current ABI which supports only a limited subset of types in some positions. The intention is that this new ABI may add a new instruction or two but will generally not be a large breaking change for all existing code generators. Some new additions are expected but other than that existing code generators updated to use this PR will require little effort to support new ABIs.

alexcrichton commented 3 years ago

For a frame of reference, I've updated three different code generators for this refactoring of witx:

My hope is that by updating a number of locations this stresses this implementation in a few ways. It first ensures that the actual ABI of all functions has not changed. Additionally it enables running tests and diffs to ensure that generated code has not radically changed shape and still does roughly the same thing. Finally it tests both sides of calling and defining WASI APIs.

The PRs to all those repos do not exist just yet since the code is in a bit of a messy state. Additionally I'd like to perform another pass in this repository to document things before merging (it's just getting late here today). If desired though I can push up the work-in-progress state of each of the branches for reviewing if desired.

sbc100 commented 3 years ago

This seems great! Will it be possible to have the C headers be completely unchanged by this? (That would give a fairly high degree of confidence that the ABI didn't break).

pchickey commented 3 years ago

Last time we made this sort of change (https://github.com/WebAssembly/WASI/pull/220) we weren't able to keep the text of the headers the same but I used static assertions of size/align to show that the data layout was identical: https://github.com/WebAssembly/wasi-libc/pull/165

alexcrichton commented 3 years ago

Oh I'm not actually really that worried about this accidentally breaking the ABI we have today. I've changed how the header file is generated and the APIs it generates are indeed different, but that's intentional because shim functions are used to delegate to the real functions (and the real functions don't have a changed ABI). I mostly just wanted to emphasize that all the type-level and *.witx-level changes are not intended to be ABI-breaking, and if they are that's just a bug to fix.

I've also finished up documenting remaining TODO items and adding some more tests I wanted to be sure to get to. @pchickey would you be ok merging this and publishing this in this state? If you'd prefer I'm also happy to clean up and publish the changes to the code generators to review first.

It's worth mentioning that the code generators are seeing breaking changes in the code they generate, namely:

Note that in C's case the adapters are currently simple but it's expected that they'll get more complicated in the future. For example a next step would be to add debug-mode (or maybe always-on?) code which asserts that outgoing values are all valid. For example witx enums are represented in C as integers, but this means that invalid values could be passed to a consuming API. This error should ideally be caught in C rather than on the other side of the API to reflect the intended semantics of interface types.

pchickey commented 3 years ago

I'll take a look at the code generator changes today!

alexcrichton commented 3 years ago

For posterity, these are the three other changes for this PR:

sunfishcode commented 3 years ago

For completeness, POSIX generally requires EINVAL (or EFAULT) errors at least for things that are pointers at the POSIX level, however I think trapping is ok: on one hand the POSIX behavior can still be emulated by libc if we decide we really want it, and on the other, it seems unlikely that we'll want it. There isn't a lot of value in being able to pass an invalid pointer into the API, because presumably the application also wouldn't be able to dereference that pointer either.

alexcrichton commented 3 years ago

FWIW another motivation for switching to trapping is that it reflects the intended final semantics of interface types themselves where adapter functions would be doing reads/writes when passing values between modules, and the adapter function would trap given an invalid pointer and such.

Ok though I think this is in a good enough spot I'm going to merge with @pchickey's previous approval.

alexcrichton commented 3 years ago

Aha or not, I do not have the Green Button!

@pchickey would you be ok merging/publishing for me?

pchickey commented 3 years ago

published. @sunfishcode can you please add alex to committers here?