WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0

Relaxing closed world validation and improving open world optimization #6965

Open tlively opened 7 hours ago

tlively commented 7 hours ago

The --closed-world flag lets us assume that we can make arbitrary changes to types as long as those types are not part of the module's contract with the outside world. Since the type system is structural, there is not a single, precise definition of what it means for a type to be "part of the module's contract," but we have chosen it to mean that we will keep the types of exported or imported module elements the same, but all other types are fair game. In particular, we assume we are allowed to modify subtypes of public types that are not themselves public. Otherwise, a single anyref in an exported function would prevent us from modifying any struct or array type and a single funcref in an exported function would prevent us from modifying signatures of referenced functions.
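To make the rule above concrete, here is a minimal Python sketch of a toy model. It is not Binaryen's implementation: the function `public_types`, the dictionaries, and the type names (`$sig`, `$point`, `$point3d`, `$cb`) are all hypothetical, and the sketch assumes that keeping a type unchanged also means keeping every defined type mentioned in its definition unchanged.

```python
ABSTRACT = {"any", "func", "extern"}  # stand-ins for anyref, funcref, externref


def public_types(supertype, mentions, boundary):
    """Return the defined types that must stay unchanged in this toy model."""
    public = set()
    worklist = [t for t in boundary if t not in ABSTRACT]
    while worklist:
        t = worklist.pop()
        if t in public:
            continue
        public.add(t)
        # Assumption: a definition stays the same only if everything it
        # mentions (fields, params, results, declared supertype) stays the same.
        children = list(mentions.get(t, ()))
        if supertype.get(t) is not None:
            children.append(supertype[t])
        worklist.extend(c for c in children if c not in ABSTRACT)
    return public


# Hypothetical module: $sig is the type of an exported function that takes
# an anyref and a $point. $point3d is a subtype of $point used only internally.
supertype = {"$sig": None, "$point": None, "$point3d": "$point", "$cb": None}
mentions = {"$sig": ["any", "$point"], "$point": [], "$point3d": [], "$cb": ["$point3d"]}
boundary = {"$sig"}

public = public_types(supertype, mentions, boundary)
modifiable = set(supertype) - public

print(sorted(public))      # ['$point', '$sig']
print(sorted(modifiable))  # ['$cb', '$point3d']
```

In this model the anyref parameter pins down no defined type, and `$point3d` stays modifiable even though its supertype `$point` is public.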

However, our current closed-world validation is much stricter than this. It additionally restricts what types are allowed to be public. It allows the types of exported and imported functions to be public, and therefore must also allow all types in the rec groups of those function types to be public, but it does not allow any other defined heap types to be public, even if they are part of the type of an imported or exported function.
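The stricter check described above can be sketched in the same toy style. Again, this is only an illustration under stated assumptions and not Binaryen's actual validation code: `rec_group_of`, `disallowed_public_types`, and the type names are hypothetical. It shows how the outcome depends on which rec group a type lives in.

```python
ABSTRACT = {"any", "func", "extern"}  # stand-ins for anyref, funcref, externref


def rec_group_of(type_name, rec_groups):
    """Return the rec group containing type_name (a singleton if none is listed)."""
    for group in rec_groups:
        if type_name in group:
            return set(group)
    return {type_name}


def disallowed_public_types(boundary_func_types, mentions, rec_groups):
    """Defined types that become public without sharing a rec group
    with an imported or exported function type."""
    allowed = set()
    for func_type in boundary_func_types:
        allowed |= rec_group_of(func_type, rec_groups)

    # Everything reachable from the allowed types is effectively public.
    reached, worklist = set(), list(allowed)
    while worklist:
        t = worklist.pop()
        if t in reached:
            continue
        reached.add(t)
        worklist.extend(m for m in mentions.get(t, ()) if m not in ABSTRACT)
    return sorted(reached - allowed)


# Hypothetical exported function type $sig that takes an anyref and a $point.
mentions = {"$sig": ["any", "$point"], "$point": []}

# One large rec group: $point shares a group with $sig, so nothing is reported.
print(disallowed_public_types(["$sig"], mentions, [["$sig", "$point"]]))    # []

# Smaller rec groups: $point is now outside $sig's group and is reported.
print(disallowed_public_types(["$sig"], mentions, [["$sig"], ["$point"]]))  # ['$point']
```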

I believe the original motivation for these additional restrictions was that we wanted to be able to optimize as many types as possible, so we didn't want to allow users to expose types in a way that would inhibit optimizations. But this is putting the cart before the horse. We should be able to optimize any module we are given according to the assumptions configured via command line options, and there is no user benefit if we simply reject modules that they want to optimize because we cannot optimize them as well as some different module they could have given us. Users (such as Kotlin) are running into these errors when they try to use smaller rec groups in their input.

Here is the state of the world I would like to move to:

Here are the steps necessary to get to that state of the world:

@kripken, WDYT?

kripken commented 5 hours ago

Sounds good!

  1. Might be worth mentioning externref here. I assume an exported/imported externref is handled similarly to anyref?
  2. We want to still preserve the key property in closed world that one can send a reference out but the outside cannot inspect (for an array or struct) or call (for a function) that ref. That is, that the outside can cache the reference and send it back in, but not interact with it. Atm in closed world we achieve that by sending out anyref/externref, and not the specific GC type, but maybe there's a better way, e.g., sending out the specific GC type but annotating it as private. I don't feel strongly here.

tlively commented 3 hours ago
  • Might be worth mentioning externref here. I assume an exported/imported externref is handled similarly to anyref?

Yes, good point. Externrefs in the public interface should be treated as though they were also anyrefs and vice versa.

  • We want to still preserve the key property in closed world that one can send a reference out but the outside cannot inspect (for an array or struct) or call (for a function) that ref. That is, that the outside can cache the reference and send it back in, but not interact with it. Atm in closed world we achieve that by sending out anyref/externref, and not the specific GC type, but maybe there's a better way, e.g., sending out the specific GC type but annotating it as private. I don't feel strongly here.

I think that use case will have to continue using abstract heap types like anyref and externref on the boundary. If we allowed a defined type to be passed out directly, then even if we assume the environment will not access it directly, changing it would still change the type of the function that passes it out. That's fine in a JS embedder, but not in any statically typed embedder. If we want to allow this anyway, we could use the (@private) annotation and not make it an error to use (@private) types in the module interface.
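As a rough illustration of why the defined type cannot simply be passed out directly, here is a small Python sketch that models structural types as nested tuples. The helpers `struct` and `func` and the example types are hypothetical and only stand in for the real type system.

```python
def struct(*fields):
    """Toy structural struct type: identity is determined by its fields."""
    return ("struct", fields)


def func(params, results):
    """Toy structural function type: identity is determined by params and results."""
    return ("func", tuple(params), tuple(results))


point = struct("i32", "i32")
optimized_point = struct("i32")  # e.g. after an optimization removes a field

# Passing the defined type out directly: the export's type mentions the struct,
# so rewriting the struct also rewrites the exported function's type.
direct_before = func([], [("ref", point)])
direct_after = func([], [("ref", optimized_point)])

# Passing it out as anyref: the export's type does not mention the struct.
abstract_before = func([], ["anyref"])
abstract_after = func([], ["anyref"])

print(direct_before == direct_after)      # False
print(abstract_before == abstract_after)  # True
```

A statically typed embedder would see the first export's type change, while the anyref version keeps the same type on the boundary.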