WebAssembly / proposal-type-imports

Proposal for Type Imports & Exports

A usecase for non-reference type imports: restricted primitive ranges #10

Open titzer opened 4 years ago

titzer commented 4 years ago

I stumbled across a use case for non-reference type imports. Reporting here for discussion.

It occurs in languages with richer primitive type systems than WebAssembly. For example, in Java, there are types byte (signed 8-bit integer), short (signed 16-bit integer), and char (unsigned 16-bit integer). While these all have natural mappings onto the value range of Wasm's i32, they cannot be distinguished in function signatures. What that means in practice is that a host language's overloaded APIs cannot be distinguished by the types of their arguments or results. Moreover, there is then no way for host APIs to communicate that they accept restricted-range primitives like byte or short.

For a concrete example, suppose we want to import print(byte) and print(short) from a language that distinguishes those two primitive types. Then both of these functions map onto the wasm type [i32] -> [] and thus there is no checking at the Wasm level that values are in range. We have to fall back on a dynamic check.
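A minimal sketch in Wasm text format of the problem (the "host" module and import names are hypothetical, purely for illustration):

```wat
;; Both host overloads collapse to the same Wasm signature,
;; so nothing in the type system distinguishes print(byte)
;; from print(short).
(module
  (import "host" "print_byte"  (func $print_byte  (param i32)))
  (import "host" "print_short" (func $print_short (param i32)))
  (func $demo
    ;; 300 is out of range for a byte, but this still validates;
    ;; only a dynamic check on the host side can catch it.
    i32.const 300
    call $print_byte))
```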

If we allow type imports to have subtype constraints that are primitive types, then we can use/abuse this to define "restricted range" integers in the host and specify that they are assignable to, e.g., i32. For example, the module can import the type short with subtype constraint i32, and it then has the ability to both accept and return values of short. That type can then be used to distinguish overloads, and Wasm's type system enforces type safety. The constraint exposes the representation, so values also subsume to i32, where the module can do arithmetic. To get back to short, the module can import a function of type [i32] -> [short] which performs a range check.
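A sketch of what that could look like. Note the syntax for a type import with a primitive subtype constraint is speculative (the proposal defines no such syntax today), and all import names are hypothetical:

```wat
(module
  ;; Speculative: import an abstract type constrained to be a subtype of i32.
  (import "host" "short" (type $short (sub i32)))
  ;; Checked constructor: range-checks an i32 and returns a short.
  (import "host" "to_short" (func $to_short (param i32) (result $short)))
  ;; Overloads are now distinguishable by signature.
  (import "host" "print" (func $print_short (param $short)))
  (func $demo (param $s $short)
    ;; short <: i32, so values subsume to i32 for arithmetic...
    local.get $s
    i32.const 1
    i32.add
    ;; ...but getting back to short requires the checked constructor.
    call $to_short
    call $print_short))
```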

Submitting this for discussion and hashing out.

taralx commented 4 years ago

Is the intention that there is a subtyping relation short <: i32?

titzer commented 4 years ago

@taralx Right.

skuzmich commented 4 years ago

I assume Wasm expects toolchains to deal with 8- and 16-bit types themselves.

What that means in practice is that a host language's overloaded APIs cannot be distinguished by the types of their arguments or results.

Overload resolution in languages like Java is done at compile time. Common practice is to import overloads with different mangled names.
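For illustration, mangled imports might look like this (the `__B`/`__S` suffixes loosely follow JVM descriptor conventions and are purely hypothetical):

```wat
(module
  ;; Overload resolution already happened in the source compiler;
  ;; the result is encoded in the import name, not in the Wasm type.
  (import "host" "print__B" (func $print_byte  (param i32)))
  (import "host" "print__S" (func $print_short (param i32))))
```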

... there is no checking at the Wasm level that values are in range. We have to fall back on a dynamic check.

Instead of a dynamic in-range check, the VM could truncate the argument when preparing a host function call. The source language compiler would have more types than Wasm and could statically prove that such truncation is a no-op.
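Sketched in Wasm text, the truncation amounts to an explicit sign-extension at the call site (import names hypothetical):

```wat
(module
  (import "host" "print_byte" (func $print_byte (param i32)))
  (func $call_with_byte (param $x i32)
    ;; Clamp to the signed-byte range before crossing the boundary.
    ;; A source compiler that knows $x originated from a Java byte
    ;; can prove this extension is a no-op and omit it.
    local.get $x
    i32.extend8_s
    call $print_byte))
```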

titzer commented 4 years ago

Overload resolution in languages like Java is done at compile time. Common practice is to import overloads with different mangled names.

It can only be done at compile time if there is a whole-program analysis (i.e., linking). With separate compilation, and especially if wasm is to be a full-fidelity container for language APIs, these types should be expressed in signatures. Name mangling doesn't enforce signatures with a typechecking mechanism.

Source language compiler would have more types than Wasm and could statically prove that such truncation is a no-op.

I am considering the case where an API offered by one programming language might be used by a program (compiled to wasm) from another language, so neither source compiler has access to the entire program.

skuzmich commented 4 years ago

It can only be done at compile time if there is a whole-program analysis (i.e. linking).

I would argue that mangled names are used exactly to avoid that. Compilers resolve overloads during separate compilation and use mangled names to express this resolution result.

wasm is a full-fidelity container for language APIs

Do we want to have that? If so, we'd want our language's needs to be included too: that's all the API surface the JVM can express, plus a lot of custom metadata on top of that :)

I assumed that Wasm should be a good enough format to transfer, link, and run programs. But in order to compile source code against a Wasm module, you would have to supplement it with a language-specific description, like a header file or a custom metadata section.

Name mangling doesn't enforce signatures with a typechecking mechanism.

This would maybe save you a truncation instruction on some CPU architectures for each byte or short parameter during a host function call. Do you see other runtime benefits of typechecking this at the Wasm level?

titzer commented 4 years ago

I would argue that mangled names are used exactly to avoid that. Compilers resolve overloads during separate compilation and use mangled names to express this resolution result.

Sure, but there is nothing that the engine needs to do to either support or preclude mangling, so it's likely that some languages will do it regardless of our choices here. My point is that this use case offers static enforcement of API invariants. Admittedly, the overloading argument is a bit weak.

wasm is a full-fidelity container for language APIs

Do we want to have that?

I am working on this problem currently. I am discovering that the type import mechanism is surprisingly good at this and needs only minor tweaks to vastly increase its expressive power. This is one such tweak.

This would maybe save you a truncation instruction on some CPU architectures for each byte or short parameter during host function call. Do you see other runtime benefits of typechecking this on Wasm level?

That's for restricted-range integers. Another compelling use case is enums. In that case there might be an arbitrary subset of integers to be checked for, and that can be expensive. In the limit, any data type that can be encoded into bit patterns could be made opaque and passed across the boundary, so any dynamic well-formedness check can be avoided statically.
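For instance, a dynamic well-formedness check for an enum whose valid encodings are an arbitrary subset of i32 might look like this (the enum and its encodings {1, 4, 9} are hypothetical):

```wat
(module
  ;; Without a type import, every boundary crossing for a value of
  ;; this enum type pays a membership check like the one below.
  (func $is_valid_color (param $v i32) (result i32)
    (i32.or
      (i32.or
        (i32.eq (local.get $v) (i32.const 1))
        (i32.eq (local.get $v) (i32.const 4)))
      (i32.eq (local.get $v) (i32.const 9)))))
```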

Although this is not primarily about performance, once engines can inline across the host boundary, additional dynamic checks become relatively more expensive, because the boundary itself becomes cheaper, perhaps even zero-cost.