WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

Compile-time intrinsic functions #1003

Open pipcet opened 7 years ago

pipcet commented 7 years ago

I'd like to propose compile-time intrinsic functions: functions that have both a well-known name and a function body: if the compiler knows about the intrinsic, it can use its own implementation; if it doesn't, it uses the function body.

This is somewhat similar to the stdlib functions passed to asm.js modules, but drops support for choosing deviating functions at run time; that way, the compiler can recognize and replace intrinsic function invocations relatively cheaply.
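For comparison, here is a sketch (hypothetical function name) of the asm.js-style approach being contrasted: the fallback is bound at run time through an import object, so the engine cannot cheaply pattern-match it, whereas the proposal puts a well-known name and a fallback body inside the module itself, recognizable at compile time.

```javascript
// Run-time binding, asm.js-style: the "intrinsic" is just whatever function
// the embedder passes in, so the compiler can't assume anything about it.
const fallbacks = {
  // clz64 is a hypothetical intrinsic: count leading zeros of a 64-bit
  // value represented as two 32-bit halves.
  clz64(hi, lo) {
    return hi !== 0 ? Math.clz32(hi) : 32 + Math.clz32(lo);
  },
};

// Under the proposal, this body would instead live in the wasm module under
// a well-known name, letting a compiler substitute its own implementation.
console.log(fallbacks.clz64(0, 1)); // 63
```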

Since there is a fallback definition available, intrinsic functions could be added post-MVP without requiring any change to other wasm modules or compilers, providing full compatibility.

While intrinsic functions are invoked with call, such a call behaves just like an opcode; since intrinsic functions are placed at the beginning of the function index space, calling one will usually take only two bytes, which means code size isn't increased drastically compared to adding extra opcodes.
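The two-byte figure follows from wasm's binary encoding: call is the one-byte opcode 0x10 followed by the callee's function index as an unsigned LEB128 integer, and indices below 128 fit in a single LEB128 byte. A small sketch:

```javascript
// Encode an unsigned integer as LEB128, as the wasm binary format does.
function unsignedLEB128(n) {
  const bytes = [];
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80; // high bit set: more bytes follow
    bytes.push(byte);
  } while (n !== 0);
  return bytes;
}

// A `call` to function index k is opcode 0x10 followed by LEB128(k).
function encodeCall(funcIndex) {
  return [0x10, ...unsignedLEB128(funcIndex)];
}

console.log(encodeCall(7).length);   // 2 — indices 0..127 need one index byte
console.log(encodeCall(200).length); // 3 — index needs two LEB128 bytes
```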

I believe this approach would significantly reduce pressure to introduce new opcodes, maintain compatibility, and improve performance. However, it's completely orthogonal to any opcode extensions and the mechanisms necessary for dealing with that, such as run-time code rewriting.

Applications would need to be discussed separately, but the ones I can think of are:

I've implemented some rough proof-of-concept code for SpiderMonkey/IonMonkey at https://github.com/pipcet/mozilla/commit/7f139a8902510dfa8df2268e58b040843a4d06d0, and drafted the necessary specification changes at https://github.com/pipcet/design/commit/e1fc29247b7038d3186d558b7737103fb812a4b2. In both cases, the changes are relatively limited: a new section for intrinsics, some index renumbering, and the code in emitCall that actually recognizes some intrinsics.
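As a toy illustration of the recognition step (hypothetical names and table, not the actual SpiderMonkey patch): because intrinsics occupy the low function indices, emitCall can consult a table before falling back to an ordinary call.

```javascript
// Hypothetical intrinsic table: low function indices map to operations the
// engine knows natively. "f32.rsqrt.approx" and "RSQRTSS" are illustrative.
const INTRINSICS = new Map([
  [0, { name: "f32.rsqrt.approx", emitNative: () => "RSQRTSS" }],
]);

function emitCall(funcIndex, emitOrdinaryCall) {
  const intrinsic = INTRINSICS.get(funcIndex);
  if (intrinsic) {
    // Known intrinsic: emit the engine's own implementation inline.
    return intrinsic.emitNative();
  }
  // Unknown index: compile a plain call to the module's fallback body.
  return emitOrdinaryCall(funcIndex);
}

console.log(emitCall(0, i => `CALL ${i}`)); // "RSQRTSS"
console.log(emitCall(5, i => `CALL ${i}`)); // "CALL 5"
```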

Since the benefit of this approach is compatibility, it would make a lot of sense to introduce it in the MVP, allowing MVP compilers to compile code that uses intrinsics defined and registered post-MVP.

AndrewScheidecker commented 7 years ago

See #362 for some past discussion of intrinsic calls vs operators.

lukewagner commented 7 years ago

IIUC, the primary value of this mechanism would be in making it easy/efficient to polyfill operators that were standardized but not yet universally deployed. In particular, I don't see how it reduces pressure for adding opcodes: the goal is still to add opcodes for all these polyfillable things.

It seems that the value would depend on how bad the window of delay was (between first-browser-shipping and last-browser-shipping). While in the old days this could be decades, wasm is only being deployed in evergreen browsers, so for the simple things that are polyfillable, the window might be less than a year. If that's right (I could be wildly overoptimistic), I'm not sure a whole polyfilling feature is necessary.

There's also a risk that I've seen happen in JS (String.prototype.contains, iirc) where people get wind of a new feature and immediately start deploying the polyfill; the final semantics end up differing from the early polyfill, and now shipping the real op breaks real-world code. :( Of course that can happen without this feature too, but the activation energy is much higher, so I'd expect it to be less likely.

jfbastien commented 7 years ago

I'm not sure I understand correctly: would this feature mean that different WebAssembly implementations exhibit different runtime behavior depending on whether they understand the intrinsic or not? We'd be relying on the polyfill being accurate?

pipcet commented 7 years ago

@lukewagner Very good points, I think.

I think compared to not having intrinsic functions at all (opcodes-for-everything), opcode pressure would be reduced: not every intrinsic would migrate to having an opcode, after all (nor is every potential new opcode a good candidate for an intrinsic).

Note that the actual cost of the polyfilling feature is very low—in fact, it seemed easier to implement to me than omitting it would have been; therefore, I'm not sure we should be looking at it as something we need to justify for every possible scenario. If it turns out to be unnecessary, we can simply provide an undefined opcode as "polyfill" for intrinsics that we don't want to bother shipping a polyfill for.

I don't share your optimism that the time difference between the release date of a wasm host environment and the compile date of a wasm module that we're attempting to run on it will never be more than a year; even if that's true on the Web, there are other scenarios for WebAssembly deployment.

I don't really see how the sad-smiley polyfill-differs-from-real-op scenario becomes more likely, to be honest: it would require people to wilfully start shipping non-prefixed intrinsics before they are properly registered, and I don't see a way for us to prevent that entirely.

@jfbastien Depending on the intrinsic in question, yes, runtime behavior (performance, at least) would differ. I think that's unavoidable with things like reduced-precision math, though it is a source of nondeterminism that needs to be documented.

For unknown intrinsics, we'd be relying on the polyfill being accurate, just as we always rely on the rest of the module code "being accurate": we do what the module code says. If that's a problem, we have another chance to recognize the intrinsic and substitute either a working polyfill or a proper native implementation, so the problem is less severe than it would be for intrinsics without polyfills.

pipcet commented 7 years ago

I'd missed the MVP announcement. I still think this is a good idea, but we should not renumber section ids for it; it would make most sense to have an out-of-order section for intrinsics between sections 1 and 2, maybe with section id 15 or 0x18.

One thing I'd missed (@lukewagner):

the goal is still to add opcodes for all these polyfillable things

I don't see why that should be a goal, and I don't think it's a feasible goal for some applications. I believe it makes perfect sense to have a comparably large number of intrinsic functions that are inlined when used, but only a relatively small number of opcodes.

lukewagner commented 7 years ago

Ah, I didn't see a mention of inlining above. Perhaps that's the more general underlying feature: inlining directives (on functions or callsites). The motivation here would be size, I think, since otherwise you'd just do the inlining directly at codegen time. However, this starts to overlap with what we've discussed as "layer 1" compression (at the top of BinaryEncoding.md) which would probably involve some sort of macro expansion.

pipcet commented 7 years ago

@lukewagner Hmm. I meant that native code would be produced by the call (not an FFI call), not (just) that the call to the polyfill would be considered for inlining.

So, yes, there's overlap, but my proposal cannot be implemented without layer 0 changes, unless I'm missing something.

I'd like to point out that registration of "intrinsics" would usually require a well-known name, a native implementation, a description of their intended operation in English, and an example polyfill implementation. That's one more opportunity to spot mismatches or undefined behavior, since the English description and the polyfill should obviously say the same thing.

A minimal wasm implementation (the wasm shell in binaryen, the interpreter in the spec repository, or the wasm-to-GNU-C code I hacked up at one point) wouldn't need to be modified for new intrinsics.

That leaves the question of how many intrinsics there would be, and whether we couldn't just assign opcodes to them all: I think there is currently a trend towards adding complicated CPU instructions: hardware crypto support, complicated atomics, trigonometric operations on GPUs, hardware string instructions, bitfield instructions, and so on. (I'm excluding SIMD, which I think requires both new opcodes and new intrinsics for best support). With intrinsics, we can ship wasm environments today that will work with future wasm modules, without relying on layer 1+ rewriting.

And it's a relatively small change: a function index corresponds to a signature plus one of:

Sorry if all of this is repetition; I'm fine with my idea being rejected, but I really would like to understand why.

lukewagner commented 7 years ago

I think the top-level question is: are we expecting a wasm engine to eventually have builtin knowledge of these intrinsic operations? Initially it sounded like 'yes' and now it's sounding like 'optionally'. We may end up with some permanently-optional features, but I think we should try to avoid this as much as we can to provide a consistent platform.

Also, I think we might be overestimating how many ops are polyfillable. E.g.:

In general, it seems like the polyfillable ops are just those that are refinements/iterations of an existing feature. But these seem like the type of ops that would be easily/quickly implementable/shippable after being specified so it's not clear that they need to be optional or take a long time to deploy, given modern evergreen browsers.

lukewagner commented 7 years ago

(To wit, I was proposing something similar to this a while back, so naturally I think it's a reasonable idea :wink:, I'm just sharing some of the reasons why I backed down from the idea over time.)

pipcet commented 7 years ago

We may end up with some permanently-optional features, but I think we should try to avoid this as much as we can to provide a consistent platform.

I think we'd end up with features implemented by every engine, features implemented by all high-performance engines, and perhaps the occasional rare feature that only makes sense for special CPUs/GPUs.

In the context of evergreen browsers, I think there'll be a standard set that people will use, plus the occasional new feature that one of the engines already supports and the others do not yet. (Even with evergreen browsers, there's going to be a period of months until a new feature is available in every browser, and I'm not sure layer 1+ mechanisms will really take care of that problem before it occurs.)

So my guess would be that having an inconsistent platform is not a major risk.

Initial atomics wouldn't be polyfillable b/c they depend on having shared memory at all and shared memory would ship at the same time as the initial set of atomics

I disagree with that one, I think: in the absence of shared memory and exception/interrupt handling, an atomic operation is equivalent to the non-atomic operation. So we'd need the following statement in the spec for shared memory:

If the engine supports shared memory, it must also provide atomic implementations of the following intrinsics: ...

And then we can polyfill the non-atomic variants.
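The equivalence argument can be illustrated in JS terms (a sketch of the idea, not the proposed wasm encoding): absent shared memory and interruption, an atomic read-modify-write is observably the same as its plain counterpart, so the non-atomic version is a valid polyfill.

```javascript
// Non-atomic "polyfill" of an atomic add on linear memory. With no sharing
// and no interruption between the load and the store, this is observably
// equivalent to the atomic read-modify-write it stands in for.
function nonAtomicAdd(int32, index, value) {
  const old = int32[index];   // plain load
  int32[index] = old + value; // plain add + store
  return old;                 // atomic RMW ops return the old value
}

const mem = new Int32Array([5]);
console.log(nonAtomicAdd(mem, 0, 3)); // 5 (old value)
console.log(mem[0]);                  // 8
```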

You're right, though, that after some thought the air is noticeably thinner for polyfillable things.