My understanding is that @tlively and Google folks are currently experimenting with full-fat thread-local globals. In response to concerns about the implementation feasibility of this approach, @lukewagner and @eqrion came up with an alternative design for "context locals". After some further discussion on what these should look like, this issue attempts to present a refreshed design for the context locals feature, incorporating the iterations that happened in those discussions (e.g. https://github.com/WebAssembly/shared-everything-threads/issues/42). The sketch below assumes the "no capturing on suspension" variant, and is agnostic as to whether we have separate shared-suspendable and shared-nonsuspendable.

Background

To support useful compilation to shared functions, we need a mechanism for thread-local storage (to accurately compile source-level TLS), and a mechanism for JS interaction (since JS functions are nonshared, they can't be imported and called in the normal way inside shared functions).

Wasm-level thread-local globals solve the former problem, but require ambitious schemes for initialisation and garbage-collection. JS-API thread-local functions solve the latter problem (and can be used to simulate thread-local storage with a "get_thread_id" function), but pose similar garbage collection issues.

Context locals

Context locals aim to provide a basic mechanism for solving both problems. Conceptually, context locals represent storage that is local to the current Wasm call stack. This mirrors the way engines already fix a "current instance" when entering a Wasm call stack. If a Wasm call stack is suspended and resumed elsewhere (including in another thread), the context locals at the suspension point are not captured; instead, the resumed continuation inherits the context locals of the resumption point (with a type check to ensure the shape matches).

These qualities mean that it is safe to put JS functions into context locals, and call them even from shared code. Context locals can also be used to implement thread-local storage, although some additional care must be taken when crossing JS boundaries or making use of shared continuations.
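To make the "no capturing on suspension" behaviour concrete, here is a toy JS model (not the proposal itself). A generator stands in for a suspendable Wasm stack, and the names `currentContext`, `withContext`, `task` and `threadName` are all made up for illustration.

```javascript
// Toy model only: `currentContext` stands in for the context locals of the
// currently running Wasm call stack.
let currentContext = null;

// Run `fn` with `ctx` installed, restoring the previous context afterwards.
function withContext(ctx, fn) {
  const saved = currentContext;
  currentContext = ctx;
  try {
    return fn();
  } finally {
    currentContext = saved;
  }
}

// A generator plays the role of a suspendable Wasm stack.
function* task() {
  const before = currentContext.threadName; // context at the first entry
  yield;                                    // suspend; the context is NOT captured
  const after = currentContext.threadName;  // context of whoever resumed us
  return [before, after];
}

const k = task();
withContext({ threadName: "worker-A" }, () => k.next()); // runs up to the yield
const result = withContext({ threadName: "worker-B" }, () => k.next()).value;
// result is ["worker-A", "worker-B"]
```

The second read sees the resumer's context, which is why TLS built on context locals needs extra care around shared continuations.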

Brief instruction set sketch

Extend function types with a new kind of local declaration - a sequence of types representing that function's "context" (this could also be declared tag-style in an earlier section as in https://github.com/WebAssembly/shared-everything-threads/issues/42).

e.g.

(func $foo (param i32)
  (local i32) (context (ref $t1) (ref $t2))
)

This function declares a context of type (ref $t1) (ref $t2), made up of two context locals.

For simplicity's sake, we'll assume a separate instruction set for interacting with context locals rather than using the existing local... instructions, but a combined scheme may be possible. These will be:

context.get
context.set
context.call (like get + ref.call, not strictly necessary but useful in some circumstances [see below])

which work as expected. One note - in shared-suspendable functions we'll still need something like the shared-barrier mechanism to allow nonshared results of context.get etc to be manipulated, but context.call would in principle be permissible even without such a barrier.

We also need a block instruction for switching to a new context - (context.switch t* ... end), or alternatively a call instruction that simultaneously switches contexts. This allows functions with mismatching contexts to call each other, and the cost of switching contexts is explicitly represented. A function which declares a context can only be called if its declared context is a subtype of (or for MVP, equivalent to?) the current context. Functions which do not declare a context can still be called from any other function. A context.cast block or call instruction for recovering a context subtype at runtime could be considered, but this would require contexts to preserve RTT information, which is an additional overhead.
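As a rough illustration of the calling rule, here is a toy validator in JS. It assumes the MVP reading (equivalence rather than subtyping) and models a context type as an array of type names; `sameContext`, `callAllowed` and `contextSwitch` are hypothetical helper names, not proposed API.

```javascript
// Toy validator for the calling rule, using the MVP reading (equivalence only).
// A context type is an array of type names; `null` means "declares no context".
function sameContext(a, b) {
  return a.length === b.length && a.every((t, i) => t === b[i]);
}

// A callee without a declared context is callable from anywhere;
// otherwise the declared context must match the current one.
function callAllowed(currentContext, declaredContext) {
  if (declaredContext === null) return true;
  return sameContext(currentContext, declaredContext);
}

// Models a `context.switch` block: `body` is checked against the new context.
function contextSwitch(newContext, body) {
  return body(newContext);
}

const callerContext = ["(ref $t1)", "(ref $t2)"];
const calleeContext = ["externref"];

const direct = callAllowed(callerContext, calleeContext);   // false: contexts mismatch
const viaSwitch = contextSwitch(["externref"], (ctx) => callAllowed(ctx, calleeContext)); // true
const noContext = callAllowed(callerContext, null);         // true: callee declares no context
```

The only way the mismatching call goes through is via the explicit switch, which is where the cost shows up.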

JS-API

When a function has a declared context, the context must be bound before a function can be called. Functions with contexts, when exported to JS, have an extra context_bind (bikeshed name) method to accomplish this, which takes the values to be bound as the function's context, and returns a Wasm function that appears to have no context. Shared functions with unbound contexts can be postMessage'd, but the context_bind method on such a function either always returns an unshared function, or alternatively only returns a shared function if all the context parameters are shareable. The intent is that if the context contains any JS function or object, it should be rebound separately in each Worker that wants to call the function.

The reasoning for this separate bind step is to facilitate the compile-time specialisation that V8 has indicated they want to lean heavily on for performance. Due to lazy compilation, when a bound function is called for the first time, relevant context.call instructions can be specialised to the known value of the provided JS function. Since this code is only entered through the bound function, deopt checks are only necessary at boundaries where the context may change (e.g. initial JS entrypoints, and context.switch instructions). Pleasantly, no deopt checks are needed when repeatedly calling an already bound function - only when attempting to call the same function in another instantiation/binding.

The idea is that 99% of the time (including in situations with JS->Wasm re-entrancy) you're just calling already-bound Wasm functions.
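The bind step can be approximated in plain JS as follows. This is only a model of the intended behaviour under the assumptions above: `makeContextFunction` is a hypothetical helper standing in for an exported Wasm function with a declared context, and the shape check is reduced to an arity check.

```javascript
// Toy model of an exported function with a declared context of `contextSize` values.
// `body` receives the bound context values followed by the ordinary arguments.
function makeContextFunction(contextSize, body) {
  // Calling the function before binding a context is an error.
  const unbound = () => {
    throw new TypeError("context must be bound before calling");
  };
  // context_bind (bikeshed name) returns a function that appears to have no context.
  unbound.context_bind = (...contextValues) => {
    if (contextValues.length !== contextSize) {
      throw new TypeError("context shape mismatch");
    }
    return (...args) => body(contextValues, ...args);
  };
  return unbound;
}

const logged = [];
// Roughly (context.get 0) (context.call 1): pass context value 0 to context value 1.
const foo = makeContextFunction(2, (ctx, _n) => ctx[1](ctx[0]));
const fooBound = foo.context_bind("hello world", (msg) => logged.push(msg));
fooBound(0); // logged is now ["hello world"]
```

Because the call target is fixed at bind time, every call through `fooBound` sees the same function in slot 1, which is the property the specialisation story relies on.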

Implementation sketch

EDIT: These proposed implementations are not correct, due to issues if the instance is shared across threads. Reader beware!

Here are two possible approaches - the space for the context is allocated inline with the instance, or the context is a separate allocation referenced by the current instance.

inline

When compiling the module and allocating the instance, find the largest context declared across all functions of the module and allocate that much extra space in the instance. When entering a context (e.g. through a call or resumption), copy the relevant values into this space (guaranteed to be enough space for every possible context). This has the advantage of making context accesses fast, but the context locals must be recopied when there is a cross-instance call (although this can be a wholesale memcpy rather than a per-member iteration).

separate allocation

Each instance has space only for a reference to the current context, which is a separate allocation. This has the advantage of not requiring a copy upon cross-instance call, but adds indirections to context access.
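The trade-off between the two layouts can be sketched with a single-threaded toy model in JS. It deliberately ignores the shared-instance problem flagged in the EDIT above, and `InlineInstance` and `IndirectInstance` are made-up names for illustration.

```javascript
// Inline layout: each instance reserves slots for the largest declared context.
class InlineInstance {
  constructor(maxContextSize) {
    this.slots = new Array(maxContextSize).fill(null);
  }
  // Entering a context copies the values into the instance's own slots.
  enter(values) {
    for (let i = 0; i < values.length; i++) this.slots[i] = values[i];
  }
  contextGet(i) {
    return this.slots[i]; // direct access, no indirection
  }
}

// Separate allocation: each instance only holds a reference to the context.
class IndirectInstance {
  constructor() {
    this.context = null;
  }
  // Entering a context just stores the reference; nothing is copied.
  enter(contextRef) {
    this.context = contextRef;
  }
  contextGet(i) {
    return this.context[i]; // one extra indirection per access
  }
}

const ctxValues = ["hello", 42];

// Inline: a cross-instance call has to recopy the context into the callee.
const a = new InlineInstance(4);
const b = new InlineInstance(4);
a.enter(ctxValues);
b.enter(a.slots.slice(0, ctxValues.length));

// Separate allocation: a cross-instance call shares the same context object.
const c = new IndirectInstance();
const d = new IndirectInstance();
c.enter(ctxValues);
d.enter(c.context);
```

In the inline model `a` and `b` each hold their own copy of the values, while `c` and `d` point at the same allocation.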

Example

(sorry if the syntax is minorly wrong or otherwise undercooked)

console.log

Wasm
(module $module...
  (export "foo" (func $foo))

  (func $foo shared-nonsuspendable (param i32)
    (local i32) (context (externref) (ref func [externref]->[]))
    (context.get 0)
    (context.call 1)
  )
)

JS
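// moduleBytes is assumed to hold the compiled binary of $module above
const { instance } = await WebAssembly.instantiate(moduleBytes);
const foo = instance.exports.foo;

foo(0); // not allowed, but can postMessage foo
const foo_bound = foo.context_bind("hello world", console.log);
foo_bound(0); // prints "hello world" through console.log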


Context locals redux #66