Async/await in Microvium

I'm not "promising" that async/await will be implemented in Microvium, but it's on my radar. This GitHub ticket is a consolidation of my personal design notes on the feature, in case anyone wants to review it and offer suggestions or improvements, or just for those interested in the process. Feel free to add comments to this ticket with any questions, corrections, clarifications or suggestions.

In this first message, I'll cover the motivation and requirements, and then in the next message, I'll cover the proposed design.

Why async/await?

I believe that Microivum will be useful in IoT solutions, where a device interacts with a server. Async/await is a very ergonomic way of expressing control flow logic involving a non-local target.
Similarly I've expressed that callback-based-async is a good reason to use Microvium in an otherwise-C environment, for expressing sequences of non-blocking behavior that might otherwise be expressed in C as complicated state machines. Obviously, async-await is even better than callbacks!

Objectives

Be able to write async functions that await the result of other async functions.
Provide a way of creating something that an async function can await that isn't another async function. E.g. a thenable or promise. Ideally, this should be the syntax new Promise(...) since this is the typical way to do this in JS.
As always, the behavior of user code should be the same in Microvium as in V8.
As always, size is a major priority. First RAM size, and secondly the engine's compile size.

Gotchas to watch out for

What should the following print (in particular, in what order)?

useThenable('a');
usePromise('b');
useAsync('c');

async function useThenable(name) {
  console.log(`Awaiting thenable ${name}`)
  await { then: resolve => resolve() };
  console.log(`Finished awaiting thenable ${name}`)
}

async function usePromise(name) {
  console.log(`Awaiting promise ${name}`)
  await new Promise(resolve => resolve());
  console.log(`Finished awaiting promise ${name}`)
}

async function useAsync(name) {
  console.log(`Awaiting async func ${name}`)
  await (async () => {})();
  console.log(`Finished awaiting async func ${name}`)
}

The answer is:

Awaiting thenable a
Awaiting promise b
Awaiting async func c
Finished awaiting promise b
Finished awaiting async func c
Finished awaiting thenable a

The key gotcha here is that the thenable completes last. Why? Because a thenable is not a real Promise, so the ECMA-262 spec says that it must be wrapped in a promise. I think the extra wrapping layer causes it to be posted to the promise job queue twice, so the callback is run last.

This also means that thenables are necessarily less efficient than promises, because thenables must be wrapped in promises anyway (so you land up having both the thenable and the promise in RAM) and they must have this behavior where they land up on the promise job queue twice instead of once.

Here's another gotcha:

usePromise('a');
useAsync('b');
useThenable('c');

function useThenable(name) {
  console.log(`Waiting for thenable ${name}`)
  waitFor({ then: resolve => resolve(name) });
}

async function usePromise(name) {
  console.log(`Waiting for promise ${name}`)
  waitFor(new Promise(resolve => resolve(name)));
}

async function useAsync(name) {
  console.log(`Waiting for async func ${name}`)
  waitFor((async () => name)());
}

function waitFor(thing) {
  thing.then(result => {
    console.log(`Finished waiting for ${result}`)
  })
}

This prints:

Waiting for promise a
Waiting for async func b
Waiting for thenable c
Finished waiting for c
Finished waiting for a
Finished waiting for b

The key thing here is that the thenable c completes first. Why? Because Promises always evaluate their callback asynchronously in the promise job queue, while an arbitrary thenable does not.

As stated in the objectives, if this code runs in Microvium then it must produce the same output. We have the choice either to forbid/remove certain features or to implement them correctly.

Proposed design

Note: There is a babel plugin to transform async to promises. I considered using that because it's an off-the-shelf way of supporting async with very little development cost on my end, but when I ran it on some test code, it looks like the output is horrendously complicated and verbose, especially in the case of things like loops. I don't think I could ever stomach using async if it were that inefficient.

CPS

The core of this design is based on continuation-passing style (CPS), where an awaiting caller passes in a callback that is a continuation to execute when the async callee completes. This is fundamentally different from typical promises where the caller subscribes a callback after the callee returns. I believe that the continuation-passing style is much more memory efficient, but it will only work in a subset of scenarios (more on that later)

For example, if async function foo calls await bar(), where bar is also an async function, then bar must remember that foo is awaiting it so that it knows where to return to.

I'm proposing the following memory structure for such a scenario:

2022-09-05 Minimal async

In this design, an idle (awaiting) async function consumes 8 bytes of memory (3 slots + a 2-byte allocation header), plus 2 bytes for each variable in the async function.

scope - a pointer to the outer variable environment of the async function, so it can access its lexical closure. This slot could also be overloaded to hold the result of the async operation since the result is only available when the scope is no longer needed.
resumeAt - a function pointer that points to the bytecode corresponding to the current resume point in the async function. Practically, this could be a lightweight trampoline function that jumps into the correct location in the main async code. The resumeAt slot could also be overloaded to point to code that invokes the awaiter when the async function reaches its terminal state, so that we don't need to allocate a separate closure to put on the promise job queue.
awaiter is a function pointer that points to the callback/continuation of the async caller.

To do this, I propose a modification to the current 6-byte closure structure. Currently, a closure is a 6-byte tuple that pairs a function pointer with an environment pointer. In this proposed change, a closure is also its own environment, rather than holding a pointer to its environment. It still maintains the scope pointer, but this will be a pointer to its parent environment. In other words, I'm proposing the unification of closures and environments, so that environments are closures and closures are environments. This has 2 advantages:

At least one closure in an environment can be allocated with the environment itself, saving on the cost of an allocation. This reduces the cost of the closure+environment from 10 bytes to 6 bytes, which is significant.
Since the closure is its own environment, a closure can use normal variable-accessing opcodes to mutate the closure itself. In particular, it can mutate the function pointer in the closure, as if it's a local variable. This is important for this async proposal -- it allows the async function code to mutate its resumeAt pointer according to its current state.

When a closure is called, the function in the second slot (in the diagram this is the resumeAt slot) will be invoked, using the whole closure structure as the current closure scope. In the case where multiple closures need to share an environment, they can each have their parent scope pointer point to the shared environment.

Awaiter

In this proposed async design, a call like await foo() will pass a CPS callback, which I call the awaiter. An awaiter is a callback function (which is likely to be a closure that represents the continuation of the caller operation) that the async callee will invoke when it terminates (resolves or rejects).

awaiter(isSuccess, result)

The awaiter call conforms to the following contract:

The first parameter is an isSuccess flag that is true if callee resolves and false if the callee rejects.
The second parameter is the result if isSuccess is true or an error if isSuccess is false.
The awaiter must be called on the promise job queue, or it may be called synchronously if the promise job queue is empty.
The awaiter must not be called more than once.

In the case of one async function foo calling another async function bar (like in the diagram earlier), these invariants can be trivially guaranteed. The callee bar is under the control of the compiler and so it can invoke the awaiter only on the job queue, and it will only invoke it once (because bar only completes once).

Compatibility with promises

The above proposed CPS "core" will only work in the specific scenario where one async function is calling and awaiting another async function, since both the caller and callee are under the control of the compiler. I believe this to be the most common case, and something worth optimizing for. But we need to also support the case where we await things that are not other async functions, and similarly need to support the use of calling an async function but not awaiting it (or at least not immediately).

The compiler will recognize the syntactic form await bar(), which is a function call expression with an await on the result. Roughly speaking, this syntactic form will compile to something which passes the awaiter continuation from the caller to the callee (possibly by a new VM register). If the callee is itself an async function then it will "take" the continuation and store it in its awaiter slot for use later.

This is the ideal/optimal case, but there are other cases to deal with:

x = bar() (i.e. not awaited) where bar is an async function.
await bar where bar is a promise
await bar where bar is not a promise
await bar() where bar is not an async function.

In case #1, bar needs to return a promise, since it's an async function and now the return value is observable, and CPS can't be used directly. For that case, bar will "see" that there is no awaiter and so it will create a Promise object and synthesize its own awaiter which resolves this promise. Then bar must return this Promise. The Promise acts as a protocol adapter.

In case #2, bar needs to have its awaiter added to the __subscriber list of the promise. Internally, the subscriber list will follow the same contract guarantees as CPS, so it can be added directly. Or if the promise is already settled, the awaiter can be enqueued or invoked directly.

Case #3 will be quite unusual I think, but it would be easy to support. The non-promise object can be wrapped in a promise that is in a resolved state before following the logic for #2.

In case #3, foo will pass its awaiter to bar who will not use it. So foo must then "notice" that the awaiter callback is not consumed by the callee (because the callee is not CPS compatible), and so fall back to the logic of #2 or #3 on the result of the call to bar.

I've elided a lot of the details here, because I haven't worked through them yet. I'm thinking that there will be probably be a new virtual register named awaiter which is used as a side channel for the caller and callee to negotiate the CPS protocol. Ideally, this register may also be exposed to the C host, so that native async functions can participate in CPS if they support it, rather than relying on inefficient promises at the bottom.

What does `resumeAt` point to?

I propose that async functions are basically compiled down to a single closure but with multiple entry points. The resumeAt pointer will be a "function pointer" that is just a trampoline to invoke the primary async closure. There are no particular checks in Microvium that restrict the Jump instructions from jumping only within their local function bytecode, so these resumeAt functions could just be single-instruction functions which just jump to the right location in the primary async function.

How much RAM does this solution use?

Each idle async function uses 8 bytes, plus 2 bytes for each variable in the function.

At the boundary where async functions are interfacing with non-async function, promises will be needed. A promise is an object with the following properties:

__status to indicate whether the promise is already settled or not, and what state it settled into
__callback that points to a CPS callback or a list of CPS callbacks.
When resolved, it will have a __value. I think the __value and the __callback properties could be the same overloaded property since they are used at different phases of the promise. So there's really only 2 properties.

An object with 2 properties is 14 bytes, so this brings the cost of async functions to 22 bytes at this boundary scenario.

In the situation where the user creates a promise using new Promise((resolve, reject) => {...}), which is typical at the leaves of the async hierarchy, this will be a 14 byte promise object with 2 additional closures: resolve and reject. With the new closure design, the reject closure could "inherit" from the resolve closure so that only two allocations are involved. In this case, the reject closure would be 6 bytes and the resolve closure could also be 6 bytes (if it reused the scope slot as the promise slot) -- a total of 12 bytes. In addition to the 14-byte promise object itself, this would be 26 bytes.

In summary:

In the hopefully-common case of await asyncFunc(), each awaited function is a single 8-byte allocation.
In the less-common case where the result of the async function is assigned to a temporary variable or used in non-async code, the awaited function is 20 bytes.
In the leaf scenario where a user constructs their own promise, it will cost 26 bytes per promise.
For cases where the host implements the CPS protocol directly, the cost is essentially free in terms of managed memory. Or rather, the host will need to use its own memory, to hold a handle that points to the awaiter callback.

coder-mike / microvium

Async/await design proposal #37

Async/await in Microvium

Why async/await?

Objectives

Gotchas to watch out for

Proposed design

CPS

Awaiter

Compatibility with promises

What does `resumeAt` point to?

How much RAM does this solution use?

coder-mike / microvium

Async/await design proposal #37

Async/await in Microvium

Why async/await?

Objectives

Gotchas to watch out for

Proposed design

CPS

Awaiter

Compatibility with promises

What does resumeAt point to?

How much RAM does this solution use?

What does `resumeAt` point to?