coder-mike / microvium

A compact, embeddable scripting engine for applications and microcontrollers for executing programs written in a subset of the JavaScript language.
MIT License
569 stars 25 forks source link

Async/await design proposal #37

Open coder-mike opened 1 year ago

coder-mike commented 1 year ago

Async/await in Microvium

I'm not "promising" that async/await will be implemented in Microvium, but it's on my radar. This GitHub ticket is a consolidation of my personal design notes on the feature, in case anyone wants to review it and offer suggestions or improvements, or just for those interested in the process. Feel free to add comments to this ticket with any questions, corrections, clarifications or suggestions.

In this first message, I'll cover the motivation and requirements, and then in the next message, I'll cover the proposed design.

Why async/await?

Objectives

Gotchas to watch out for

What should the following print (in particular, in what order)?

useThenable('a');
usePromise('b');
useAsync('c');

async function useThenable(name) {
  console.log(`Awaiting thenable ${name}`)
  await { then: resolve => resolve() };
  console.log(`Finished awaiting thenable ${name}`)
}

async function usePromise(name) {
  console.log(`Awaiting promise ${name}`)
  await new Promise(resolve => resolve());
  console.log(`Finished awaiting promise ${name}`)
}

async function useAsync(name) {
  console.log(`Awaiting async func ${name}`)
  await (async () => {})();
  console.log(`Finished awaiting async func ${name}`)
}

The answer is:

Awaiting thenable a
Awaiting promise b
Awaiting async func c
Finished awaiting promise b
Finished awaiting async func c
Finished awaiting thenable a

The key gotcha here is that the thenable completes last. Why? Because a thenable is not a real Promise, so the ECMA-262 spec says that it must be wrapped in a promise. I think the extra wrapping layer causes it to be posted to the promise job queue twice, so the callback is run last.

This also means that thenables are necessarily less efficient than promises, because thenables must be wrapped in promises anyway (so you land up having both the thenable and the promise in RAM) and they must have this behavior where they land up on the promise job queue twice instead of once.

Here's another gotcha:

usePromise('a');
useAsync('b');
useThenable('c');

function useThenable(name) {
  console.log(`Waiting for thenable ${name}`)
  waitFor({ then: resolve => resolve(name) });
}

async function usePromise(name) {
  console.log(`Waiting for promise ${name}`)
  waitFor(new Promise(resolve => resolve(name)));
}

async function useAsync(name) {
  console.log(`Waiting for async func ${name}`)
  waitFor((async () => name)());
}

function waitFor(thing) {
  thing.then(result => {
    console.log(`Finished waiting for ${result}`)
  })
}

This prints:

Waiting for promise a
Waiting for async func b
Waiting for thenable c
Finished waiting for c
Finished waiting for a
Finished waiting for b

The key thing here is that the thenable c completes first. Why? Because Promises always evaluate their callback asynchronously in the promise job queue, while an arbitrary thenable does not.

As stated in the objectives, if this code runs in Microvium then it must produce the same output. We have the choice either to forbid/remove certain features or to implement them correctly.

coder-mike commented 1 year ago

Proposed design

Note: There is a babel plugin to transform async to promises. I considered using that because it's an off-the-shelf way of supporting async with very little development cost on my end, but when I ran it on some test code, it looks like the output is horrendously complicated and verbose, especially in the case of things like loops. I don't think I could ever stomach using async if it were that inefficient.

CPS

The core of this design is based on continuation-passing style (CPS), where an awaiting caller passes in a callback that is a continuation to execute when the async callee completes. This is fundamentally different from typical promises where the caller subscribes a callback after the callee returns. I believe that the continuation-passing style is much more memory efficient, but it will only work in a subset of scenarios (more on that later)

For example, if async function foo calls await bar(), where bar is also an async function, then bar must remember that foo is awaiting it so that it knows where to return to.

I'm proposing the following memory structure for such a scenario:

2022-09-05 Minimal async

In this design, an idle (awaiting) async function consumes 8 bytes of memory (3 slots + a 2-byte allocation header), plus 2 bytes for each variable in the async function.

To do this, I propose a modification to the current 6-byte closure structure. Currently, a closure is a 6-byte tuple that pairs a function pointer with an environment pointer. In this proposed change, a closure is also its own environment, rather than holding a pointer to its environment. It still maintains the scope pointer, but this will be a pointer to its parent environment. In other words, I'm proposing the unification of closures and environments, so that environments are closures and closures are environments. This has 2 advantages:

When a closure is called, the function in the second slot (in the diagram this is the resumeAt slot) will be invoked, using the whole closure structure as the current closure scope. In the case where multiple closures need to share an environment, they can each have their parent scope pointer point to the shared environment.

Awaiter

In this proposed async design, a call like await foo() will pass a CPS callback, which I call the awaiter. An awaiter is a callback function (which is likely to be a closure that represents the continuation of the caller operation) that the async callee will invoke when it terminates (resolves or rejects).

awaiter(isSuccess, result)

The awaiter call conforms to the following contract:

In the case of one async function foo calling another async function bar (like in the diagram earlier), these invariants can be trivially guaranteed. The callee bar is under the control of the compiler and so it can invoke the awaiter only on the job queue, and it will only invoke it once (because bar only completes once).

Compatibility with promises

The above proposed CPS "core" will only work in the specific scenario where one async function is calling and awaiting another async function, since both the caller and callee are under the control of the compiler. I believe this to be the most common case, and something worth optimizing for. But we need to also support the case where we await things that are not other async functions, and similarly need to support the use of calling an async function but not awaiting it (or at least not immediately).

The compiler will recognize the syntactic form await bar(), which is a function call expression with an await on the result. Roughly speaking, this syntactic form will compile to something which passes the awaiter continuation from the caller to the callee (possibly by a new VM register). If the callee is itself an async function then it will "take" the continuation and store it in its awaiter slot for use later.

This is the ideal/optimal case, but there are other cases to deal with:

  1. x = bar() (i.e. not awaited) where bar is an async function.
  2. await bar where bar is a promise
  3. await bar where bar is not a promise
  4. await bar() where bar is not an async function.

In case #1, bar needs to return a promise, since it's an async function and now the return value is observable, and CPS can't be used directly. For that case, bar will "see" that there is no awaiter and so it will create a Promise object and synthesize its own awaiter which resolves this promise. Then bar must return this Promise. The Promise acts as a protocol adapter.

In case #2, bar needs to have its awaiter added to the __subscriber list of the promise. Internally, the subscriber list will follow the same contract guarantees as CPS, so it can be added directly. Or if the promise is already settled, the awaiter can be enqueued or invoked directly.

Case #3 will be quite unusual I think, but it would be easy to support. The non-promise object can be wrapped in a promise that is in a resolved state before following the logic for #2.

In case #3, foo will pass its awaiter to bar who will not use it. So foo must then "notice" that the awaiter callback is not consumed by the callee (because the callee is not CPS compatible), and so fall back to the logic of #2 or #3 on the result of the call to bar.

I've elided a lot of the details here, because I haven't worked through them yet. I'm thinking that there will be probably be a new virtual register named awaiter which is used as a side channel for the caller and callee to negotiate the CPS protocol. Ideally, this register may also be exposed to the C host, so that native async functions can participate in CPS if they support it, rather than relying on inefficient promises at the bottom.

What does resumeAt point to?

I propose that async functions are basically compiled down to a single closure but with multiple entry points. The resumeAt pointer will be a "function pointer" that is just a trampoline to invoke the primary async closure. There are no particular checks in Microvium that restrict the Jump instructions from jumping only within their local function bytecode, so these resumeAt functions could just be single-instruction functions which just jump to the right location in the primary async function.

How much RAM does this solution use?

Each idle async function uses 8 bytes, plus 2 bytes for each variable in the function.

At the boundary where async functions are interfacing with non-async function, promises will be needed. A promise is an object with the following properties:

An object with 2 properties is 14 bytes, so this brings the cost of async functions to 22 bytes at this boundary scenario.

In the situation where the user creates a promise using new Promise((resolve, reject) => {...}), which is typical at the leaves of the async hierarchy, this will be a 14 byte promise object with 2 additional closures: resolve and reject. With the new closure design, the reject closure could "inherit" from the resolve closure so that only two allocations are involved. In this case, the reject closure would be 6 bytes and the resolve closure could also be 6 bytes (if it reused the scope slot as the promise slot) -- a total of 12 bytes. In addition to the 14-byte promise object itself, this would be 26 bytes.

In summary:

coder-mike commented 9 months ago

Version 8.0.0 brings async-await. I'll be doing some write-ups on how it works. It's roughly what I proposed above but slightly more efficient, only requiring 6 bytes of idle memory in the best case.