emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.89k stars 3.32k forks source link

Async JS library funcs should not be marked for Asyncify transform #20289

Open RReverser opened 1 year ago

RReverser commented 1 year ago

After #19093 any async function () { } in JS libraries is marked for Asyncify transform. It's a bit hard to describe why, but I don't think this is correct:

1) For Asyncify (-s ASYNCIFY), functions must be wrapped into Asyncify.handleAsync to work correctly, which would make them not recognised by this check. And, vice versa, if an unwrapped async function () {} is recognised by this check, as it is now, it will be passed to Binaryen and transformed - which expands the file size - for no reason, since it's going to be called only directly and not inside Asyncify.handleAsync/Asyncify.handleSleep. 2) For JSPI, we can sort of get away with it since it's much less reliant on wrapping, so it at least seems to work, but since such functions are still not wrapped into handleAsync, they don't get the runtime keepalive counter increased/decreased upon entering/exit correspondingly. 3) For PROXY_TO_PTHREAD mode, such functions need to be treated as void as async wakeup is happening manually via proxying.h functions. Which means that again, they shouldn't be marked as Asyncify functions.

For example, looking at current usages in https://github.com/emscripten-core/emscripten/blob/992d1e91d5c0af2f21dd1ee3d572a03c6727122c/src/library_wasmfs_jsimpl.js#L92-L99, if someone compiles this with -s ASYNCIFY -s PROXY_TO_PTHREAD (because their project depends on both of those features), right now this function will be passed to and transformed by Binaryen as Asyncify function, increasing the Wasm size, but then never used as Asyncify function because it's not wrapped into Asyncify.handleAsync.

This seems like a footgun.

I'd suggest that we should either 1) require users to explicitly mark functions as __async: true, which is easy to put into #ifdef so that it only activates when the function is really meant to be used with Asyncify or 2) provide some more magic & automatic wrapping for all async functions that would perform proxying.h-based wakeup in PROXY_TO_PTHREAD mode and Asyncify-based wakeup otherwise.

cc @brendandahl

brendandahl commented 1 year ago

We've talked a bit about automatically wrapping async functions to do Asyncify.handleAsync so the user doesn't have to and only using promises, but there could be some potential downsides(mentioned in link above). Overall, it would be nice to unify JSPI/Asyncify and make it less error prone (e.g. a user forgetting handleAsync).

As for asyncify+proxy, I hadn't really thought of this mix. In an ideal world, proxying would just use JSPI, but that is probably a pretty far ways off. Another options is async proxy functions could opt-out of asyncify with __async: false.

sbc100 commented 1 year ago

We've talked a bit about automatically wrapping async functions to do Asyncify.handleAsync so the user doesn't have to and only using promises, but there could be some potential downsides(mentioned in link above). Overall, it would be nice to unify JSPI/Asyncify and make it less error prone (e.g. a user forgetting handleAsync).

As for asyncify+proxy, I hadn't really thought of this mix. In an ideal world, proxying would just use JSPI

I'm not sure about that.. I think it might be faster/better to just block the calling thread like we do today rather than returning to the event loop (at least in most cases).

brendandahl commented 1 year ago

I think we discussed this in a meeting awhile ago. IIRC, at least for the web, any api that is async would need to do async work and need to return the event loop no matter what, so it wouldn't really affect performance. I could see the use case sync/async mixing for something like reading some buffer and sometimes it would be sync and sometimes it would be async if the buffer ran out.

Alternatively, we could always wrap and use promises (the common case), but provide a mechanism to do sync work. Now that I think of it, this mechanism exists today, a user could have an async and sync version of a function if they really wanted.

RReverser commented 1 year ago

Unifying PROXY_TO_PTHREAD + Asyncify via single API for the user is something I also thought about in the past, and would love to see it happen someday, but it's clear it requires a bigger design discussion / decision.

Meanwhile, specifically for this issue, could we just not mark async functions as Asyncify since they're not really Asyncify-compatible without the explicit wrapper in today's implementation?

Until there is clear design on automatic wrapping, I think the current implicit marking w/o wrapping can cause more non-obvious problems than asking user to explicitly mark functions as __async: true when necessary.

RReverser commented 1 year ago

IIRC, at least for the web, any api that is async would need to do async work and need to return the event loop no matter what, so it wouldn't really affect performance.

For this just want to add is that what @sbc100 meant is that the calling thread (aka non-main thread) doesn't have to return to event loop in that scenario. When proxying, the main thread indeed needs to return to event loop to execute the actual async work, but the extra pthread that called it can just wait synchronously on an atomic - there's no need to unwind both of them.

brendandahl commented 1 year ago

I misread that. Yeah, no need to unwind in the proxying case.

brendandahl commented 7 months ago

A little update... It's looking like JSPI will be changing to allow Wasm imports to return a value or promise. This is because always returning a promise can lead to performance issues when the import doesn't need to actually suspend. This will align more asyncify=1 and make it so we should support sync returns with JSPI.

My initial thoughts are doing something like the following where we try to get rid of the user having to do handleAsync/handleSleep and separate out the async function and how asyncify handles it.

{
  // With callbacks, supports sync or async. Mainly for legacy support.
 // The resumeCallback would automatically be sent as the last parameter by emscripten.
  do_some_work__asyncify: 'callback',
  do_some_work: (arg1, resumeCallback) => {
    // Resume can then be called sync or async
    someAsyncWorkWithCallback(resumeCallback);
  },

  // With promises, can return value (sync) or promise (async)
  do_some_work__asyncify: 'promise',
  do_some_work: () => {
    if (ready) {
      return 99;
    }
    return someAsyncWork();
  },

  // With an async function. Only async supported, but probably what will be
  // done most.
  do_some_work__asyncify: 'promise',
  do_some_work: async () => {
    let result = await someAsyncWork();
    return result;
  },
}