emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.6k stars 3.28k forks source link

Make use of Asyncify for pthread_join #9910

Open RReverser opened 4 years ago

RReverser commented 4 years ago

[Forked from discussion in #9579]

Rather than disallowing blocking on the main thread and giving an escape hatch in form of a setting that still allows it, we could transform pthread_join and friends via Asyncify instead and use Atomics.waitAsync under the hood.

This way the main browser thread wouldn't really be blocked, but the code would be still backwards-compatible and properly wait for thread to finish execution as user expects.

kripken commented 4 years ago

This should be straightforward to add if someone's interested. For examples of asyncified functions, see for example emscripten_sleep and other functions after it. We'd want to do the same to pthread_join which is defined in library_pthread.js. Note that the function it calls, _emscripten_do_pthread_join, already has a parameter to allow nonblocking behavior, so building on top of that should help.

VirtualTim commented 4 years ago

So would this require building with another flag, or would this just modify the existing pthreads code? Either way being able to join on the main thread without hanging would be really great, that's probably the biggest difference between Emscripten compiled code and native. And since native code doesn't really have a concept of a 'main thread' it can be hard to adjust code around that limitation.

sbc100 commented 4 years ago

This would be behind the existing ASYNCIFY .. so it would still be opt in in that sense. But we could direct users towards this in the warning message.

RReverser commented 4 years ago

@sbc100 Yeah, and after this is implemented, I thought we might flip the recent flag default value, so that users would need to explicitly opt in into either ASYNCIFY or ALLOW_BLOCKING_ON_MAIN_THREAD as means of using pthread_join.

bmeurer commented 4 years ago

I'm not sure about the particular case of pthread_join, but @RReverser brought this up in the context of workers being stuck in Atomics.wait: Would it be possible for Emscripten to not block workers while waiting for a thread to be scheduled, but instead wait in the main event loop and schedule the thread via postMessage?

RReverser commented 4 years ago

@bmeurer I'm not sure I understand your context... But what this would allow is to specifically not have Atomics.wait in your code, and instead in each of these places wait using Atomics.waitAsync, so that it wouldn't actually block the thread, but rather just wait in the corresponding thread's event loop.

bmeurer commented 4 years ago

@RReverser Right. So I'm wondering if we could extend this in such a way that also Emscripten wouldn't use Atomics.wait for the thread pool workers, but instead just sit in the worker event loop.

RReverser commented 4 years ago

Ah yeah, that's what I had in mind too.

kripken commented 4 years ago

Would it be possible for Emscripten to not block workers while waiting for a thread to be scheduled, but instead wait in the main event loop and schedule the thread via postMessage?

I think that could work with Asyncify, yes - I think we always call Atomics.wait from emscripten_futex_wait, so we'd asyncify that function.

However, a downside is that it means anything that can reach a call to emscripten_futex_wait would get asyncified, which means lots of code using locks, malloc, etc. That might make real-world code significantly slower, as it may mean the entire program ends up asyncified (instead of the more common Asyncify use case, where just code that can reach emscripten_sleep or another special function).

(Without Asyncify, I don't think that would work on general code, as it would require the main thread to yield to the event loop before the worker activates.)

bmeurer commented 4 years ago

@kripken Yes, that's basically what I meant, except that I was thinking only about the logic that deals with waiting on the workers for threads to be scheduled via pthread_start. maybe you could use a different version of emscripten_futex_wait just for that?

On the DevTools side we'd still need to deal with workers that are really blocked in Atomics.wait of course, but at least for the case where the worker is literally idle and not doing anything useful, it'd be parked in the main event loop.

kripken commented 4 years ago

I see, thanks @bmeurer - yes, that does sound much more practical.

kripken commented 4 years ago

Note though that I doubt we can convince most people to use Asyncify by default, because of the significant overhead. However, perhaps they can do this while debugging at least.

bmeurer commented 4 years ago

I'm not a big fan of having the debug build use different mechanics as that will make it harder to debug subtle issues that were detected in production. I feel like we should use asyncify where it makes sense (in both debug and release) and otherwise come up with a debug story for Atomics.wait

sbc100 commented 4 years ago

I think asyncify should continue to be opt-in at the point; its a pretty big hammer

I also think the defaults should not be different for debug and release.

@bmeurer can you link specifically the code you are referring to as " the logic that deals with waiting on the workers for threads to be scheduled via pthread_start"? Are you talking specifically about the thread pool, or all thread creation?

kripken commented 4 years ago

I looked into the worker startup wait in depth this morning, and it turns out Emscripten doesn't do Atomics.wait in the worker pool or elsewhere in worker startup. It actually does a postMessage, and the worker's startup waits for that message.

The sync startup issue appears to be confined to Web Worker creation. Once the Worker has been asynchronously created due to PTHREAD_POOL_SIZE, it waits for a message (specifically 'run') to start running.

To verify this, I removed Atomics.wait entirely from emscripten and ran a bunch of tests, and as expected, tests can pthread_create etc., without ever calling Atomics.wait. Only tests that use mutexes and such end up requiring it.

(Note that if the application did not use PTHREAD_POOL_SIZE, then pthread_create becomes async, and the main thread must yield before the worker is responsive, but it still won't use Atomics.wait anywhere.)

@bmeurer Where did we see an application do Atomics.wait during pthread startup? Perhaps that happens in a userspace worker pool - if so, then I think our main options are to document the issue + ask people to refactor, or have Asyncify run on futex_wait (but I agree that's of limited value).

bmeurer commented 4 years ago

@kripken Uhm, now I also looked through the generated .worker.js in detail (which I should have done before posting...), and you're right, it's all done via the main event loop. We had seen this in an issues that we've been looking into for a while now, and I was somehow convinced that the Atomics.wait originates from Emscripten. Sorry for the confusion.

So this is already working as expected regarding thread creation, which I just verified on a trivial demo.

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

sbc100 commented 3 years ago

bump

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because there has been no activity in the past year. It will be closed automatically if no further activity occurs in the next 30 days. Feel free to re-open at any time if this issue is still relevant.

sbc100 commented 2 years ago

bump

Guillaume227 commented 1 year ago

I am a newcomer to emscripten and have been trying to catch-up on this issue in relation to an existing code base that waits on its main thread. The cpp code-base uses std::mutex, std::condition_variable and I am looking for a transparent (as in no rewrite based on emscripten_xyz primitives) way to cross-compile for the web.

This should be straightforward to add if someone's interested. For examples of asyncified functions, see for example emscripten_sleep and other functions after it. We'd want to do the same to pthread_join which is defined in library_pthread.js. Note that the function it calls, _emscripten_do_pthread_join, already has a parameter to allow nonblocking behavior, so building on top of that should help.

Looking at library_pthread.js, I can't find a definition of pthread_join - do the pointers I am quoting above need refreshing ? Is the outlined approach still valid after 3+ years of development went in?

sbc100 commented 1 year ago

These days pthread_join is implemented purely in native code here: system/lib/libc/musl/src/thread/pthread_join.c.

@Guillaume227 have you tried using PROXY_TO_PTHREAD and/or PTHREAD_POOL_SIZE?

Guillaume227 commented 1 year ago

Thanks I see it there now. I do use PTHREAD_POOL_SIZE. I tried PROXY_TO_PTHREAD but that doesn't play nicely with the UI aspect of the application (ImGui-based) and so my understanding is I have to run the main() method on the browser Main Thread. Without PROXY_TO_PTHREAD the c++ multi-threading works nicely until I try to leverage emscripten proxying to wrap WebSerial access. I am hitting what looks like a dead-lock which I am attributing to not yielding properly when the main thread makes blocking serial requests. The other avenue I am considering is using the WASM worker API.

sbc100 commented 1 year ago

Wasm Workers, just like pthreads, are based on Web Workers, so have most of the same issues around synchronous startup.

RReverser commented 1 year ago

I am hitting what looks like a dead-lock which I am attributing to not yielding properly when the main thread makes blocking serial requests.

It does sound likely that you want to insert calls to Asyncify there. You should be able to do that today, it's not related to this issue. For simplest workaround, try compiling with -sASYNCIFY and insert emscripten_sleep(0); calls in places where you want to yield to the browser event loop - that should let it process any WebSerial events.