Open omnisip opened 4 years ago
Do you mean that you create pthreads and stop using them - and they exit with pthread_exit
normally - but they are not made available in the pool again? That sounds like a possible bug. I'd check if it happens in the browser, as that is more heavily tested - if it works there, it may be a bug specific to the node integration.
PROXY_TO_PTHREAD
should also fix this, as it allows synchronous creation of pthreads. Coincidentally https://github.com/emscripten-core/emscripten/pull/12431 is adding docs and testing for that. If that doesn't work for you, it could be another bug.
If you don't have luck figuring this out, then attaching small testcases for these issues could allow us to investigate.
That is exactly what happens. The pool of the threads will disappear in a 2 pass algorithm that spawns threads in each pass. Even though all the threads were used in the first pass, they exit and don't return to the pool for the second pass.
Just checking, do you join with your used threads? Or set them as detached? I would imagine that have have to do one or the other before they can be recycled in the pool.
std::async. I join on the futures and the tasks finish.
On Mon, Oct 5, 2020, 19:41 Sam Clegg notifications@github.com wrote:
Just checking, do you join with your used threads? Or set them as detached? I would imagine that have have to do one or the other before they can be recycled in the pool.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/emscripten-core/emscripten/issues/12400#issuecomment-703978804, or unsubscribe https://github.com/notifications/unsubscribe-auth/AACKQJUB2YYIJZBAYNYU3TTSJJYS5ANCNFSM4SA6V4JQ .
Do you still need test cases for this? Set your PTHREAD_POOL_SIZE to 16 and try something like this:
int foo (int p) {
return p;
}
void functionCalledFromNodeJs() {
std::array<future, 16> futures;
// first pass
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
for (int i=0; i!=16; ++i) {
futures[i].wait();
}
// second pass
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
for (int i=0; i!=16; ++i) {
futures[i].wait();
}
}
@omnisip
Trying to get what you wrote to compile, I hit
a.cpp:10:15: error: use of class template 'future' requires template arguments
std::array<future, 16> futures;
It can save a lot of time to provide already-runnable testcases.
@omnisip
Trying to get what you wrote to compile, I hit
a.cpp:10:15: error: use of class template 'future' requires template arguments std::array<future, 16> futures;
It can save a lot of time to provide already-runnable testcases.
future<int>
sorry.
Give me a few minutes and I'll push a godbolt for you.
em++ -g3 -O3 -s PTHREAD_POOL_SIZE=16 -s USE_PTHREADS=1 -s MODULARIZE=1 -s ALLOW_MEMORY_GROWTH=1 -s ASSERTIONS=1 -msimd128 -mavx -s EXPORTED_FUNCTIONS='[ _malloc, _free, _functionCalledFromNodeJs ]' -s EXPORTED_RUNTIME_METHODS='[ccall, cwrap]' -s 'EXPORT_NAME=threadTest' -I. -o threadTest.js threadTest.cpp
dan@dl360:~/applications/test-rgb2y$ cat testrunner-thread.js
let loader = require('./threadTest');
loader().then(m => {
m.ccall('functionCalledFromNodeJs','void',[],[]);
})
/usr/bin/node --experimental-wasm-simd --experimental-wasm-modules --experimental-wasm-threads --experimental-wasm-bulk-memory ./testrunner-thread.js
if you change the PTHREAD_POOL_SIZE to 32, it will succeed, but will fail at 16.
Code inlined for convenience:
#include <array>
#include <future>
int foo (int p) {
return p;
}
extern "C"
void functionCalledFromNodeJs() {
std::array<std::future<int>, 16> futures;
// first pass
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
for (int i=0; i!=16; ++i) {
futures[i].wait();
}
// second pass
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
for (int i=0; i!=16; ++i) {
futures[i].wait();
}
}
Thanks @omnisip !
It looks like what happens here is that when a worker exits, it postMessage
s the main thread to tell it that it can be reused ('exit'
). But in this code, the main thread never exits to the event loop, it just constantly runs, so it never gets that.
It may be possible to avoid this problem by sending the message using the C queues, or some other mechanism that avoids the JS event loop.
I'm not sure how robust that can be. In general, using -s PROXY_TO_PTHREAD
can avoid this entire class of problems - I verified it can run this testcase ok.
With proxy to pthread, I tried it before filing this issue. Is there a new version that has a fix or something?
@omnisip I don't remember a related fix. But please try it on the latest version. If it doesn't work for you with PROXY_TO_PTHREAD
, that could be a separate issue.
Retry the sample with PTHREAD_POOL_SIZE=4 and PROXY_TO_PTHREAD=1. It will fail. I think the behavior should be that it restores the threads after completion -- so even if it's not an ideal pool size, it should handle them in order.
@omnisip , it seems to work for me with 4 as well. Here is exactly what I am running:
#include <stdio.h>
#include <array>
#include <future>
int foo (int p) {
return p;
}
int main() {
std::array<std::future<int>, 16> futures;
puts("first pass");
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
puts("wait");
for (int i = 0; i != 16; ++i) {
futures[i].wait();
}
puts("second pass");
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
puts("wait");
for (int i = 0; i != 16; ++i) {
futures[i].wait();
}
puts("done!");
}
Building with
./emcc a.cpp -pthread -s TOTAL_MEMORY=128MB -s PTHREAD_POOL_SIZE=16 -s PROXY_TO_PTHREAD
works, as does the same with a pool size of 4. In both cases it reaches the final print of "done!".
(Without PROXY_TO_PTHREAD
, it fails in a different place based on the pool size - with 16 it can reach the second wait.)
@kripken Your example is different from mine because it has a main() function. Mine explicitly calls from Node.JS (thus the javascript file that goes with it). It still hangs for me (-s PROXY_TO_PTHREAD=1 or -s PROXY_TO_PTHREAD) if I don't give it enough threads. See below:
#include <cstdio>
#include <array>
#include <future>
int foo(int c) {
return c;
}
extern "C"
void functionCalledFromNodeJs() {
std::array<std::future<int>, 16> futures;
puts("start");
// first pass
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
for (int i=0; i!=16; ++i) {
futures[i].wait();
}
puts("first pass complete");
// second pass
for (int i = 0; i != 16; ++i) {
futures[i] = std::async(std::launch::async, foo, i);
}
for (int i=0; i!=16; ++i) {
futures[i].wait();
}
puts("second pass complete");
}
let loader = require('./threadTest');
loader().then(m => {
m.ccall('functionCalledFromNodeJs','void',[],[]);
})
dan@dl360:~/applications/test-rgb2y$ em++ -O3 -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=16 -s USE_PTHREADS=1 -s ALLOW_MEMORY_GROWTH=1 -s ASSERTIONS=1 -s MODULARIZE=1 -msimd128 -mavx -s EXPORTED_FUNCTIONS='[_malloc, _free, _functionCalledFromNodeJs]' -s EXPORTED_RUNTIME_METHODS='[ccall, cwrap]' -s EXPORT_NAME=foobar -I. -o threadTest.js threadTest.cpp
em++: warning: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 [-Wpthreads-mem-growth]
dan@dl360:~/applications/test-rgb2y$ /usr/bin/node --experimental-wasm-simd --experimental-wasm-threads --experimental-wasm-bulk-memory testrunner-thread.jsstart
first pass complete
[hangs...]
I don't think the main should make a difference - unless you are not waiting for the proper time to start up, perhaps? Using main will ensure you wait for the preparation stage to preload the workers. Might be worth debugging this with some prints to see what is different between the two cases (say, adding some printing in the library_phreads.js
after the pool is all ready, stuff like that).
Does my loader script (above) which waits for the promise to resolve before testing the library ensure the module is fully loaded and ready to go?
@omnisip it looks valid to me. But something must be going wrong somewhere, and that seems like a possible area. I think it just needs to be debugged as I said in my suggestions earlier.
Definitely different behavior between main() and another function call. It looks like the module is fully loaded before I call the library function. See behavior with -s PTHREADS_DEBUG below.
dan@dl360:~/applications/test-rgb2y$ em++ -O3 -pthread -s PTHREADS_DEBUG -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=16 -s USE_PTHREADS=1 -s ALLOW_MEMORY_GROWTH=1 -s ASSERTIONS=1 -s MODULARIZE=1 -msimd128 -mavx -s EXPORTED_FUNCTIONS='[_malloc, _free, _functionCalledFromNodeJs]' -s EXPORTED_RUNTIME_METHODS='[ccall, cwrap]' -s EXPORT_NAME=foobar -I. -o threadTest.js threadTest.cpp em++: warning: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 [-Wpthreads-mem-growth]
dan@dl360:~/applications/test-rgb2y$ /usr/bin/node --experimental-wasm-threads --experimental-wasm-simd ./testrunner-thread.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
module loaded
start
first pass complete
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Here's the behavior if I add main and prevent my script from calling the library function (or main):
dan@dl360:~/applications/test-rgb2y$ em++ -O3 -pthread -s PTHREADS_DEBUG -s PROXY_TO_PTHREAD=1 -s PTHREAD_POOL_SIZE=16 -s USE_PTHREADS=1 -s ALLOW_MEMORY_GROWTH=1 -s ASSERTIONS=1 -s MODULARIZE=1 -msimd128 -mavx -s EXPORTED_FUNCTIONS='[_malloc, _free, _functionCalledFromNodeJs, _main]' -s EXPORTED_RUNTIME_METHODS='[ccall, cwrap]' -s EXPORT_NAME=foobar -I. -o threadTest.js threadTest.cpp
em++: warning: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 [-Wpthreads-mem-growth]
dan@dl360:~/applications/test-rgb2y$ vim testrunner-thread.js
dan@dl360:~/applications/test-rgb2y$ /usr/bin/node --experimental-wasm-threads --experimental-wasm-simd ./testrunner-thread.jsAllocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
Allocating a new web worker from /home/dan/applications/test-rgb2y/threadTest.worker.js
module loaded
Main
start
Pthread 0xe02838 exited.
Pthread 0xc02490 exited.
Pthread 0x14033a0 exited.
Pthread 0x1202fc8 exited.
Pthread 0x1002c20 exited.
Pthread 0xc02488 exited.
Pthread 0x1803b20 exited.
Pthread 0x1603760 exited.
Pthread 0xe028b8 exited.
Pthread 0xe02a18 exited.
Pthread 0x1203390 exited.
Pthread 0xe02a18 exited.
Pthread 0xe02a18 exited.
Pthread 0x12037f8 exited.
Pthread 0xe02a18 exited.
Pthread 0xe02a18 exited.
first pass complete
Pthread 0xe02a18 exited.
Pthread 0x1203b30 exited.
Pthread 0xe02a18 exited.
Pthread 0xe02a18 exited.
Pthread 0x1203b30 exited.
Pthread 0xe02a18 exited.
Pthread 0xe02a18 exited.
Pthread 0x1203b30 exited.
Pthread 0xe02a18 exited.
Pthread 0xe02a18 exited.
Pthread 0x1203b30 exited.
Pthread 0xe02a18 exited.
Pthread 0xe028b8 exited.
Pthread 0xe02a18 exited.
Pthread 0xe028b8 exited.
Pthread 0xe028b8 exited.
second pass complete
Main End
Proxied main thread 0xa01e88 finished with return code 0. EXIT_RUNTIME=0 set, so keeping main thread alive for asynchronous event operations.
Pthread 0xa01e88 exited.
I think the problem here is that -s PROXY_TO_PTHREAD=1
does not work for ccall (or Embind) functions. You could workaround this by dispatching the call to the pthread running the original main()
on it, but this is quite hackish. See for example:
Compile with:
$ em++ -O3 -pthread -s TOTAL_MEMORY=128MB -s PTHREAD_POOL_SIZE=16 -s PROXY_TO_PTHREAD -s MODULARIZE -s EXPORTED_FUNCTIONS='[_main, _functionCalledFromNodeJs]' -s EXPORTED_RUNTIME_METHODS='[ccall]' -s EXPORT_NAME=foobar -o threadTest.js threadTest.cpp
Run with:
$ node --experimental-wasm-threads --experimental-wasm-bulk-memory testrunner.js
Unfortunately this makes the functionFromNodeJs
function asynchronous, which might not be desired.
@kripken is there any way to fix this without the workaround @kleisauke suggested?
@kleisauke what happens on subsequent calls to the thread function? The purpose of this is to be used as a node.js library with more than one call.
In that case, I think the above example could be simplified to this:
Compile with:
$ em++ -O3 -pthread -sTOTAL_MEMORY=128MB -sPTHREAD_POOL_SIZE=16 -sMODULARIZE -sEXPORTED_FUNCTIONS='[_functionCalledFromNodeJs]' -sEXPORTED_RUNTIME_METHODS='[ccall]' -sEXPORT_NAME=foobar -o threadTest.js threadTest.cpp
Run with:
$ node --experimental-wasm-threads --experimental-wasm-bulk-memory testrunner.js
The functionFromNodeJs
is then still asynchronous (due to that detached thread), but can be called multiple times without problems. Unfortunately, the above example prevents Node from exiting (-sEXIT_RUNTIME=1
won't work in this case) and users must call PThread.terminateAllThreads()
for tearing down all workers. I think this behavior is being investigated within https://github.com/emscripten-core/emscripten/issues/12801 (fwiw, I actually had a similar problem with a project I'm working on, see https://github.com/kleisauke/wasm-vips/commit/8f43b170602d1b026ff78a22d04159682ac82d5c).
Will investigate this week.
@omnisip Were you ever able to resolve this ?
I am running into a similar issue. I have a native library that i am using for calculations, but in my situation i am waiting for a result from the function call. There are a few options i tried to approach the problem. It seems like emscripten could supply better support for std::future objects but yielding the message loop. I modified the example to defer the function that does threading to another thread and then wait for the future to be ready to avoid blocking the main thread which causes the thread exhaustion.
My resuls are interesting.
testRunner.js
require('./threadTest')().then(async m => console.log(`Results is ${await m._functionCalledFromNodeJs()}`));
build and run
em++ -O3 -pthread -sTOTAL_MEMORY=128MB -sMODULARIZE -sASYNCIFY -sEXPORTED_FUNCTIONS='[_functionCalledFromNodeJs]' -sEXPORT_NAME=foobar -o threadTest.js threadTest.cpp
node --experimental-wasm-threads --experimental-wasm-bulk-memory testrunner.js
Here is the output
Status: 1
Results is 0
start
Status: 1
Status: 1
Status: 1
Status: 1
first pass complete
second pass complete
Status: 0
What is interesting is functionCalledFromNodeJs
seems to return immediately and everything seems to complete correctly. Then the application hangs and never exits. Same results with node 14,16,17
@sbc100 @kleisauke @kripken Any insight here? I wonder if emscripten could have better built in support for wiring std::future either with a Promise object or implicitly use emscripten_sleep with the get/wait/wait_for/wait_until
functions.
I am using embind, it would be very coinvent if i was able to just return std::futurefunction("functionCalledFromNodeJs", optional_override([]() { std::async(std::launch::async, &thread_function) }));
asyncFunction("functionCalledFromNodeJs", &thread_function);
Ok quick update if used with embind it seems to work better
adding
EMSCRIPTEN_BINDINGS(example) {
emscripten::function("bindFunctionCalledFromNodeJs",
&functionCalledFromNodeJs);
}
and calling
console.log(`Results is ${await m.bindFunctionCalledFromNodeJs()}`));
Does produce expected result
start
Status: 1
Status: 1
Status: 1
Status: 1
first pass complete
second pass complete
Status: 0
Results is 100
To answer the first post, i guess the return type of int
won't allow a Promise to return.
I do think better support for std::future could be added to emscripten as well as a possible asyncFunction embind template
Here is a rough crack it without the need for ASYNCIFY
testRunner.js
require('./threadTest')().then(async m => console.log(`Results is ${await m.bindFunctionCalledFromNodeJs()}`));
build and run
em++ -O3 -pthread -sTOTAL_MEMORY=128MB -lembind -sMODULARIZE -sEXPORT_NAME=foobar -o threadTest.js threadTest.cpp
node --experimental-wasm-threads --experimental-wasm-bulk-memory testrunner.js
Here is the output
start
first pass complete
second pass complete
Results is 100
@sbc100 @kripken Has there been any discussion around adding std::future
to Promise
support to embind ?
Hmm, I don't remember hearing a discussion about std::future/Promise
integration.
Hi there,
I'm totally at a loss on what to expect for pthread behavior with my application. I'm using emscripten to implement a wasm module that acts as a portable SIMD library for performing operations. The library calls from Node.JS are designed to be synchronous and leverage memory passing. Inside the library, I'm spawning threads to expedite the operations (performing prefix sums and converting RGB image pixels to YUV). The threads wouldn't even start at the beginning before PTHREAD_POOL_SIZE. Now that I've added it, it works up until N threads have been consumed. PROXY_TO_PTHREAD doesn't seem to do anything meaningful at least not with respect to my application and Node.JS.
I figure if a pool is needed, why not just treat it like a pool instead of exhausting it? On the other hand, what are my alternatives if this can't get resolved? I need to be able to call rgb2y more than one time.