emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.35k stars 3.25k forks source link

pthread workers 100% CPU usage... #22128

Open Honya2000 opened 1 week ago

Honya2000 commented 1 week ago

Hello,

Today I profiled my wasm engine (pthreads version) and found that all 4 threads (workers) of the job system take 100% CPU usage. I used chrome profiler. Normally it shows nothing in workers (white bars) if thread is suspended. But in my case it shows usage in... emscripten_futex_wait. Isn't it suppose to suspend thread (sleep) in this function? Why it shows hundreds of millisecond CPU usage in waiting for conditional variable?

pthreads

sbc100 commented 1 week ago

Can you share the full set of link flags you use?

Do you know when this changed? Certainly this sounds like a regression. You can bisect on emsdk versions to find out what this broke https://emscripten.org/docs/contributing/developers_guide.html#bisecting (or even just tell is the version where it was not doing this and the first version where you noticed it?)

Honya2000 commented 1 week ago

I'm not exactly sure when this start happening, because normally i do not check workers in profiler. I think 3 month ago workers looked good in profiler, in idle state most of the time. Yesterday i used emsdk 3.1.59, and found this weird issue in profiler. Then i pulled new version 3.1.61. And it didn't solve the situation. Will try to rollback to some emsdk i used 3-4 month ago and check.

This is the list of flags in build.ninja:

FLAGS = -flto -frtti -DNDEBUG -s USE_SDL=2 -pthread -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=4 -O2 -gsource-map -Wno-unknown-pragmas -Wno-unused-function -Wno-deprecated-declarations -fvisibility=hidden -DTMRW_DEBUG=true -DTMU_PLUGINMANAGER_NO_DYNAMIC_PLUGIN_SUPPORT=1 -fcolor-diagnostics -DNDEBUG -fomit-frame-pointer -ffunction-sections -fdata-sections

LINK_FLAGS = -s MODULARIZE -s EXPORT_NAME=createWasmEngine --bind -s WASM=1 -s ALLOW_MEMORY_GROWTH=1 -frtti -s USE_SDL=2 -s MIN_WEBGL_VERSION=2 -s MAX_WEBGL_VERSION=2 -s USE_WEBGL2=1 -s TEXTDECODER=0 -s ALLOW_BLOCKING_ON_MAIN_THREAD=1 -s ENVIRONMENT=web,worker -pthread -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=4 -O2 -gsource-map -s EXPORTED_FUNCTIONS=['_main','_malloc','_free','_onLoadURL'] -s EXPORTED_RUNTIME_METHODS=['ccall','cwrap','AsciiToString','allocate'] -s EXTRA_EXPORTED_RUNTIME_METHODS=['GL'] -Wl,-u,_emscripten_run_callback_on_thread -s USE_WEBGL2=1

Honya2000 commented 1 week ago

The earliest emsdk i could rollback is 3.1.52. And the issue with busy workers still exists there. When i tried to rollbcak to 3.1.50 - engine crashed. Looks like emscripten modules instantiation was changed that time, so i modified javascript wasm engine loader as well.

For now will have to temporariry disable pthreads engine... Any ideas why emscripten_futex_wait wastes CPU time ?

sbc100 commented 1 week ago

On the main thread emscripten_futex_wait will busy wait in a tight loop.

See: https://github.com/emscripten-core/emscripten/blob/eb47e28bd6dbbd44292f02cd096ed96401d83e63/system/lib/pthread/emscripten_futex_wait.c#L129-L133

On worker threads emscripten_futex_wait will actually block and not busy wait.

See: https://github.com/emscripten-core/emscripten/blob/eb47e28bd6dbbd44292f02cd096ed96401d83e63/system/lib/pthread/emscripten_futex_wait.c#L155

However, higher level API calls such a pthread_cond_wait will slice up the wait time and only wait for at most 100ms at a time. See https://github.com/emscripten-core/emscripten/blob/eb47e28bd6dbbd44292f02cd096ed96401d83e63/system/lib/libc/musl/src/thread/__timedwait.c#L64-L65. However I would hope that waking up every 100ms does not constitute a lot of CPU activity and shouldn't show up in the profile. The reason we wake up ever 100ms is to allow for the worker proxying and pthread semantics such as cancellation.

Honya2000 commented 1 week ago

I'm not using wait in main thread. Please check the screenshots above. I'm talking about nearly 100% cpu usage in pthread function loop (ie Worker) where i use function: std::condition_variable.wait(std::unique_lock < std::mutex > &); to wait for the commands... which internally uses busy loop in emscripten_futex_wait.

sbc100 commented 1 week ago

Can you point the where you think the busy loop is?

emscripten_futex_wait itself does not contain a busy loop (except when called from the main thread).

The calling function __timedwat_cp loops ever 100ms. So I'm a little confused about where the busy loop might be?

Honya2000 commented 1 week ago

Ok, can you please explain profiler screenshot I showed in first post ? I dodn't know where is busy loop, because i'm not autor of emscripten code. I just wondering why worker code is busy 100% of the time when it should idle most of the time.

sbc100 commented 1 week ago

I can't explain that no, those threads should only be waking up every 100ms, not busy-waiting.

Honya2000 commented 1 week ago

This is screenshot i took from pure java script code, where I'm using workers: Correct Here is clear, it is idles most of the time.

And this is screenshot from wasm engine with ptrhread workers: pthreads As you can see it is busy constantly.. inside futex_wait.

sbc100 commented 1 week ago

I understand that you are seeing a busywaiting thread.. I just can't explain it at this point.

As far as we know the behaviour your are seeing has always been there, is that right? i.e. you can't find a version of emscripten that doesn't show this behaviour.

If the problem is that the very low level in emscripten_futex_wait it should be possible to also reproduce this using a simple C program that just called calls pthread_mutex_lock from the main thread and then starts a worker that simply calls pthread_mutex_lock, I think?

What is the UI you are using to grab those screenshots? I assume that is one part of chrome?

Honya2000 commented 1 week ago

As I mentioned earlier, several month ago i didn't see busy workers in my project. It was the opposite, threads where suspended, and I always tried to find the small piece of code executed in workers (i'm talking about chrome profiler here).

But i don't know when it was changed. Moreover I don't know is this issue related to emscripten or there is bug in latest chrome profiler. But i assume if there is bug in profiler - why it shows correct expected result in pure js project workers? So I still think the issue is related to emsdk somehow...

Honya2000 commented 5 days ago

Ok, we checked the pthread workers on 3 different dev machines in chrome profiler. All dev PC's show the same behaviour with busy threads!