Yes, someone mentioned it to me some time ago on Twitter and I raised an issue on the Chromium repo, as that appears to be where the cause is. https://bugs.chromium.org/p/chromium/issues/detail?id=1228686&q=reporter%3Arreverser%40google.com&can=1
Feel free to add any details you have there (or subscribe otherwise).
Any updates on the Chromium issue? The link provided gives me a permission denied error, and I still notice the same problem in my web app. @RReverser
Oh I didn't realise it wasn't public.
There has been some progress recently. The current theory is that it's not so much a bug as a scheduling problem caused by how the M1 works. It has 4 performance cores, while the other 4 are efficiency cores. `navigator.hardwareConcurrency` counts all of them and returns `8`, but then splitting work equally between those 8 cores results in worse performance than with a single thread, because the 4 energy-saving cores drag everything down.
Manually changing the number of threads (configurable via `initThreadPool`) to `4` on M1 seems to provide the best performance (better than single-threaded and better than 8-threaded).
Unfortunately, I don't have an M1 to test this further on and am mainly relying on other V8 engineers to dig into that, but you can try this approach in your web app and let me know if it helps.
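To make the workaround concrete, here is a minimal sketch of capping the thread count before it is passed to `initThreadPool`. The helper `pickThreadCount` and its `perfCores` parameter are hypothetical and not part of this library. Browsers don't expose how many cores are performance cores, so the caller has to supply that number (4 for the M1 discussed here).

```typescript
// Hypothetical helper, not part of wasm-bindgen-rayon.
// `perfCores` is an assumption supplied by the caller: there is no browser
// API that distinguishes performance cores from efficiency cores.
function pickThreadCount(hardwareConcurrency: number, perfCores?: number): number {
  // Fall back to 1 if the reported value is missing, zero, or not a number.
  const logical = Math.max(1, Math.floor(hardwareConcurrency) || 1);
  if (perfCores === undefined) {
    return logical;
  }
  // Never ask for more threads than the browser reports, and never fewer than 1.
  return Math.min(logical, Math.max(1, Math.floor(perfCores) || 1));
}

// Intended use in the browser, once the wasm module is loaded:
//   await init();
//   await initThreadPool(pickThreadCount(navigator.hardwareConcurrency, 4));
```

On a machine that reports 8 logical cores this yields 4 threads, and on machines with fewer cores it leaves the reported value alone.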
UPD: I made the issue public.
Thanks, I appreciate it! I see quite a bit of activity on the issue, hope it gets resolved soon.
As I said, it would also be helpful if you tried initializing the thread pool with just 4 threads for now and reported back whether it improves performance in your app.
The more data points, the better :)
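If you want to compare several thread counts, one option is to read the count from the page URL and reload between runs. This is only a sketch: the `threads` query parameter and the `threadsFromQuery` helper are made up for illustration, and reloading is suggested on the assumption that the pool is meant to be initialized once per page load.

```typescript
// Hypothetical helper: reads `?threads=N` from a query string such as
// `location.search`, falling back to `fallback` when it is absent or invalid.
function threadsFromQuery(search: string, fallback: number): number {
  // Match `threads=<digits>` as a whole parameter, wherever it appears.
  const match = /[?&]threads=(\d+)(?:&|$)/.exec(search);
  if (match === null) {
    return fallback;
  }
  const parsed = Number.parseInt(match[1], 10);
  // Zero threads makes no sense, so treat it as invalid.
  return parsed >= 1 ? parsed : fallback;
}

// Intended use in the browser:
//   const threads = threadsFromQuery(location.search, navigator.hardwareConcurrency);
//   await init();
//   await initThreadPool(threads);
```

Opening the page with `?threads=1`, `?threads=4` and `?threads=8` then gives three comparable timings from the same build.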
Apart from the 4-thread problem, it seems like WASM atomics are much slower on M1 compared to my other machines. In the native environment, using 8 threads instead of 4 doesn't hurt nearly as much. For jobs that parallelize well, it can even improve performance a little.
Here is a WASM benchmark from a repo I'm working on:
Thread Num | MacBook Pro (13-inch, M1, 2020) | AMD Ryzen 9 3900X |
---|---|---|
1 | 24ms | 14ms |
4 | 28400ms | 24ms |
Please report any issues with atomics on M1 either to the same linked Chromium bug above, or create another one if problems are sufficiently different.
Either way, unfortunately, it's not something I can fix or control from this library.
Ya, I know you can't fix this in this lib. Just thought you worked closely with the guys in Chromium. Thanks, I reported it to Chromium.
I used to work with them a little more closely, but I'm no longer at Google. Plus, it's still easier to track those things on the Chromium issue tracker than to ask them to look at a random GitHub repo like this one :)
FWIW I'm happy to report this is finally fixed in Chrome 112.
I'm running an M1 Mac with Chrome (Version 92.0.4515.159 (Official Build) (arm64)) and find that both the hosted demo (https://rreverser.com/wasm-bindgen-rayon-demo/) and a locally compiled version do not behave the way I anticipated they would.
single thread: 275ms
all threads: 3000-7000ms
I was expecting the demo to demonstrate the threading capability both by it working and by it being meaningfully faster. If my expectations are misguided, apologies.
I asked others to try. Non-M1 Macs are getting multi-threaded results that are 2-6x faster. This smells like the compiled demo is not behaving properly on the M1 architecture.
Is there anything else I can share that might help better understand this issue?
Thank you!