Open mere-human opened 8 months ago
Thanks for the report. The best way to proceed with this kind of regression is to try to bisect to smaller range and see if we can narrow it down to a specific change. Can you take a look at the bisect instruction here: https://emscripten.org/docs/contributing/developers_guide.html#bisecting
I think the next step is to bisect to find out which binaryen change (or llvm change) triggered your regression.
@tlively and @kripken maybe be able to hazard a guess given to revision range, but bisecting is always extremely useful.
Another profiling tool you can use for Binaryen specifically is pass timings. Putting BINARYEN_PASS_DEBUG=1
in the env will emit stuff like this:
[PassRunner] running passes
[PassRunner] running pass: duplicate-function-elimination... 0.00613846 seconds.
[PassRunner] (validating)
[PassRunner] running pass: memory-packing... 0.0124154 seconds.
[PassRunner] (validating)
[PassRunner] running pass: dce... 0.106082 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.0273986 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-brs... 0.111207 seconds.
[PassRunner] (validating)
[PassRunner] running pass: remove-unused-names... 0.0274284 seconds.
[PassRunner] (validating)
[PassRunner] running pass: optimize-instructions... 0.129891 seconds.
[PassRunner] (validating)
[PassRunner] running pass: precompute... 0.0737234 seconds.
[PassRunner] (validating)
[PassRunner] running pass: tuple-optimization... 0.000672911 seconds.
[PassRunner] (validating)
[PassRunner] running pass: simplify-locals-nostructure... 0.21312 seconds.
That is, it prints the time each pass takes. (It also validates, which makes the build slower, but doesn't interfere with the timing measurement.)
@sbc100 I think I found a commit using instructions from the doc. I hope I did everything right. It seems to be a part of 3.1.50 version.
git bisect start
# good: [694434b6d47c5f6eff2c8fbd9eeb016c977ae9dc] 3.1.48 LTO RC
...
# bad: [2ce4170cef5ce46f337f9fd907b614a8db772c7d] 3.1.50 LTO RC
...
1485e96d7c269709e25810e57ea368c9af7993d6 is the first bad commit
commit 1485e96d7c269709e25810e57ea368c9af7993d6
Author: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Date: Tue Nov 28 23:38:14 2023 +0000
Roll binaryen from 2ea76680ae72 to dbcac17d645d (1 revision)
https://chromium.googlesource.com/external/github.com/WebAssembly/binaryen.git/+log/2ea76680ae72..dbcac17d645d
2023-11-28 azakai@google.com [NFC] Move InstrumentedPass logic out and use it in another place (#6132)
If this roll has caused a breakage, revert this CL and stop the roller
using the controls here:
https://autoroll.skia.org/r/binaryen-emscripten-releases
Please CC dschuff@google.com,wasm-waterfall@grotations.appspotmail.com on the revert to ensure that a human
is aware of the problem.
To report a problem with the AutoRoller itself, please file a bug:
https://issues.skia.org/issues/new?component=1389291&template=1850622
Documentation for the AutoRoller is here:
https://skia.googlesource.com/buildbot/+doc/main/autoroll/README.md
Tbr: wasm-waterfall@grotations.appspotmail.com
Change-Id: Ib39a12223ff7d03e0ed5b9da12e39618c2c38e39
Reviewed-on: https://chromium-review.googlesource.com/c/emscripten-releases/+/5069256
Commit-Queue: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
Bot-Commit: chromium-autoroll <chromium-autoroll@skia-public.iam.gserviceaccount.com>
DEPS | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Below are the results for one of the targets (test UI app with EMSDK 3.1.54) where we see the slowdown. wasm-ld command line (modified to exclude project names):
shared:DEBUG: successfully executed C:/Users/user/emsdk/3.1.54/upstream/bin\wasm-ld.exe -o Release/1/App.wasm -LRelease/1 --error-limit=0 obj/Apps/App/App/entry.o obj/obj1.o --whole-archive obj/lib1.a obj/lib2.a obj/lib3.a --no-whole-archive .... --no-whole-archive ../../ThirdParty/Google/Skia/libs/wasm/libskia.a ../../ThirdParty/Google/Skia/libs/wasm/libskparagraph.a ../../ThirdParty/Google/Skia/libs/wasm/libskshaper.a ../../ThirdParty/Google/Skia/libs/wasm/libskunicode.a ../../ThirdParty/Apache/Xerces/lib/wasm/libxerces-c.a ../../ThirdParty/ICU/lib/wasm/libicudata.a ../../ThirdParty/ICU/lib/wasm/libicui18n.a ../../ThirdParty/ICU/lib/wasm/libicuio.a ../../ThirdParty/ICU/lib/wasm/libicutu.a ../../ThirdParty/ICU/lib/wasm/libicuuc.a ../../ThirdParty/Harfbuzz/lib/wasm/libharfbuzz.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_calib3d.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_core.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_dnn.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_features2d.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_flann.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_imgproc.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_objdetect.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_photo.a ../../ThirdParty/OpenCVOrg/OpenCV/lib/wasm/libopencv_video.a -LC:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\sysroot\lib\wasm32-emscripten C:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\sysroot\lib\wasm32-emscripten\crtbegin.o C:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\sysroot\lib\wasm32-emscripten\crt1_proxy_main.o --whole-archive -lfetch-mt -lembind-rtti --no-whole-archive -lGL-mt-webgl2-getprocaddr -lal -lhtml5 -lbulkmemory -lstubs -lnoexit -lc-mt -ldlmalloc-mt -lcompiler_rt-mt -lc++-mt -lc++abi-mt -lsockets-mt -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr C:\Users\user\AppData\Local\Temp\emscripten_temp\tmpisj41kgwlibemscripten_js_symbols.so --import-memory --shared-memory --export=htons --export=ntohs --export=_emscripten_thread_init --export=_emscripten_thread_exit --export=_emscripten_thread_crashed --export=_emscripten_tls_init --export=pthread_self --export=_embind_initialize_bindings --export=stackAlloc --export=stackSave --export=stackRestore --export=_emscripten_thread_free_data --export=emscripten_main_runtime_thread_id --export=emscripten_main_thread_process_queued_calls --export=_emscripten_run_on_main_thread_js --export=emscripten_stack_set_limits --export=__get_temp_ret --export=__set_temp_ret --export=__cxa_is_pointer_type --export=__cxa_can_catch --export=__cxa_increment_exception_refcount --export=__cxa_decrement_exception_refcount --export=setThrew --export=__cxa_free_exception --export=__wasm_call_ctors --export=_emscripten_thread_init --export=_emscripten_thread_exit --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-table -z stack-size=5242880 --initial-memory=4294901760 --max-memory=4294901760 --entry=_emscripten_proxy_main --table-base=1 --global-base=1024
4 obj.o files 86 static lib.a files
Binary sizes:
Below is the linker output with BINARYEN_PASS_DEBUG=1
(same test UI target with EMSDK 3.1.54). Most notable phases are:
Comparing this to the output for the same target but with EMSDK 3.1.37, I can see that "dae-optimizing" takes almost x10 longer. "simplify-locals" increased as well.
@mere-human can you provide the wasm file to reproduce this? We did measurements before landing that, and I did some more measurements now on wasm files I have locally, and can't reproduce the problem. It might be specific to that wasm somehow.
If you can't provide it, what I would do to investigate this on your machine is to run --dae-optimizing --optimize-level=3
on that wasm (that is the main regression here; simplify-locals looks like noise), with some profiling. E.g. on linux I would use perf stat -r 1
for CPU metrics, and perf record
/ perf report
for profiling.
We did measurements before landing that, and I did some more measurements now on wasm files I have locally, and can't reproduce the problem.
@kripken Thanks, I'll try that. But did you check it on Windows? I see the problem only on Windows but not on Mac (where the build time has actually improved). Also, I'm not sure I can use "perf" on Windows.
I've just compared .wasm output for each phase with EMCC_DEBUG=1
produced on Mac and on Windows. All the binaries are identical. But I see the slowdown only on Windows.
Oh, sorry, I missed that this was only on Windows. I don't have a Windows machine myself - I tested on Linux.
In that case, maybe indirect calls are slower there for some reason? That is the only thing that PR should change, I think. I'm not sure how the OS can matter for indirect call speed, though I recall Spectre had some mitigations related to that...
Perhaps there is some Windows profiling tool that you can run to get some insights on what is happening here?
I work on a large project that is built with Emscripten. When upgrading EMSDK from version 3.1.37 to 3.1.50, I noticed an increase in link time for a release build. The slowdown occurs only when building on Windows, but not on Mac. The link time remains relatively the same in version 3.1.54.
Link time for the main large UI target:
For another smaller test UI target, the link time between .37 and .50 went from 2:43m to 3:22m on Windows. As it can be seen, the slowdown scales up. However, for small CLI targets such as unit tests, the link time has actually improved. The project contains a lot of unused code that is eliminated during the wasm-opt phase. The amount of code as well as linker flags has remained the same between EMSDK versions.
What could be the reason for this, and are there any ways to solve it? I suspect it could be due to LLVM 17 update that happened in 3.1.50. I can provide more details if needed.
Version of emscripten/emsdk:
Failing command line in full: Link command for the smaller program below.
Full link command and output with
-v
appended:Linker args were simplified to exclude unnecessary file names. Linker args are specified through an input file. Args:
em++.bat -L Libs -sEXPORTED_FUNCTIONS=_main -sUSE_WEBGL2=1 --bind -sUSE_PTHREADS=1 -sCASE_INSENSITIVE_FS=1 -O2 -sPTHREAD_POOL_SIZE=6 -Wl,--error-limit=0 -sEXPORTED_RUNTIME_METHODS=ccall,cwrap -sPTHREAD_POOL_SIZE_STRICT=2 -sDISABLE_EXCEPTION_CATCHING=0 -sLLD_REPORT_UNDEFINED -sTOTAL_MEMORY=2147418112 --profiling-funcs --bind -sUSE_WEBGL2=1 -sFETCH=1 -sSTACK_SIZE=5MB -sEXPORTED_FUNCTIONS=_main,_htons,_ntohs -sEXPORTED_RUNTIME_METHODS=ccall,cwrap -sPROXY_TO_PTHREAD=1 --pre-js=script1.js --pre-js=script2.js ... -o app.html source1.o lib1.a ... --js-library lib1.js ...
Some of the static libs use "-Wl,--whole-archive" or "-Wl,--no-whole-archive".
Linker output: Output for the smaller program.
Click to expand
I modified some file names. There were 4 .o files and 86 .a files. ``` "C:/Users/user/emsdk/3.1.54/upstream/bin\clang.exe" --version cache:INFO: generating system asset: symbol_lists/bfb8fc0f93e874314fc7c836514c2df47f9c2e3c.json... (this will be cached in "C:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\symbol_lists\bfb8fc0f93e874314fc7c836514c2df47f9c2e3c.json" for subsequent builds) "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\src\compiler.mjs C:\Users\user\AppData\Local\Temp\tmp2zp4ahks.json --symbols-only cache:INFO: - ok "C:/Users/user/emsdk/3.1.54/upstream/bin\wasm-ld.exe" -o Release/App.wasm -LRelease/ --error-limit=0 obj/Apps/App/entry.o ... --whole-archive obj/Apps/App/AppCore.a ... --no-whole-archive obj/ThirdParty/zlib/zlib/zlib.a --whole-archive ... -LC:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\sysroot\lib\wasm32-emscripten C:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\sysroot\lib\wasm32-emscripten\crtbegin.o C:\Users\user\emsdk\3.1.54\upstream\emscripten\cache\sysroot\lib\wasm32-emscripten\crt1_proxy_main.o --whole-archive -lfetch-mt -lembind-rtti --no-whole-archive -lGL-mt-webgl2-getprocaddr -lal -lhtml5 -lbulkmemory -lstubs -lnoexit -lc-mt -ldlmalloc-mt -lcompiler_rt-mt -lc++-mt -lc++abi-mt -lsockets-mt -mllvm -combiner-global-alias-analysis=false -mllvm -enable-emscripten-cxx-exceptions -mllvm -enable-emscripten-sjlj -mllvm -disable-lsr C:\Users\user\AppData\Local\Temp\[tmpn2_uk2smlibemscripten_js_symbols.so](http://tmpn2_uk2smlibemscripten_js_symbols.so/) --import-memory --shared-memory --export=htons --export=ntohs --export=_emscripten_thread_init --export=_emscripten_thread_exit --export=_emscripten_thread_crashed --export=_emscripten_tls_init --export=pthread_self --export=_embind_initialize_bindings --export=stackAlloc --export=stackSave --export=stackRestore --export=_emscripten_thread_free_data --export=emscripten_main_runtime_thread_id --export=emscripten_main_thread_process_queued_calls --export=_emscripten_run_on_main_thread_js --export=emscripten_stack_set_limits --export=__get_temp_ret --export=__set_temp_ret --export=__cxa_is_pointer_type --export=__cxa_can_catch --export=__cxa_increment_exception_refcount --export=__cxa_decrement_exception_refcount --export=setThrew --export=__cxa_free_exception --export=__wasm_call_ctors --export=_emscripten_thread_init --export=_emscripten_thread_exit --export-if-defined=__start_em_asm --export-if-defined=__stop_em_asm --export-if-defined=__start_em_lib_deps --export-if-defined=__stop_em_lib_deps --export-if-defined=__start_em_js --export-if-defined=__stop_em_js --export-if-defined=main --export-if-defined=__main_argc_argv --export-table -z stack-size=5242880 --initial-memory=4294901760 --max-memory=4294901760 --entry=_emscripten_proxy_main --table-base=1 --global-base=1024 "C:/Users/user/emsdk/3.1.54/upstream/bin\llvm-objcopy.exe" Release/App.wasm Release/App.wasm --remove-section=.debug* --remove-section=producers "C:/Users/user/emsdk/3.1.54/upstream\bin\wasm-emscripten-finalize" -g --dyncalls-i64 --pass-arg=legalize-js-interface-exported-helpers Release/App.wasm -o Release/App.wasm --detect-features "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\src\compiler.mjs C:\Users\user\AppData\Local\Temp\tmpeng7appy.json "C:/Users/user/emsdk/3.1.54/upstream\bin\wasm-opt" --strip-target-features --post-emscripten -O2 --low-memory-unused --zero-filled-memory --pass-arg=directize-initial-contents-immutable Release/App.wasm -o Release/App.wasm -g --mvp-features --enable-threads --enable-bulk-memory --enable-mutable-globals --enable-sign-ext --enable-simd "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\tools\acorn-optimizer.mjs C:\Users\user\AppData\Local\Temp\emscripten_temp_4rvly0q9\App.js unsignPointers -o C:\Users\user\AppData\Local\Temp\emscripten_temp_4rvly0q9\App.jso1.js "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\tools\acorn-optimizer.mjs C:\Users\user\AppData\Local\Temp\emscripten_temp_4rvly0q9\App.jso1.js JSDCE minifyWhitespace -o C:\Users\user\AppData\Local\Temp\emscripten_temp_4rvly0q9\App.jso2.js "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\tools\preprocessor.mjs C:\Users\user\AppData\Local\Temp\emscripten_temp_4rvly0q9\settings.js worker.js --expandMacros "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\tools\acorn-optimizer.mjs C:\Users\user\repo\source\out\PROD-WASM-EMSCRIPTEN\Release\Programs\App.worker.js minifyWhitespace "C:/Users/user/emsdk/3.1.54/node/16.20.0_64bit/bin/node.exe" C:\Users\user\emsdk\3.1.54\upstream\emscripten\tools\preprocessor.mjs C:\Users\user\AppData\Local\Temp\emscripten_temp_4rvly0q9\settings.js Shell.html "C:\Users\user\emsdk\3.1.54\upstream\emscripten\node_modules\.bin\html-minifier-terser.cmd" Release/App.html -o Release/App.html --collapse-whitespace --collapse-inline-tag-whitespace --remove-comments --remove-tag-whitespace --sort-attributes --sort-class-name --decode-entities --collapse-boolean-attributes --remove-attribute-quotes --remove-redundant-attributes --remove-script-type-attributes --remove-style-link-type-attributes --use-short-doctype --minify-css true --minify-js true ```When building with
EMCC_DEBUG=1
, I can see that most of the link time is spent in wasm-opt & binariyen. Although, without much details. Example log entries (for a smaller program):