Closed: niklasf closed this issue 4 years ago.
Much better with 1.39.1-fastcomp:
132K stockfish.js
40K stockfish.js.mem
392K stockfish.wasm
12K stockfish.worker.js
576K total
Can you post the linker command line options you use?
There are some tools for analyzing the size of WebAssembly binaries, such as https://github.com/rustwasm/twiggy. In this case the size difference is quite significant, so I suspect that something is preventing the GC of dead functions in the LLVM upstream case. Seeing your command line options will help figure it out.
Thanks. The Makefile linked above runs the following command for the source files:
em++ -Wall -Wcast-qual -std=c++11 -s MODULARIZE=1 -s EXPORT_NAME="Stockfish" -s NO_EXIT_RUNTIME=1 -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=1 -s TOTAL_MEMORY=109051904 -s EXPORTED_FUNCTIONS="['_main', '_uci_command']" -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall']" -s "INCOMING_MODULE_JS_API=['print', 'postRun']" --pre-js pre.js -DNDEBUG -O3 -DIS_64BIT -DNO_PREFETCH -msse3 -mpopcnt -DUSE_POPCNT -flto -c -o main.o main.cpp
The final binary is linked with basically the same flags:
em++ -o stockfish.js benchmark.o bitbase.o bitboard.o endgame.o evaluate.o main.o material.o misc.o movegen.o movepick.o pawns.o position.o psqt.o search.o thread.o timeman.o tt.o uci.o ucioption.o -s MODULARIZE=1 -s EXPORT_NAME="Stockfish" -s NO_EXIT_RUNTIME=1 -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=1 -s TOTAL_MEMORY=109051904 -s EXPORTED_FUNCTIONS="['_main', '_uci_command']" -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall']" -s "INCOMING_MODULE_JS_API=['print', 'postRun']" --pre-js pre.js -lpthread -Wall -Wcast-qual -std=c++11 -s MODULARIZE=1 -s EXPORT_NAME="Stockfish" -s NO_EXIT_RUNTIME=1 -s USE_PTHREADS=1 -s PTHREAD_POOL_SIZE=1 -s TOTAL_MEMORY=109051904 -s EXPORTED_FUNCTIONS="['_main', '_uci_command']" -s "EXTRA_EXPORTED_RUNTIME_METHODS=['ccall']" -s "INCOMING_MODULE_JS_API=['print', 'postRun']" --pre-js pre.js -DNDEBUG -O3 -DIS_64BIT -DNO_PREFETCH -msse3 -mpopcnt -DUSE_POPCNT -flto
I'll try twiggy now.
Unfortunately twiggy panics on the version compiled with the llvm backend:
$ twiggy garbage 1.39.1-llvm-stockfish.wasm
thread 'main' panicked at 'should not parse the same key into multiple items', /home/niklas/.cargo/registry/src/github.com-1ecc6299db9ec823/twiggy-ir-0.6.0/./ir.rs:53:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
Binaries attached, gzipped to be able to upload on GitHub:
1.39.1-fastcomp-stockfish.wasm.gz 1.39.1-llvm-stockfish.wasm.gz
Bloaty might be helpful too. Or just run wasm-dis and then emscripten's tools/find_bigfuncs.py.
It's possible the newer llvm in upstream inlines differently, which affects code size, but a difference that big sounds like something else. Should be easy to debug though given it's so big.
How much do things change when removing -flto from all the commands?
I noticed something else while looking at the gzipped files:
136K 1.39.1-fastcomp-stockfish.wasm.gz
128K 1.39.1-llvm-stockfish.wasm.gz
So the difference goes away after compression. In fact LLVM is now better!
Will run bloaty and try removing -flto soon.
Any chance you can upload builds with names (--profiling-funcs at link)? That would make comparing them easier.
edit: nevermind :) see next comment
Comparing the existing builds, wasm-opt --metrics shows basically the entire difference is
- [memory-data] : 0
+ [memory-data] : 1013251
So the fastcomp build has no memory in the wasm, but the upstream one does.
I believe that makes sense, as in upstream the data is in the wasm, and we use passive segments so load them only in the main thread. In fastcomp, we separate them out, and load them once in JS.
What is odd is that you wrote you have a 40K .mem file in both cases. Did the upstream build not emit one, but there was one already in that dir perhaps?
I see there are many, many zeros in the upstream build's data segments. I guess we don't do the usual tricks to skip zeros in that case because they are passive, @tlively?
Thank you very much, this explains the difference! I can confirm that it's just the .mem file from the previous build.
The MemoryPacking pass currently bails out in the presence of memory.init and data.drop segments, so zeros in non-bss segments will not be eliminated with pthread builds at the moment. This could be fixed by duplicating and updating bulk memory operations when the segments they reference are split or otherwise changed.
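The general zero-skipping idea being discussed can be sketched roughly like this (a simplified Python illustration, not Binaryen's actual MemoryPacking implementation; the threshold and the pack_segments name are made up for this example). Long runs of zeros can simply be dropped from the binary, since linear memory starts zero-initialized:

```python
# Hypothetical sketch: split a flat data blob into (offset, bytes)
# segments, dropping long runs of zeros so they need not be stored
# in the binary at all.

ZERO_RUN_THRESHOLD = 8  # assumed cutoff; the real heuristic differs

def pack_segments(data: bytes):
    segments = []
    start = None  # start offset of the current non-zero segment
    zeros = 0     # length of the pending run of zeros
    for i, b in enumerate(data):
        if b == 0:
            zeros += 1
            continue
        if start is None:
            start = i
        elif zeros >= ZERO_RUN_THRESHOLD:
            # the zero run is long enough to be worth splitting on
            segments.append((start, data[start:i - zeros]))
            start = i
        zeros = 0
    if start is not None:
        segments.append((start, data[start:len(data) - zeros]))
    return segments
```

With passive segments and memory.init in the picture, every such split also has to be reflected in the instructions that reference the segments, which is the part the pass currently bails out on.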
I've noted this on my backlog, but I'm not sure how to prioritize it. @kripken how critical do you think this is?
Good question. If this were for non-pthreads builds too I'd be much more concerned. As it's pthreads-only, it's still important but less I guess. I put it in the middle column on https://github.com/emscripten-core/emscripten/projects/1 now fwiw.
Same problem here: when switching from 1.39.0-fastcomp to 1.39.0-upstream, the wasm size (an ffmpeg libx build) grows from 1.1M to 2.1M. This build also uses threads.
The fix for this has been merged upstream in https://github.com/WebAssembly/binaryen/pull/2426 and should make its way into the next release of Emscripten (1.39.7).
fwiw, on a large project (with pthreads) this reduced my .wasm from 4.66 MB to 2.89 MB (!).
But running this in the browser results in an error:
CompileError: WebAssembly.instantiate(): Compiling function #5031 failed: invalid data segment index: 4294967232 @+1726667
Has this error already been reported? If not, I'm happy to open a new issue with the relevant .wasm from v1.39.6 and master (I could also provide a minimal test case, if you want, but that would take a bit longer).
I'm testing with sdk-upstream-master-64bit and binaryen-master-64bit from emsdk. v1.39.6 is working flawlessly.
It seems that the above mentioned issue only affects Chrome. On Firefox Nightly this problem does not occur. Here's a minimal reproduction: https://kleisauke.nl/issue-9775/.
The problem occurs in __wasm_init_memory (func #0); the data-segment index 4294967232 is close to UINT32_MAX, so it is probably a problem in the bulk memory implementation of Chrome.
Does anyone know where I can report such things? I could submit a comment to https://bugs.chromium.org/p/v8/issues/detail?id=7747, but I don't know if this is appropriate.
Thanks for reporting! It looks like Chrome is erroneously decoding the data segment index as signed instead of unsigned. I'm working on a fix now.
@tlively it looks like a v8 issue that only occurs with more than 63 passive data segments. Is there some flag users can set to stop binaryen from generating so many segments, unless that fix makes its way into Chrome? (If not, maybe we should add one?)
Wow that's unfortunate. There's logic in the pass to prevent the number of segments from surpassing the web limit of 100,000 segments, so I guess we can just change the limit in that logic.
As an (admittedly horrible) workaround, since the bug is in decoding a signed LEB, you could avoid any segment index whose encoding ends with the bit pattern 01bb_bbbb (and 1bbb_bbbb 01bb_bbbb, etc.). These will all be sign-extended. This means that 0..63, 128..8191, and 16384..1048575 are OK.
I could do that, but I really would rather not :grimacing: @sbc100 @kripken do either of you think this is important enough for that hack?
I filed a v8 bug here: https://bugs.chromium.org/p/v8/issues/detail?id=10151
I didn't understand the last suggestion the question was about. What does "avoid" refer to, and who would be doing that? (Something in binaryen, at runtime, or elsewhere?) Is that PR not sufficient?
There are certain segment indexes that are interpreted incorrectly by v8, because they are decoded as signed numbers instead of unsigned numbers. 0 through 63 are OK, since the representation is the same for signed and unsigned LEB. But 64 through 128 are not, since 64 is decoded as -64 when interpreted as a signed LEB.
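The mechanics of that misinterpretation can be reproduced directly. This is a small standalone illustration; the decoder functions below are written for this example, not taken from v8. Index 64 encodes to the single byte 0x40, which a signed decoder sign-extends to -64, and -64 reinterpreted as a u32 is 4294967232, the very index in the CompileError above:

```python
# Decode an unsigned LEB128 value (what the wasm spec requires for
# data segment indexes).
def decode_unsigned_leb128(data: bytes) -> int:
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return result
    raise ValueError("truncated LEB128")

# Decode a signed LEB128 value (what the buggy engine effectively did).
def decode_signed_leb128(data: bytes) -> int:
    result = shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            if byte & 0x40:  # sign bit of the final byte is set
                result -= 1 << shift
            return result
    raise ValueError("truncated LEB128")

encoded = bytes([0x40])  # unsigned LEB128 encoding of segment index 64
print(decode_unsigned_leb128(encoded))              # 64
print(decode_signed_leb128(encoded))                # -64
print(decode_signed_leb128(encoded) & 0xFFFFFFFF)   # 4294967232
```

Index 63 (byte 0x3F) has bit 6 clear, so both decoders agree on it, which is why only builds with more than 63 passive segments trip the bug.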
So in Binaryen, for this pass, you could potentially add empty data segments for the indexes that will be misinterpreted, and never use them with memory.init or data.drop instructions. This would allow you to raise the limit above 63 and still work with v8 (I believe).
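That workaround could be sketched along these lines (a hypothetical illustration, not Binaryen code; is_safe_index and assign_indexes are invented names). An index is "safe" when a buggy signed decoder reads its unsigned LEB128 encoding identically, which happens exactly when bit 6 of the final encoded byte is clear:

```python
# Standard unsigned LEB128 encoder.
def encode_unsigned_leb128(n: int) -> bytes:
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)

def is_safe_index(i: int) -> bool:
    # Safe when the final byte's bit 6 is clear, so a buggy signed
    # decoder does not sign-extend the value.
    return not encode_unsigned_leb128(i)[-1] & 0x40

def assign_indexes(num_real_segments: int):
    """Assign real segments only to safe indexes; the skipped unsafe
    indexes become empty placeholder segments that are never referenced
    by memory.init or data.drop."""
    real, placeholders = [], []
    i = 0
    while len(real) < num_real_segments:
        (real if is_safe_index(i) else placeholders).append(i)
        i += 1
    return real, placeholders
```

This check reproduces the safe ranges stated earlier in the thread: 0..63 and 128..8191 pass, while 64..127 and 8192..16383 are skipped as placeholders.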
I see, thanks! Yeah, that sounds like it would work too.
There's a huge increase in binary size when building https://github.com/niklasf/stockfish.wasm (v0.5.11) with the latest emscripten.
Any information I could provide that would help to narrow this down?
1.38.48 (last good):
1.39.0 (first bad):
1.39.1 (still bad):
Flags: https://github.com/niklasf/stockfish.wasm/blob/bb648dd021e3e585def8b86050800f47001524a5/src/Makefile#L243