emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.63k stars 3.28k forks source link

"local count too large" with large generated C++ function in debug mode #19346

Open skelly-energid opened 1 year ago

skelly-energid commented 1 year ago

Please include the following in your bug report:

Version of emscripten/emsdk: emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.34 (57b21b8fdcbe3ebb523178b79465254668eab408) clang version 17.0.0 (https://github.com/llvm/llvm-project a031f72187ce495b9faa4ccf99b1e901a3872f4b) Target: wasm32-unknown-emscripten Thread model: posix

Using this python script to generate the code:

#!/usr/bin/env python3

import os
import sys

code = ""

for i in range(0, 1200):
    code += f"""case {i}:
            otherfunc(arg2, "foo", local2, local1, local3, arg1, arg3, arg4, arg5, arg6, arg7, arg8);
            break;
    """

with open("generated.cpp", "w") as f:
    f.write(
f"""

void otherfunc (...) {{
}}

void somefunc(int arg1, int arg2, char arg3, int arg4, int arg5, int arg6, int arg7, int arg8) {{
  char local1 = 0;
  char local2 = 0;
  char local3 = 0;

  switch (arg4) {{
    {code}
    default:
      break;
  }}
}}

int main(int, char**)
{{
    somefunc(0, 0, 's', 0, 0, 0, 0, 0);
    return 0;
}}

""")

and this command line:

../generate.py && em++   -g -o main.html generated.cpp

Chrome fails with something like

main.js:870 Uncaught (in promise) RuntimeError: Aborted(CompileError: WebAssembly.instantiate(): Compiling function #2 failed: local count too large @+452)

Firefox fails with something like

Uncaught (in promise) RuntimeError: Aborted(CompileError: wasm validation error: at offset 455: too many locals)

I don't really know what is meant by locals here, because there are not many local variables.

sbc100 commented 1 year ago

Does the problem go away when you run with -O1 or -O2?

Can you try also with -sERROR_ON_WASM_CHANGES_AFTER_LINK -sWASM_BIGINT? There are two possible causes here, firstly llvm could be generating a function with that many locals, secondly one of the binaryen optimizations could be causing it. If it still fails with -sERROR_ON_WASM_CHANGES_AFTER_LINK -sWASM_BIGINT then we would know it was llvm that was at fault. I'm not sure what we can do about that though.. it might be simplest to suggest that you break up your huge function.

skelly-energid commented 1 year ago

Thanks, yes, it is only reproducible in debug mode. If I add -O1 I can't reproduce it anymore. I can still reproduce it without -O1 and with -sERROR_ON_WASM_CHANGES_AFTER_LINK -sWASM_BIGINT.

I've already broken up the generated function, and that has solved the problem for me. As you say though, this could still be a llvm bug.

mike-lischke commented 1 year ago

I got the same error when I enabled Asan via build parameter -fsanitize=address. What works is -fsanitize=undefined, though.

sbc100 commented 1 year ago

@mike-lischke, does adding -sERROR_ON_WASM_CHANGES_AFTER_LINK and -sWASM_BIGINT fix the issue for you?

mike-lischke commented 1 year ago

I'm on travel with limited internet connection. Will check when I return in 2 weeks.

mike-lischke commented 1 year ago

@sbc100 I have now tested your suggestion and get this error:

em++: error: changes to the wasm are required after link, but disallowed by ERROR_ON_WASM_CHANGES_AFTER_LINK: ['/opt/homebrew/Cellar/emscripten/3.1.43/libexec/binaryen/bin/wasm-emscripten-finalize', '--output-source-map-url=antlr4-runtime-wasm.wasm.map', '-g', '--bigint', '--no-dyncalls', '--no-legalize-javascript-ffi', '--dwarf', 'wasm/antlr4-runtime-wasm.wasm', '-o', 'wasm/antlr4-runtime-wasm.wasm']

Unclear why a change to the wasm file would be necessary. The full command line is:

em++ \ wasm/antlr4-runtime-wrapper.cpp \ antlr4-cpp-runtime/*.cpp\ antlr4-cpp-runtime/atn/*.cpp\ antlr4-cpp-runtime/dfa/*.cpp\ antlr4-cpp-runtime/internal/*.cpp\ antlr4-cpp-runtime/misc/*.cpp\ antlr4-cpp-runtime/support/*.cpp\ antlr4-cpp-runtime/tree/*.cpp\ antlr4-cpp-runtime/tree/pattern/*.cpp\ antlr4-cpp-runtime/tree/xpath/*.cpp\ -O0\ -std=c++17 -lembind -Iantlr4-cpp-runtime/ \ -fwasm-exceptions\ -o wasm/antlr4-runtime-wasm.js\ -s EXPORT_ES6=1\ -s WASM=1\ -s WASM_BIGINT=1\ -s ALLOW_MEMORY_GROWTH=1\ -s DEMANGLE_SUPPORT=1\ -g\ -gsource-map\ -s ERROR_ON_WASM_CHANGES_AFTER_LINK=1\ -fsanitize=address

sbc100 commented 1 year ago

Can you try without -gsource-map?

mike-lischke commented 1 year ago

Also without source maps the same error comes up (failed to asynchronously prepare wasm: CompileError: WebAssembly.instantiate(): Compiling function #77:"embind_init_main()" failed: local count too large @+51266).

sbc100 commented 1 year ago

And this was with -sERROR_ON_WASM_CHANGES_AFTER_LINK? i.e. did you manage to link with that flag?

What other flags are you using? I know -fsanitize=address can cause a lot of extra locals to get added. We have to disable several tests under asan because of the local count too large" issue. Do you see the this error even without -fsanitize=address?

mike-lischke commented 1 year ago

Yes, I could link without -gsource-map. The "local count too large" comes up when I run the test script. These are the flags I use now:

    -O0\
    -std=c++17 -lembind -Iantlr4-cpp-runtime/ \
    -o wasm/antlr4-runtime-wasm.js\
    -s EXPORT_ES6=1\
    -s WASM=1\
    -s WASM_BIGINT=1\
    -s ALLOW_MEMORY_GROWTH=1\
    -s DEMANGLE_SUPPORT=1\
    -s ERROR_ON_WASM_CHANGES_AFTER_LINK=1\
    -fwasm-exceptions\
    -fsanitize=address\
    -g

Using -fsanitize=undefined instead makes the app running, but then I cannot debug that memory alignment issue I have.

sbc100 commented 1 year ago

I see, so this seems like an issue with -fsanitize=address resulting in too many locals being created. This will most likely require an upstream llvm change, and I'm not quite sure that it would look like.

BTW, do you have any information about the function in question (the one with too many locals): embind_init_main? Is it a particularly long or complex function?

One possible workaround here would be to split this function up at the source level into several smaller functions. In the case of EMSCRIPTEN_BINDINGS (which it looks like this is) you could create several blocks such as :

EMSCRIPTEN_BINDINGS(main1, {
...
}

EMSCRIPTEN_BINDINGS(main2) {
...
}
etc.
mike-lischke commented 1 year ago

Somehow I anticipated this question :-) But no, this is not from my code. I searched my entire project and could not find it, not even in generated code. How much are "to many locals"? However, since I want to make all public classes available in JS/TS, I have bound many classes already. If the EMSCRIPTEN_BINDINGS is the culprit I can certainly structure that differently.

What are those names in that macro call actually? The documentation doesn't really explain them. Do they matter in some way?

Update: That was it! I split the main bindings (around 2000 lines, 53 classes) into 6 separated binding functions and now I can execute the code and get an error report:

SUMMARY: AddressSanitizer: heap-use-after-free (file:///Volumes/Extern/Work/projects/antlr4wasm/wasm/antlr4-runtime-wasm.wasm+0x84f1f1) in antlr4::atn::LexerATNSimulator::match(antlr4::CharStream*, unsigned long)+0x84f1f1

That's great! Many thanks for your help Sam!

sbc100 commented 1 year ago

The name passed as the first argument to EMSCRIPTEN_BINDINGS doesn't really mean anything.. it just gives the generated function a unique name. So you can use whatever name you want as long as don't repeat names (since that will result in multiple symbols with the same name.

mmomtchev commented 11 months ago

@sbc100 I have the same problem when compiling a huge 300k lines SWIG-generated C++ file. In my case the problem is very real - it happens in a function which registers all the exported classes which has 768 blocks of this type:

do {
 // register class and methods
} while(0);

Every block contains 4 local variables. When I use -O2 the compiler must optimize some (a large part?) of those variables and everything works. Otherwise I get the local count too large. Isn't it possible to configure this limit somewhere?

sbc100 commented 11 months ago

I believe the local count too large error is coming from the wasm runtime (e.g. v8) and I don't think it configurable, no.

We could perhaps have binaryen (wasm-opt) and/or llvm try to limit the number of locals they allow themselves to use. It might be easier on the binaryen side but I'm not sure. @tlively @kripken WDTY?

In the short term I'm afraid that your best best is probably to attempt to break up your large function into smaller ones.

kripken commented 11 months ago

Binaryen does do this when it optimizes. I think that's why as reported above this works with -O2. It's just a problem in unoptimized builds.

As a workaround, you can do some manual optimization of the wasm like this: wasm-opt --simplify-locals --coalesce-locals a.wasm -o a.wasm, which in most cases should fix this (an entire run of -O would be even better, but much slower).

mmomtchev commented 11 months ago

@kripken Alas, wasm-opt produces a binary that segfaults on instantiateStreaming. I had to also add --enable-bulk-memory as directed.

Finally, building just that one source file with -O1 while the rest is with -O0 produces a working binary, so this is acceptable to me for debugging.

yowl commented 7 months ago

I notice that llvm geps where the offset is a constant produce a local in the wat which is unneeded and optimised away with O1 . It would seem possible to eliminate that.

kohakukun commented 2 months ago

I'm getting the same issue, the difference is that, if I enable your suggested flag @sbc100 ERROR_ON_WASM_CHANGES_AFTER_LINK I get the error: em++: error: changes to the wasm are required after link, but disallowed by ERROR_ON_WASM_CHANGES_AFTER_LINK: ['emsdk/upstream/bin/wasm-emscripten-finalize', '-g', '--bigint', '--no-legalize-javascript-ffi', '--check-stack-overflow', '../../bin/release/out.wasm', '-o', '../../bin/release/out.wasm']

Also, I notice I get this error when addind the ASYNCIFY option

sbc100 commented 2 months ago

@kohakukun can you remove share the original link command that caused the failure? Along with the failure message itself? Does it happen if you add -O1?

MikhailGorobets commented 2 months ago

I also have the problem with the ASYNCIFY flag

xbcnn commented 3 weeks ago

I met same issue with ASYNCIFY too.