emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler
Other
25.56k stars 3.28k forks source link

ReferenceError: _setThrew is not defined when linking as C++ #22227

Open curiousdannii opened 1 month ago

curiousdannii commented 1 month ago

Version of emscripten/emsdk: emscripten/emsdk:3.1.56 (a little behind I know, but I couldn't see any recent changes that looked relevant. I will try updating soon.)

I think this is a follow up to #21381. I have no compilation errors, but despite the invoke_ functions depending on _setThrew, it isn't defined.

function invoke_vi(index, a1) {
 var sp = stackSave();
 try {
  dynCall_vi(index, a1);
 } catch (e) {
  stackRestore(sp);
  if (e !== e + 0) throw e;
  _setThrew(1, 0);
 }
}

I suspect this may be happening because I've set set_target_properties(${target} PROPERTIES LINKER_LANGUAGE CXX) in CMake, but there are no actual C++ files being linked in? (We previously determined that my project needs to be linked as CXX because it's using Rust.)

sbc100 commented 1 month ago

The fact that this invoke_vi function exists in the generated JS is because the wasm module is importing a function by this name. These imports are generated during codegen when emscripten-style EH or emscripten-style setjmp longjmp is used.

The setThrew dependency is pulled in when the module uses either _emscripten_throw_longjmp or __cxa_end_catch:

https://github.com/emscripten-core/emscripten/blob/7b466b4f097b05b9f1e4f6ae0753f525cac08c92/src/library.js#L1000-L1006

https://github.com/emscripten-core/emscripten/blob/7b466b4f097b05b9f1e4f6ae0753f525cac08c92/src/library_exceptions.js#L168

I'm guessing your program does import either of those, but is still somehow depending on invoke_ imports.

Can you try adding adding settings.REQUIRED_EXPORTS.append('setThrew') after this line: https://github.com/emscripten-core/emscripten/blob/7b466b4f097b05b9f1e4f6ae0753f525cac08c92/tools/link.py#L1277

sbc100 commented 1 month ago

Can you find the object file that is referencing invoke_vi... For the longjmp case at least it should not be possible to create a reference to that function without also referencing _emscripten_throw_longjmp. I imagine the same is true for exception handling and __cxa_end_catch.

curiousdannii commented 1 month ago

It's happening during/after a Rust panic. I'm not sure if that means it's using a longjmp or exception. Ah, in debug mode I do get an assertion saying "Exception thrown, but exception catching is not enabled. Compile with -sNO_DISABLE_EXCEPTION_CATCHING or -sEXCEPTION_CATCHING_ALLOWED=[..] to catch."

It comes from a fairly simple Rust function

    pub fn glk_stream_get_current(&self) -> Option<GlkStream> {
        self.current_stream.as_ref().map(Into::<GlkStream>::into)
    }

Which is just turning a Weak into an Arc. So whatever is doing the longjmp/exception is likely deep in Rust's innards.

I added that line to link.py. Because I'm using Docker it seemed the easiest way was just to patch the file and then copy it over the top. However the file in the git tag of 3.1.63 includes the recent StackIR changes, which the emsdk 3.1.63 version doesn't know about ("Unknown option '--no-stack-ir'"). I had to grap the link.py from 150af6e566 instead. Seems like a tagging mismatch?

Can confirm that adding that line fixes _setThrew not being output.

I'll continue digging...

sbc100 commented 1 month ago

IIUC the _setThrew call is likely coming from the LLVM passed for SjLj or exception handling. I didn't know it was possible to generate the _setThrew call without also generating a call to either _emscripten_throw_longjmp or __cxa_end_catch... but rust must be doing something that C++ doesn't here...

curiousdannii commented 1 month ago

I think this is Rust's panic implementation? https://github.com/rust-lang/rust/blob/1.77.0/library/panic_unwind/src/emcc.rs It does call __cxa_end_catch.

However I'm not sure if I'm actually using the unwind panic or the abort panic... Edit: pretty sure I am using that unwind panic.

sbc100 commented 1 month ago

So __cxa_end_catch (at least the emscripten JS version) has a direct dependency on setThrew. You can see that above. That should cause setThrew to be exported whenever __cxa_end_catch is imported.

Can you try linking with -Wl,--trace-symbol=__cxa_end_catch and maybe -Wl,--trace-symbol=setThrew to see what is going on here?

curiousdannii commented 1 month ago

Just noticed _setThrew is only not defined in release builds.

Debug build:

../remglk/target/wasm32-unknown-emscripten/debug/libremglk_capi.a(panic_unwind-034648b5c8ac72cd.panic_unwind.13c52233c65720cd-cgu.0.rcgu.o): reference to __cxa_end_catch
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libcompiler_rt.a(emscripten_exception_builtins.o): lazy definition of setThrew
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/libcompiler_rt.a(emscripten_exception_builtins.o): definition of setThrew
/tmp/tmpt0_dn45mlibemscripten_js_symbols.so: importing __cxa_end_catch
/tmp/tmpt0_dn45mlibemscripten_js_symbols.so: exported setThrew due to import of __cxa_end_catch

Release build:

/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/thinlto/libcompiler_rt.a(emscripten_exception_builtins.o): lazy definition of setThrew
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/thinlto/libcompiler_rt.a(emscripten_exception_builtins.o): definition of setThrew
sbc100 commented 1 month ago

So in the release build __cxa_end_catch is no being imported so we assume we don't need to export setThrew. But someone the invoke_xx function is still being imported anyway. This doesn't seems to happen for C++ objects.

Can you trace the invoke_vi symbol? (Now sure that will work or not).

curiousdannii commented 1 month ago

Release build:

../remglk/target/wasm32-unknown-emscripten/release/libremglk_capi.a(remglk_capi-27bf10718c25cc09.remglk_capi.a6cdab8c84f2a7db-cgu.00.rcgu.o): reference to invoke_vi
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/thinlto/libcompiler_rt.a(emscripten_exception_builtins.o): lazy definition of setThrew
/emsdk/upstream/emscripten/cache/sysroot/lib/wasm32-emscripten/thinlto/libcompiler_rt.a(emscripten_exception_builtins.o): definition of setThrew

Maybe it thinks the call to __cxa_end_catch is dead code?

sbc100 commented 1 month ago

Do you know which source file resulted in libremglk_capi.a(remglk_capi-27bf10718c25cc09.remglk_capi.a6cdab8c84f2a7db-cgu.00.rcgu.o)? What settings were used (-s flags) when it was compiled?

curiousdannii commented 1 month ago

Rust makes object files per crate, not per source files, so that's not an easy question. But I did put breakpoints on the invoke_ functions, and they get called from main, the OnceLock implementation, Path::join, and then some of my own functions. They're called all over the place. Possibly Mutex::lock? I use that all over.

I'm not compiling it with any manual -s flags, just which ever are the default ones in Rust. Possibly only -sABORTING_MALLOC=0?

curiousdannii commented 1 month ago

I tried again with a debug build, and the invoke_ functions are being called by things like these:

$_$LT$alloc..vec..Vec$LT$T$GT$$u20$as$u20$alloc..vec..spec_from_iter_nested..SpecFromIterNested$LT$T$C$I$GT$$GT$::from_iter::h5c78325b77b8248a (glulxe.wasm-05cc0df2:0x4b88da)
$alloc::raw_vec::RawVec$LT$T$C$A$GT$::allocate_in::hd5f3769a4a416929 (glulxe.wasm-05cc0df2:0xac706d)
$alloc::vec::Vec$LT$T$C$A$GT$::extend_trusted::ha7ec1d31adcf38d7 (glulxe.wasm-05cc0df2:0x4c8f9c)
$core::iter::traits::iterator::Iterator::fold::h17414469fbd8028e (glulxe.wasm-05cc0df2:0xd87e7)
$core::iter::adapters::map::map_fold::_$u7b$$u7b$closure$u7d$$u7d$::h1d296caadffa34ab (glulxe.wasm-05cc0df2:0x4a9cfa)
$_$LT$core..iter..adapters..take_while..TakeWhile$LT$I$C$P$GT$$u20$as$u20$core..iter..traits..iterator..Iterator$GT$::fold::h7d7d1ae963abf85d (glulxe.wasm-05cc0df2:0x4925d1)

These functions are often very simple. Here's the Iterator::fold one for example:

https://doc.rust-lang.org/src/core/iter/traits/iterator.rs.html#2581-2591

And here's the map::map_fold:

https://github.com/rust-lang/rust/blob/00167abb4148d11476b0ebc127b769e0366be660/library/core/src/iter/adapters/map.rs#L85

Are the invoke_ functions just being called for indirect function calls?

sbc100 commented 1 month ago

So the only place the invoke_ wrappers are ever generated is during the WebAssemblyLowerEmscriptenEHSjLj pass.

See https://github.com/llvm/llvm-project/blob/5e22a536981483d8411266dccee2ff3e5f31f4a1/llvm/lib/Target/WebAssembly/WebAssemblyLowerEmscriptenEHSjLj.cpp#L509

Somehow with C++ EH and with setjmp/longjmp these always get generated alongside calls to either __cxa_end_catch or _emscripten_throw_longjmp.

I'm guess that the rust compiler is generating some kind of EH usage or setjmp/longjmp usage that is different to the C++ compiler. @aheejin any ideas here?

sbc100 commented 1 month ago

My guess is that rust is generating invokes but without the corresponding catches that call __cxa_end_catch?

While we should fix this... I also wonder if we can find a way to eliminate those invokes... since calling out the JS for the invoke_xxx functions is fairly costly.

As a temporary workaround I assume you can build with -sEXPORTED_FUNCIONS=_setThrew ?

curiousdannii commented 1 month ago

Yes, setting -sEXPORTED_FUNCTIONS=[_main,_setThrew] outputs _setThrew.

(I wish there was a -sEXPORTED_FUNCTIONS+=[_setThrew] option, to append to rather than replace these lists.)

After the invoke_ calls I see calls to $__cxa_find_matching_catch_2, but that's all. There's no call to __cxa_end_catch in my wasm.

sbc100 commented 1 month ago

You can just do -sEXPORTED_FUNCTIONS=_main,_setThrew

We are trying to find a way to make additive settings work: https://github.com/emscripten-core/emscripten/pull/19938

aheejin commented 1 month ago

I am not familiar with how Rust exceptions are compiled; more precisely, how Rust is using our JS libraries written for C++. Can you provide an .ll file and a command to llc? Clang has -emit-llvm option that emits a bitcode file. I guess rustc would have a similar option?

curiousdannii commented 1 month ago

remglk deps.zip I'm not too sure what you're looking for, so I got it to emit LLVM IR, and just attached everything. Luckily there's not too many dependencies. remglk and remglk_capi are the two crates that are my actual project.

I'm not sure what you meant by "a command to llc".

aheejin commented 1 month ago

Sorry for the late reply. Yeah your bitcode files contain a bunch of invokes, but these shouldn't be converted to invoke_vi -like calls unless you run WebAssemblyLowerEmscriptenEHSjLj. For clang, this runs when you run clang with one of -mllvm -enable-emscripten-cxx-exceptions, -mllvm -enable-emscripten-sjlj, and -mllvm -wasm-enable-sjlj. But I'm not sure which set of Rust flags you are using. Do your Rust flags contain related to EH? If it is the default, does it use Emscripten EH by default?

Edit: I pasted incorrect set of flags for the condition, so I fixed them

curiousdannii commented 1 month ago

@aheejin I'm not manually setting any flags, so yeah just the default flags. I'm not sure what they are, but just found these:

https://github.com/rust-lang/rust/blob/aa877bc71c8c8082122bee23d17c8669f30f275d/compiler/rustc_codegen_llvm/src/llvm_util.rs#L111

https://github.com/rust-lang/rust/blob/7120fdac7a6e55a5e4b606256042890b36067052/compiler/rustc_codegen_llvm/src/intrinsic.rs#L921

https://github.com/rust-lang/rust/blob/7120fdac7a6e55a5e4b606256042890b36067052/compiler/rustc_codegen_ssa/src/base.rs#L356

aheejin commented 1 month ago

This happens because Rust programs that use LLVM Itanium EH IR may not use __cxa_end_catch, but still need setThrew because of invokes.

A similar thing happened for Emscritpen SjLj so we added setThrew to emscripten_throw_longjmp's deps even though emscripten_throw_longjmp did not directly use setThrew: https://github.com/emscripten-core/emscripten/blob/8c81cac1bbae378bc7dde0c21c99602cbaf452d0/src/library.js#L1010-L1017 But it's tricky to do a similar thing here because we don't have a JS function that is always included in case of EH. (getTempRet0 seems to be always included but it is an assembly function) __resumeException can be one target but it is only included when LLVM IR's resume instruction is used, which is not always the case for all EH programs.

@sbc100 Do you think we can add setThrew to EXPORTED_FUNCTIONS (or DEFAULT_LIBRARY_FUNCS_TO_INCLUDE, I don't really know what the difference between the two) whenever DISABLE_EXCEPTION_CATCHING is false, in emcc.py? This makes it to be included even if exceptions are not used in the program, but given that the code is like three lines, would it be a problem?

Also, I think I used to see many additions to EXPORTED_FUNCTIONS or DEFAULT_LIBRARY_FUNCS_TO_INCLUDE in emcc.py but I can't see any now. Did we change how we deal with them?

sbc100 commented 4 weeks ago

@sbc100 Do you think we can add setThrew to EXPORTED_FUNCTIONS (or DEFAULT_LIBRARY_FUNCS_TO_INCLUDE, I don't really know what the difference between the two) whenever DISABLE_EXCEPTION_CATCHING is false, in emcc.py? This makes it to be included even if exceptions are not used in the program, but given that the code is like three lines, would it be a problem?

That seems reasonable as long as DISABLE_EXCEPTION_CATCHING defaults to true, so it won't effect all programs.

Also, I think I used to see many additions to EXPORTED_FUNCTIONS or DEFAULT_LIBRARY_FUNCS_TO_INCLUDE in emcc.py but I can't see any now. Did we change how we deal with them?

I think you will find these now in tools/link.py.

aheejin commented 3 weeks ago

Thanks! Where am I supposed to add setThrew, EXPORTED_FUNCTIONS or DEFAULT_LIBRARY_FUNCS_TO_INCLUDE? I have been always confused about the uses of those two...

curiousdannii commented 3 weeks ago

I haven't manually set DISABLE_EXCEPTION_CATCHING, so I think it's still on in my project. So wouldn't adding setThrew only if it's been set to false not solve the linking problem for projects linking in Rust libraries?

aheejin commented 3 weeks ago

In emcc + C++, if you set DISABLE_EXCEPTION_CATCHING, that LowerEmscriptenEHSjLj pass would not run and references to invokes or setThrew are not gonna be generated. But I guess you use Rust toolchain, which has a different workflow.

I'm not familiar with Rust toolchain's workflow. How does it work with emcc? Do you use emcc or not? In emcc, DISABLE_EXCEPTION_CATCHING defaults to true and if it's true it calls clang without -enable-emscripten-cxx-exceptions, which causes LowerEmscriptenEHSjLj not to run. But I'm not sure how Rust toolchain works.

curiousdannii commented 3 weeks ago

I'm compiling my Rust code into a static library for the wasm32-unknown-emscripten target. Emcc is not involved yet. Then I link it into my app using emcmake (which calls emcc).

So it seems like this is the cause? https://github.com/rust-lang/rust/blob/aa877bc71c8c8082122bee23d17c8669f30f275d/compiler/rustc_codegen_llvm/src/llvm_util.rs#L111

I'm guessing that I have to link with -enable-emscripten-cxx-exceptions? Or do I also need to get my C code to also be compiled with it? Maybe NO_DISABLE_EXCEPTION_CATCHING is required if you're linking Rust (with unwind)?

@tlively You originally added that flag in https://github.com/rust-lang/rust/commit/62c3443e9659be2bf0ae53df1421c846ceaca904 Do you remember anything about it? Any suggestions on what the best way to get Rust and Emcc to play nicely together here is?

aheejin commented 3 weeks ago

If you don't use emcc, which tool did you feed -sEXPORTED_FUNCIONS=_setThrew to?

curiousdannii commented 3 weeks ago

Yes I set that in cmake/emcc. It's just the Rust staticlib that's compiled without emcc.