emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

[library_dylink.js] How to link a Rust side module with C++ main module: missing `invoke_` functions in proxyHandler #22906

Open carlopi opened 5 days ago

carlopi commented 5 days ago

Please include the following in your bug report:

Version of emscripten/emsdk:
emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.68 (ceee49d2ecdab36a3feb85a684f8e5a453dde910)
clang version 20.0.0git (https:/github.com/llvm/llvm-project 5cc64bf60bc04b9315de3c679eb753de4d554a8a)
Target: wasm32-unknown-emscripten
Thread model: posix
InstalledDir: /Users/carlo/emsdk/upstream/bin

The situation: I work on duckdb-wasm, a C++ codebase that is compiled to Wasm via Emscripten. To extend the project's functionality, we allow code to be added at run time via extensions, both in native builds and in Wasm.

In the Wasm case, the main project ships as a JS and a Wasm module, and each extension is itself a Wasm module. Emscripten's implementation of dlopen remaps imports and exports dynamically, and execution then continues with the additional functionality.

This works end to end for C++ extensions: on dlopen, imports are correctly remapped in the JS layer and everything works as expected.

A basic example:

SELECT st_area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
Catalog Error: Scalar Function with name "st_area" is not in the catalog, but it exists in the spatial extension.

LOAD spatial;
SELECT st_area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::geometry);
┌─────────────────────────────────────────────────────────────────┐
│ st_area(CAST('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))' AS geometry)) │
╞═════════════════════════════════════════════════════════════════╡
│                                                             1.0 │
└─────────────────────────────────────────────────────────────────┘

(live demo at https://shell.duckdb.org/#queries=v0,SELECT-st_area('POLYGON((0-0%2C-0-1%2C-1-1%2C-1-0%2C-0-0))'%3A%3Ageometry)~,LOAD-spatial~,SELECT-st_area('POLYGON((0-0%2C-0-1%2C-1-1%2C-1-0%2C-0-0))'%3A%3Ageometry)~)

Now I am looking to do the same with Rust-based code, compiled with cargo for the wasm32-unknown-emscripten target.

The problem: the Rust code compiles, but on dlopen missing-symbol errors are raised by code that appears to come from https://github.com/emscripten-core/emscripten/blob/main/src/library_dylink.js#L699, which looks like:

                    var proxyHandler = {
                        get(stubs, prop) {
                            switch (prop) {
                                case "__memory_base":
                                    return memoryBase;
                                case "__table_base":
                                    return tableBase
                            }
                            if (prop in wasmImports && !wasmImports[prop].stub) {
                                return wasmImports[prop]
                            }
                            if (!(prop in stubs)) {
                                var resolved;
                                stubs[prop] = (...args) => {
                                    resolved ||= resolveSymbol(prop);
                                    return resolved(...args)
                                }
                            }
                            return stubs[prop]
                        }
                    };

The error is that invoke_viii, invoke_vii, or similarly named functions are not present at the JavaScript level.

The hack: adding a conditional like:

                    var proxyHandler = {
                        get(stubs, prop) {
+                          if (prop.startsWith("invoke_")) {
+                              return createDyncallWrapper(prop.substring(7));
+                          }
                            switch (prop) {
                                case "__memory_base":

solves the problem, and shows that the invoke_ functions are special in that their code can be reconstructed from the signature alone to perform the correct indirect call.
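As a rough illustration of why the signature is enough, the sketch below builds an invoke_-style wrapper from a signature string. It follows the general shape of such wrappers rather than Emscripten's actual generated code: `makeInvoke`, the plain-array `table`, and the `stackSave` / `stackRestore` / `setThrew` stand-ins are all hypothetical names for this example, and unlike a real runtime it treats every exception as an unwind.

```js
// Hypothetical stand-ins for runtime pieces; a real runtime would use the
// Wasm function table and the module's own stack and "threw" state.
const table = [];
let stackPointer = 0;
let threw = 0;
const stackSave = () => stackPointer;
const stackRestore = (sp) => { stackPointer = sp; };
const setThrew = (value) => { threw = value; };

// Build an invoke_-style wrapper for a signature such as "vii" or "ji".
// The first character is the return type; the rest are parameter types.
function makeInvoke(sig) {
  const paramCount = sig.length - 1;
  return (index, ...args) => {
    if (args.length !== paramCount) {
      throw new Error(`invoke_${sig}: expected ${paramCount} args, got ${args.length}`);
    }
    const sp = stackSave();
    try {
      // Indirect call through the (mock) function table.
      return table[index](...args);
    } catch (e) {
      // Simplification: record the unwind instead of rethrowing anything.
      stackRestore(sp);
      setThrew(1);
    }
  };
}

table[0] = (a, b) => a + b;
table[1] = () => { throw new Error('panic'); };
console.log(makeInvoke('iii')(0, 2, 3)); // 5
makeInvoke('v')(1);
console.log(threw); // 1
```

The wrapper needs nothing beyond the table index and the argument count implied by the signature, which is why it can be created on demand.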

This workaround is very brittle, so I am looking for a more proper solution, directions, or material on how this can be properly supported.

kripken commented 5 days ago

@hoodmane @sbc100 would know the proper answer, but a workaround is to use Wasm Exceptions, -fwasm-exceptions. That would remove the need for the invoke functions entirely.

hoodmane commented 5 days ago

Rust doesn't support -fwasm-exceptions at all unfortunately. Cf https://github.com/rust-lang/rust/pull/131830

hoodmane commented 5 days ago

@carlopi I've encountered this problem before and your workaround looks pretty similar to what I did. It would also fix it to compile with -Cunwind=abort if you don't need to catch panics.

sbc100 commented 5 days ago

Funnily enough I just submitted a change to remove createDyncallWrapper completely: #22825.

All the invoke_xx functions needed by a given module should be contained within the module itself (this is true for both the main module and the side module).

There was an issue where these functions were not being correctly added to the global list when libraries were loaded with RTLD_LOCAL. However, that was fixed in #22625, which was released as part of 3.1.68. So I would hope that upgrading to 3.1.68 or above fixes this issue.

sbc100 commented 5 days ago

Or wait, I see you are on 3.1.68 already.

Can you confirm which invoke_xx symbols are missing and which dynCall_xx exports are present in the side module you are trying to load?

sbc100 commented 5 days ago

Imports that start with invoke_ should be implemented via createInvokeFunction, which is called from resolveGlobalSymbol: https://github.com/emscripten-core/emscripten/blob/2e84cfdc8b7a5e04fc3d2f8a0e09ac98e2a88a0f/src/library_dylink.js#L130-L133. Can you see why this code is not executing in your case?
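The lookup described here can be pictured roughly as follows. This is a simplified sketch, not the actual library_dylink.js code: `resolveSymbolSketch`, `knownImports`, and `makeInvokeStub` are hypothetical names, and the real resolver handles more cases. The injected `log` callback shows one way to check whether the invoke_ branch is reached for a given symbol.

```js
// Hypothetical table of symbols already known to the runtime.
const knownImports = new Map([['puts', (s) => s.length]]);

// Placeholder factory; stands in for whatever builds the real wrapper.
const makeInvokeStub = (sig) => Object.assign(() => {}, { sig });

function resolveSymbolSketch(name, log = console.log) {
  let sym = knownImports.get(name);
  if (!sym && name.startsWith('invoke_')) {
    // Create the wrapper on demand and cache it for later lookups.
    log(`creating wrapper for ${name}`);
    sym = makeInvokeStub(name.slice('invoke_'.length));
    knownImports.set(name, sym);
  }
  return sym;
}

resolveSymbolSketch('invoke_ji'); // logs once
resolveSymbolSketch('invoke_ji'); // served from the cache, no log
```

If no log line appears for a symbol that later fails, the failing lookup is taking a different path than this fallback.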

carlopi commented 4 days ago

> @carlopi I've encountered this problem before and your workaround looks pretty similar to what I did. It would also fix it to compile with -Cunwind=abort if you don't need to catch panics.

I do need to be able to catch panics, but I am not sure this is connected to exception handling; it looks to me like it is about regular handling of indirect calls.

@sbc100: Taking for example the module at https://community-extensions.duckdb.org/v1.1.3/wasm_eh/lindel.duckdb_extension.wasm, it has 11 invoke_ imports and 3 dynCall_ exports:

% grep "import.*invoke_" lindel.wat                                                       
  (import "env" "invoke_vi" (func (;56;) (type 3)))
  (import "env" "invoke_vii" (func (;59;) (type 5)))
  (import "env" "invoke_viii" (func (;61;) (type 8)))
  (import "env" "invoke_iiii" (func (;62;) (type 19)))
  (import "env" "invoke_viiiii" (func (;63;) (type 13)))
  (import "env" "invoke_ii" (func (;64;) (type 1)))
  (import "env" "invoke_v" (func (;73;) (type 2)))
  (import "env" "invoke_iiiiii" (func (;83;) (type 16)))
  (import "env" "invoke_vij" (func (;122;) (type 8)))
  (import "env" "invoke_vijj" (func (;123;) (type 13)))
  (import "env" "invoke_ji" (func (;124;) (type 1)))
% grep "export.*dynCa" lindel.wat
  (export "dynCall_vij" (func 475))
  (export "dynCall_vijj" (func 476))
  (export "dynCall_ji" (func 477))
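The same inventory can be taken from JavaScript without converting to .wat, using the standard `WebAssembly.Module.imports` and `WebAssembly.Module.exports` functions. In this sketch `listInvokeSymbols` is a hypothetical helper, and the tiny hand-assembled module exists only so the example runs on its own; for a real check the bytes would come from the side module file.

```js
// List invoke_ function imports and dynCall_ function exports of a module.
function listInvokeSymbols(bytes) {
  const module = new WebAssembly.Module(bytes);
  const invokes = WebAssembly.Module.imports(module)
    .filter((imp) => imp.kind === 'function' && imp.name.startsWith('invoke_'))
    .map((imp) => imp.name);
  const dynCalls = WebAssembly.Module.exports(module)
    .filter((exp) => exp.kind === 'function' && exp.name.startsWith('dynCall_'))
    .map((exp) => exp.name);
  return { invokes, dynCalls };
}

// Minimal demo module: imports env.invoke_v and exports dynCall_v.
const demoBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic, version
  0x01, 0x04, 0x01, 0x60, 0x00, 0x00,             // type 0: () -> ()
  0x02, 0x10, 0x01, 0x03, 0x65, 0x6e, 0x76,       // import section, module "env"
  0x08, 0x69, 0x6e, 0x76, 0x6f, 0x6b, 0x65, 0x5f, 0x76, 0x00, 0x00, // "invoke_v", func, type 0
  0x03, 0x02, 0x01, 0x00,                         // one defined function of type 0
  0x07, 0x0d, 0x01, 0x09, 0x64, 0x79, 0x6e, 0x43, // export section, name "dynCall_v"...
  0x61, 0x6c, 0x6c, 0x5f, 0x76, 0x00, 0x01,       // ...func index 1
  0x0a, 0x04, 0x01, 0x02, 0x00, 0x0b,             // code section: one empty body
]);
console.log(listInvokeSymbols(demoBytes));
```

Synchronous compilation is convenient in Node; browsers may restrict it for large modules on the main thread, where `WebAssembly.compile` is the asynchronous alternative.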

And I get into proxyHandler on a use of invoke_ji, which can't find the symbol.

Also strange that I see this problem after dlopen completed, but while executing regular code.
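One possible explanation for the timing, based only on the proxyHandler shown earlier: unknown imports are handed out as stubs that call resolveSymbol on first use, so a symbol that cannot be resolved would fail only when the stub is first called, not during dlopen. The sketch below reproduces that pattern in isolation; `makeLazyImports` and the `resolve` callback are hypothetical names.

```js
// Hand out known imports directly and lazy stubs for everything else.
function makeLazyImports(known, resolve) {
  return new Proxy({}, {
    get(stubs, prop) {
      if (prop in known) return known[prop];
      if (!(prop in stubs)) {
        let resolved;
        // Defer the lookup until the import is actually called.
        stubs[prop] = (...args) => {
          resolved ||= resolve(prop);
          return resolved(...args);
        };
      }
      return stubs[prop];
    },
  });
}

const lazy = makeLazyImports({ add: (a, b) => a + b }, (name) => {
  throw new Error(`unresolved symbol: ${name}`);
});
const stub = lazy.invoke_ji; // "link time": no error yet
try {
  stub(0, 1);                // first call: resolution fails here
} catch (e) {
  console.log(e.message);    // unresolved symbol: invoke_ji
}
```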

Thanks a lot for the help obviously.

I also have another concern: are modules built with different Emscripten versions expected to be compatible?

My problem is that the main module and side modules are built at different points in time and on different machines / setups, so it's not obvious they are on the same Emscripten version. Should I enforce that, or is there a way to tell whether two given Emscripten versions are ABI compatible? Or is there no such guarantee? Is this on the roadmap?

sbc100 commented 4 days ago

> My problem is that the main module and side modules are built at different points in time and on different machines / setups, so it's not obvious they are on the same Emscripten version. Should I enforce that, or is there a way to tell whether two given Emscripten versions are ABI compatible? Or is there no such guarantee? Is this on the roadmap?

Sadly we don't currently offer guarantees about ABI compatibility, so it's best if you can build the main module and side modules with the same version of emscripten. By the way, does the problem here go away if you do that? i.e. is this bug actually about ABI incompatibility between modules built with different versions of emscripten? Or is there something else here too?

In terms of ABI compatibility we do hope that breakages are rare, and we do hope to one day make stronger guarantees about this.

carlopi commented 4 days ago

@sbc100: I encountered the problem in local development, so the same version of Emscripten was used to build both the main Wasm module with its accompanying JavaScript and the Rust side module.

sbc100 commented 4 days ago

> And I get into proxyHandler on a use of invoke_ji, which can't find the symbol.

Can you verify if resolveGlobalSymbol is being called for invoke_ji and if that in turn is calling createInvokeFunction? Is createInvokeFunction even part of that output?