WebAssembly / tool-conventions

Conventions supporting interoperability between tools working with WebAssembly.
Artistic License 2.0

Referencing symbols across the DLL boundary #69

Open sbc100 opened 6 years ago

sbc100 commented 6 years ago

The current state of dynamic linking is described at https://github.com/WebAssembly/tool-conventions/blob/master/DynamicLinking.md.

However, this doesn't detail how imported symbols can be referenced. Looking at the current implementation in emscripten, it seems that all references to external symbols currently go through JS functions:

Function Imports

Imported functions point to JS thunks, which start empty and get updated as the various DLLs are loaded. This works a little bit like a PLT written in JS. Because all the complexity is moved into JS, all such functions are called with the regular call instruction, and codegen doesn't need to know whether a function is internal or external to the DLL.
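To make the thunk idea concrete, here is a minimal sketch of how such late-bound forwarding could look. The names `makeThunk`, `defineSymbol`, and `resolved` are hypothetical and are not emscripten's actual helpers; the sketch only models the behavior described above.

```typescript
// Hypothetical model of the JS thunk scheme; names are illustrative only.
type Fn = (...args: number[]) => number;

// Definitions registered so far, keyed by symbol name.
const resolved = new Map<string, Fn>();

// Returns a stable thunk that forwards to whatever definition is
// registered at the time of the call, not at the time of creation.
function makeThunk(name: string): Fn {
  return (...args: number[]): number => {
    const target = resolved.get(name);
    if (target === undefined) {
      throw new Error(`unresolved symbol: ${name}`);
    }
    return target(...args);
  };
}

// Called by the loader once a DLL providing `name` has been instantiated.
function defineSymbol(name: string, fn: Fn): void {
  resolved.set(name, fn);
}

// The thunk can be handed to a module as an import before "add" exists.
const addThunk = makeThunk("add");
defineSymbol("add", (a, b) => a + b);
```

Because the lookup happens on every call, the importing module can use a plain call instruction, at the cost of a JS hop per external call.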

Address Imports

For addresses that are external to the current module, getters are used to retrieve the absolute symbol address. These are JS functions of the form g$<symbol_name>, which return the absolute address of the given dynamic symbol. This means the codegen for -fPIC code needs to treat internal and external symbols very differently.

The circular nature of imports and exports between shared libraries means that we can't directly import functions from other modules. However, I think we can do better than forcing all references to external functions and addresses to go via JS.

I'm proposing the following scheme:

Function Imports

All functions imported from other DLLs go via call_indirect. The loader is responsible for allocating table slots for every function. This makes the table act as our PLT. Functions are imported as immutable globals that represent the table offset, so each external call would look like:

get_global $foo
call_indirect 
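The loader side of this scheme could be modelled roughly as follows. This is only a sketch: a plain array stands in for the wasm function table, a map stands in for the immutable globals, and `Loader`, `slotFor`, `define`, and `callIndirect` are hypothetical names.

```typescript
type Fn = (...args: number[]) => number;

class Loader {
  // Stands in for the shared wasm function table.
  readonly table: (Fn | null)[] = [];
  // Stands in for the immutable i32 globals: symbol name -> table index.
  private readonly slots = new Map<string, number>();

  // Reserve (or reuse) a table slot for an imported function.
  slotFor(name: string): number {
    let index = this.slots.get(name);
    if (index === undefined) {
      index = this.table.length;
      this.table.push(null);
      this.slots.set(name, index);
    }
    return index;
  }

  // Fill in the slot once some DLL exports a definition.
  define(name: string, fn: Fn): void {
    this.table[this.slotFor(name)] = fn;
  }

  // Models `get_global $name; call_indirect` in the importing module.
  callIndirect(index: number, ...args: number[]): number {
    const fn = this.table[index];
    if (fn == null) {
      throw new Error(`null table entry at index ${index}`);
    }
    return fn(...args);
  }
}

const loader = new Loader();
const fooIndex = loader.slotFor("foo"); // value of the imported global $foo
loader.define("foo", (x) => x * 2);     // the providing DLL is loaded later
```

The index handed out for `foo` never changes, which is why the global can be immutable: only the table entry behind it is filled in later.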

Address Imports

We can use mutable globals for this. The loader can allocate one mutable global for each data address. A load from an external address would then look like:

get_global $bar
i32.load

Mutable globals are needed because the addresses of all globals won't be known until all DLLs have been loaded, at which point the loader can update the mutable globals.

It's possible that only code compiled with -fPIC would need this extra indirection. We could treat the main executable as unique and always load it last, at which point all data addresses and wasm functions will be available to the loader and external functions can be imported directly, avoiding the call_indirect.
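The two-phase nature of address resolution might be sketched like this. A `{ value }` cell stands in for a mutable i32 wasm global and a `DataView` stands in for linear memory; `importAddress`, `resolveAddresses`, and `loadI32` are hypothetical names, and the address 16 is an arbitrary example.

```typescript
// Stand-in for a mutable i32 wasm global.
interface MutableGlobal {
  value: number;
}

// Stand-in for the shared linear memory.
const memory = new DataView(new ArrayBuffer(64));
const addressGlobals = new Map<string, MutableGlobal>();

// Phase 1: hand out a global for each external data symbol
// before its final address is known.
function importAddress(name: string): MutableGlobal {
  let g = addressGlobals.get(name);
  if (g === undefined) {
    g = { value: 0 };
    addressGlobals.set(name, g);
  }
  return g;
}

// Phase 2: once every DLL is loaded, patch in the final addresses.
function resolveAddresses(finalAddresses: Map<string, number>): void {
  addressGlobals.forEach((g, name) => {
    const address = finalAddresses.get(name);
    if (address === undefined) {
      throw new Error(`undefined symbol: ${name}`);
    }
    g.value = address;
  });
}

// Models `get_global $bar; i32.load`.
function loadI32(g: MutableGlobal): number {
  return memory.getInt32(g.value, true); // wasm memory is little-endian
}

const bar = importAddress("bar");
memory.setInt32(16, 1234, true); // the defining DLL places `bar` at address 16
resolveAddresses(new Map([["bar", 16]]));
```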

kripken commented 6 years ago

Good overview!

Some thoughts:

dschuff commented 6 years ago
lukewagner commented 6 years ago

The circular nature of imports and exports between shared libraries means that we can't directly import functions from other modules.

That makes sense in general, but is there any way to have this scheme be used as a fallback when normal acyclic linking isn't possible? I don't understand enough about how this normally works, but could this distinction be mapped to ELF symbol visibility, where the default visibility turns into an import and you have to explicitly opt into weak visibility that allows cycles and mutability via table elements?

kripken commented 6 years ago

@dschuff

support calling such functions from JS via some kind of opt-in mechanism (

Yeah, from JS we could make the existing ccall() helper work properly - it could do something like create a wasm module which wraps the i64 return etc. which would indeed hide the details as you say.

What slightly worries me is that this is a breaking change. However, it does seem like the right thing to aim for, and there are likely not many people using i64 return values currently.

Another option here is to wait for BigInt support in JS and integration with wasm, as it seems like at that point the problem disappears...?

sbc100 commented 5 years ago

The circular nature of imports and exports between shared libraries means that we can't directly import functions from other modules.

That makes sense in general, but is there any way to have this scheme be used as a fallback when normal acyclic linking isn't possible? I don't understand enough about how this normally works, but could this distinction be mapped to ELF symbol visibility, where the default visibility turns into an import and you have to explicitly opt into weak visibility that allows cycles and mutability via table elements?

I think this would have to be a codegen option. Perhaps we could allow developers to build with an option if they know their shared libraries don't have cyclic dependencies and they don't want to do symbol interposition. This would make it a runtime error if cycles were present.
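As a rough illustration of what that runtime error could look like, a loader could compute a dependencies-first load order and fail when it finds a cycle. The dependency map, the library names, and `loadOrder` are all hypothetical; this is a plain depth-first search, not anything an existing loader is known to do.

```typescript
// Hypothetical dependency map: library name -> libraries it imports from.
type DepGraph = Map<string, string[]>;

// Returns a load order in which every library comes after its
// dependencies, or throws if the dependencies form a cycle.
function loadOrder(deps: DepGraph): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  function visit(lib: string, path: string[]): void {
    const s = state.get(lib);
    if (s === "done") return;
    if (s === "visiting") {
      throw new Error(`cyclic dependency: ${[...path, lib].join(" -> ")}`);
    }
    state.set(lib, "visiting");
    for (const dep of deps.get(lib) ?? []) {
      visit(dep, [...path, lib]);
    }
    state.set(lib, "done");
    order.push(lib);
  }

  deps.forEach((_, lib) => visit(lib, []));
  return order;
}

const acyclic: DepGraph = new Map([
  ["main", ["liba", "libb"]],
  ["liba", ["libb"]],
  ["libb", []],
]);
const order = loadOrder(acyclic);
```

With an order like this, each library's imports are already instantiated when it loads, so its functions could be imported directly.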

Alternatively we could make it a linker option and try to have the linker relax from:

get_global $foo
call_indirect 

to

call $foo
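A toy version of that relaxation, working on a simplified instruction list, might look like the sketch below. A real linker would presumably work from relocations rather than pattern matching, and the `Instr` type and `relaxCalls` are hypothetical. The rewrite relies on the table index being the last operand pushed before call_indirect, so the call arguments already on the stack are left untouched.

```typescript
// Simplified instruction model; only the shapes needed for the rewrite.
type Instr =
  | { op: "get_global"; name: string }
  | { op: "call_indirect" }
  | { op: "call"; name: string }
  | { op: "other"; text: string };

// Rewrites `get_global $f; call_indirect` into `call $f` for every
// function that turned out to be defined in the same module.
function relaxCalls(code: Instr[], definedLocally: Set<string>): Instr[] {
  const out: Instr[] = [];
  for (let i = 0; i < code.length; i++) {
    const cur = code[i];
    const next = code[i + 1];
    if (
      cur.op === "get_global" &&
      next !== undefined &&
      next.op === "call_indirect" &&
      definedLocally.has(cur.name)
    ) {
      out.push({ op: "call", name: cur.name });
      i++; // skip the call_indirect that was folded into the direct call
    } else {
      out.push(cur);
    }
  }
  return out;
}

const relaxed = relaxCalls(
  [
    { op: "other", text: "i32.const 1" },
    { op: "get_global", name: "foo" },
    { op: "call_indirect" },
    { op: "get_global", name: "bar" },
    { op: "call_indirect" },
  ],
  new Set(["foo"]),
);
```

Here the call to `foo` becomes direct, while `bar`, which is still external, keeps the indirect sequence.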