Using wasm globals to represent data/function addresses in object files

sbc100 commented 3 years ago

;tldr; should we use globals hold data addresses, even in non-PIC object files.

The current tooling conventions for object files (and the default used by llvm) is to represent data and functions address only in relocations.

Taking the address or a data symbol or function symbol results in the following code:

i32.const <reloc>

Where the reloc exists only in the linking section and is either a R_WASM_MEMORY_ADDRESS_LEB (for data symbols) or R_WASM_FUNCTION_INDEX_LEB (for function symbols).

With the experimental PIC ABI used by emscripten these address are instead model as wasm globals and produces the following pattern.

global.get <reloc>

In this case the relocation type is R_WASM_GLOBAL_INDEX_LEB.

When linking object built with -fPIC into static binaries that linker creates internal immutable globals that represent the static address of the symbol. Because the global is internal and immutable wasm-opt can then completely eliminate the global and replace the global.get with an i32.const. This can be though of as form of linker relaxation that happen in the post-link optimizer. With a little work we could teach wasm-ld to perform this relaxation directly.

My suggestion is to use globals in similar fashion by default and even when -fPIC is not specified. Here are some of the advantages, as I see them:

~~It simplifies llvm, having just one way to get symbol addresses.~~
It means more comaptability between object files. (-fPIC object need no longer be special)
We can use the name section to name wasm globals (https://github.com/WebAssembly/wabt/pull/1554) which means that inspecting object files will give much clearer results. The llvm symbol names can appear the output of normal wasm disassemblers such as wasm2wat and wasmdis, whereas this information was previously hidden in the relocation data.
Undefined data symbols can now be represented as imports by the linker. Previously there is now way for the linker to turn an undefined data symbol into an import which is why --allow-undefined doesn't do what most people think it does WRT to data symbols.

Downsides:

Unoptimized builds with a lot of global data symbols or address taken functions will have extra wasm globals. (wasm-opt can remove all of these).

sbc100 commented 3 years ago

Could be either an upside or a downside: This might motivate us to actually make linker relaxation work which could have other benefits.

sbc100 commented 3 years ago

@sunfishcode @dschuff @tlively

tlively commented 3 years ago

I particularly like that this would give data symbols names in the name section, but I am concerned about depending more on wasm-opt for code size and performance. Is it correct that implementing linker relaxation would essentially negate any simplicity benefits we would get from this? Also, what additional benefits would linker relaxation bring?

sbc100 commented 3 years ago

Performance-wise I imagine that global.get <immutable-const-non-imported-global> would be compiled down to the same thing as i32.const anyway right? I that is correct assumption then really just a code size question.

Is it correct that implementing linker relaxation would essentially negate any simplicity benefits we would get from this?

Yes, I guess point (1) doesn't really hold up if we end up doing linker relaxation in response to this change. The other 3 arguments stronger than that one anyway I think.

Regarding linker relaxation, one other example might be code that was compiled with TLS but then linked without --shared-memory.

It would imagine it could transform the following pattern/relocation:

global.get __tls_base
i32.const <tls_relocation>
i32.add

Into just:

i32.const <static address>

dschuff commented 3 years ago

Performance-wise I imagine that global.get <immutable-const-non-imported-global> would be compiled down to the same thing as i32.const anyway right? I that is correct assumption then really just a code size question.

Is it correct that implementing linker relaxation would essentially negate any simplicity benefits we would get from this?

Yes, I guess point (1) doesn't really hold up if we end up doing linker relaxation in response to this change. The other 3 arguments stronger than that one anyway I think.

Well it would simplify the compiler, and move the complexity to the linker. It could still be a benefit if the e.g. compiler needed to have different codepaths in more than one place, whereas the linker can just have a different codepath in only one place? Not sure if that's actually the case though.

Regarding linker relaxation, one other example might be code that was compiled with TLS but then linked without --shared-memory.

It would imagine it could transform the following pattern/relocation:
global.get __tls_base
i32.const <tls_relocation>
i32.add
Into just:
i32.const <static address>

Traditional linker relaxation basically just rewrites code in place (leaving nop padding) rather than actually shrinking anything, presumably for speed and simplicity. Are we imagining that as a possibility too? Or are we already rewriting everything in the linker to get smaller LEBs? I'm guessing not, actually. Because doing that would invalidate debug info, right?

sbc100 commented 3 years ago

We do have an (off by default) option called --compress-relocations which can reclaim space, but we don't use it, or eve recommend using it. So yes you are right we won't save space by default adding linker relaxation.

sbc100 commented 3 years ago

So it sounds like there is no upside to (1), and that we don't need we will gain much from linker relaxation. But I think the other arguments for doing this still stand.

sunfishcode commented 3 years ago

At first glance, it looks like this should be compatible with module linking; does that sound right?

Objdump has a -r option which can show the relocations interspersed with the disassembly, which is useful for other kinds of relocations as well. It seems like it wouldn't be bad if the other disassemblers people use could do this too.

Like @dschuff I also wonder how much linker relaxation affects linking speed. A variant of this proposal would be to emit imports for globals, but continue to codegen addresses as i32.const with a relocation instead of a global.get (in non-PIC modes). That would make it more natural for --allow-undefined to be similar between functions and data, but wouldn't require linker relaxation to optimize away the global.gets. If linker relaxation has a significant impact on linking speed, this hybrid approach may help.

sbc100 commented 3 years ago

Like @dschuff I also wonder how much linker relaxation affects linking speed. A variant of this proposal would be to emit imports for globals, but continue to codegen addresses as i32.const with a relocation instead of a global.get (in non-PIC modes).

I'm having trouble understanding what you are suggesting. Why emit imports for globals at all if we are going to generate i32.const for their addresses? Where/How would these imports be used?

sunfishcode commented 3 years ago

I'm having trouble understanding what you are suggesting. Why emit imports for globals at all if we are going to generate i32.const for their addresses? Where/How would these imports be used?

It'd just arrange for all undefined symbols have imports, which I imagine would make it easier to think about things like --allow-undefined, even though that's not the only way it could work. I'm not attached to the idea.

sbc100 commented 3 years ago

Sorry, are you suggesting that, given and undefined external data symbol foo, we create global import called foo but we don't actually use it at anywhere? That seems even more confusing since users would naturally assume that supplying foo at runtime would have the expected effect.

sunfishcode commented 3 years ago

I'm still not sure what I think overall, but it does feel like there's a coherent design in this. It'd be a const global import, so users wouldn't expect to be able to provide different values at runtime. And if the user provides globals with static absolute inits, then we use those inits as the values to patch into the relocations. Otherwise the link fails with an error. It'd be like those imports really are importing the value, just like PIC objects do, except we're just pre-comitting to doing an optimization with that value, and failing in the case where it doesn't work out in the end.

WebAssembly / tool-conventions

Using wasm globals to represent data/function addresses in object files #153