Open sbc100 opened 3 years ago
Could be either an upside or a downside: This might motivate us to actually make linker relaxation work which could have other benefits.
@sunfishcode @dschuff @tlively
I particularly like that this would give data symbols names in the name section, but I am concerned about depending more on wasm-opt for code size and performance. Is it correct that implementing linker relaxation would essentially negate any simplicity benefits we would get from this? Also, what additional benefits would linker relaxation bring?
Performance-wise, I imagine that global.get <immutable-const-non-imported-global> would be compiled down to the same thing as i32.const anyway, right? If that is a correct assumption then it's really just a code size question.
Is it correct that implementing linker relaxation would essentially negate any simplicity benefits we would get from this?
Yes, I guess point (1) doesn't really hold up if we end up doing linker relaxation in response to this change. The other three arguments are stronger than that one anyway, I think.
Regarding linker relaxation, one other example might be code that was compiled with TLS but then linked without --shared-memory.
I would imagine it could transform the following pattern/relocation:
global.get __tls_base
i32.const <tls_relocation>
i32.add
Into just:
i32.const <static address>
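To make the TLS relaxation above concrete, here is a minimal sketch in Python rather than in wasm-ld's C++. The tuple-based instruction list, the relax_tls name, and the tls_base value are all assumptions made for illustration; a real linker would work on encoded bytes and relocation records, not on decoded instructions.

```python
def relax_tls(instrs, tls_base):
    """Fold `global.get __tls_base; i32.const N; i32.add` into one i32.const.

    instrs is a list of (opcode, operand) tuples (a hypothetical decoded form).
    tls_base is the static address assigned to the TLS block when the
    output does not use shared memory (an assumed linker input).
    """
    out = []
    i = 0
    while i < len(instrs):
        window = instrs[i:i + 3]
        if (len(window) == 3
                and window[0] == ("global.get", "__tls_base")
                and window[1][0] == "i32.const"
                and window[2] == ("i32.add", None)):
            # The whole sequence is a link-time constant: base + offset.
            out.append(("i32.const", tls_base + window[1][1]))
            i += 3
        else:
            out.append(instrs[i])
            i += 1
    return out


code = [
    ("global.get", "__tls_base"),
    ("i32.const", 16),   # stands in for <tls_relocation>
    ("i32.add", None),
    ("i32.load", None),
]
relaxed = relax_tls(code, tls_base=1024)
```

Note that the rewritten sequence is shorter than the original, which is exactly what raises the in-place rewriting and debug info questions discussed later in the thread.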
Well, it would simplify the compiler and move the complexity to the linker. It could still be a benefit if, e.g., the compiler needed to have different codepaths in more than one place, whereas the linker could have a different codepath in only one place. Not sure if that's actually the case though.
Traditional linker relaxation basically just rewrites code in place (leaving nop padding) rather than actually shrinking anything, presumably for speed and simplicity. Are we imagining that as a possibility too? Or are we already rewriting everything in the linker to get smaller LEBs? I'm guessing not, actually. Because doing that would invalidate debug info, right?
We do have an (off by default) option called --compress-relocations which can reclaim space, but we don't use it, or even recommend using it. So yes, you are right: adding linker relaxation won't save space by default.
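The space question comes down to how relocatable immediates are encoded. Object files store them as fixed-width padded LEB128 values so the linker can patch them in place, and shrinking them means moving every byte that follows. The sketch below shows the difference for an unsigned value; the function names are illustrative, and it uses unsigned LEB128 for simplicity even though i32.const operands are signed.

```python
def encode_uleb128(value, pad_to=None):
    """Encode a non-negative int as unsigned LEB128.

    With pad_to set, emit at least that many bytes by adding continuation
    bytes that carry zero payload, as done for relocatable fields.
    """
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        more = value != 0 or (pad_to is not None and len(out) + 1 < pad_to)
        out.append(byte | (0x80 if more else 0))
        if not more:
            return bytes(out)


def decode_uleb128(data):
    """Decode an unsigned LEB128 value from the start of data."""
    result = 0
    for shift, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * shift)
        if not byte & 0x80:
            break
    return result


minimal = encode_uleb128(1024)           # 2 bytes
padded = encode_uleb128(1024, pad_to=5)  # 5 bytes, same value
```

Both encodings decode to the same value, so patching a padded field never changes code offsets, while rewriting it to the minimal form saves three bytes here at the cost of shifting everything after it.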
So it sounds like there is no upside to (1), and that we don't think we will gain much from linker relaxation. But I think the other arguments for doing this still stand.
At first glance, it looks like this should be compatible with module linking; does that sound right?
Objdump has a -r option which can show the relocations interspersed with the disassembly, which is useful for other kinds of relocations as well. It seems like it wouldn't be bad if the other disassemblers people use could do this too.
Like @dschuff I also wonder how much linker relaxation affects linking speed. A variant of this proposal would be to emit imports for globals, but continue to codegen addresses as i32.const with a relocation instead of a global.get (in non-PIC modes). That would make it more natural for --allow-undefined to be similar between functions and data, but wouldn't require linker relaxation to optimize away the global.gets. If linker relaxation has a significant impact on linking speed, this hybrid approach may help.
I'm having trouble understanding what you are suggesting. Why emit imports for globals at all if we are going to generate i32.const for their addresses? Where/how would these imports be used?
It'd just arrange for all undefined symbols to have imports, which I imagine would make it easier to think about things like --allow-undefined, even though that's not the only way it could work. I'm not attached to the idea.
Sorry, are you suggesting that, given an undefined external data symbol foo, we create a global import called foo but don't actually use it anywhere? That seems even more confusing, since users would naturally assume that supplying foo at runtime would have the expected effect.
I'm still not sure what I think overall, but it does feel like there's a coherent design in this. It'd be a const global import, so users wouldn't expect to be able to provide different values at runtime. And if the user provides globals with static absolute inits, then we use those inits as the values to patch into the relocations. Otherwise the link fails with an error. It'd be like those imports really are importing the value, just like PIC objects do, except we're pre-committing to doing an optimization with that value, and failing in the case where it doesn't work out in the end.
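One way to read that design is sketched below in Python. Everything here is hypothetical: resolve_data_relocs, LinkError, and the dictionary shapes are invented for illustration and do not correspond to wasm-ld internals. The only point is the resolution order: use a locally defined address, else a user-provided global with a static init, else fail the link.

```python
class LinkError(Exception):
    """Raised when a data address cannot be fixed at link time."""


def resolve_data_relocs(relocs, defined, provided_globals):
    """Return {code_offset: address} for every data relocation.

    relocs: list of (code_offset, symbol_name) pairs.
    defined: {symbol: static address} for symbols defined in the inputs.
    provided_globals: {symbol: init} for const global imports supplied by
        the user; init is an int for a static absolute initializer, or
        None when the value is not known statically.
    """
    patches = {}
    for offset, sym in relocs:
        if sym in defined:
            patches[offset] = defined[sym]
        elif isinstance(provided_globals.get(sym), int):
            # Pre-commit to the optimization: bake the init into the code.
            patches[offset] = provided_globals[sym]
        else:
            raise LinkError(f"cannot resolve address of {sym!r} at link time")
    return patches


patches = resolve_data_relocs(
    relocs=[(10, "local_buf"), (24, "foo")],
    defined={"local_buf": 2048},
    provided_globals={"foo": 4096},
)
```

Under this reading, an undefined symbol with no statically known value is a hard error rather than a silent runtime import.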
tl;dr: should we use globals to hold data addresses, even in non-PIC object files?
The current tooling convention for object files (and the default used by llvm) is to represent data and function addresses only in relocations.
Taking the address of a data symbol or function symbol results in an i32.const whose immediate carries a relocation.
The reloc exists only in the linking section and is either R_WASM_MEMORY_ADDRESS_LEB (for data symbols) or R_WASM_FUNCTION_INDEX_LEB (for function symbols).
With the experimental PIC ABI used by emscripten these addresses are instead modeled as wasm globals and read with global.get. In this case the relocation type is R_WASM_GLOBAL_INDEX_LEB.
When linking objects built with -fPIC into static binaries, the linker creates internal immutable globals that represent the static address of each symbol. Because the global is internal and immutable, wasm-opt can then completely eliminate the global and replace the global.get with an i32.const. This can be thought of as a form of linker relaxation that happens in the post-link optimizer. With a little work we could teach wasm-ld to perform this relaxation directly.
My suggestion is to use globals in a similar fashion by default, even when -fPIC is not specified. Here are some of the advantages, as I see them:
(1) It simplifies llvm, having just one way to get symbol addresses.
(2) -fPIC objects need no longer be special)
(3) wasm2wat and wasmdis, whereas this information was previously hidden in the relocation data.
(4) --allow-undefined doesn't do what most people think it does WRT data symbols.
Downsides: