WebAssembly / tool-conventions

Conventions supporting interoperatibility between tools working with WebAssembly.
Artistic License 2.0
298 stars 65 forks source link

Revisit the default of `-fvisibility=hidden` #176

Open sbc100 opened 2 years ago

sbc100 commented 2 years ago

;TLDR; visibility is currently only used by emscripten and emscripten uses a default of -fvisibility=default.

A long while ago we choose to make -fvisibility=hidden the default for the WebAssembly backend in llvm. IIRC the rational was that we would want to avoid exporting a lot of C/C++ symbols by default.

However, the only time the wasm-ld linker actually exports things based on their visibility is when -shared is used (or in the static linking case when --export-dynamic is passed). These flags are not currently used outside of emscripten. For example, with wasi-sdk users are currently things via on of 3 different methods:

  1. Explicit --export on the command line (e.g. --export=foo)
  2. --export-all on the command line. This is even worse that exporting non-hidden symbols since it exports everything (do we even what to support this going forward?)
  3. export_name attribute in the source code.

In emscripten, the visibility attribute is honored when -shared/-pie is used, and we recently switch from using --export-all to --export-dynamic + -fvisibility=default which results in fewer exports (for example --export-all exports all libc-internal symbols , even those marked as hidden, whereas --export-dynamic + -fvisibility=default does not). See https://github.com/emscripten-core/emscripten/pull/15413.

sbc100 commented 2 years ago

@sunfishcode

sunfishcode commented 2 years ago

My top goal, above all else here, is to make sure wasi-sdk doesn't get boxed in by Emscripten.

I don't know how visibility=default is going to work in wasi-sdk. I'd like a chance to design it, rather than getting boxed in by arguments about how things like --export-dynamic work. --export-dynamic wasn't even supposed to be available in wasi-sdk, but it got enabled in wasi-sdk because it was added to wasm-ld for Emscripten, and you pushed back against linker flags to add an Emscripten mode, and then people assumed they were supposed to use it, and now we're stuck supporting it.

So what I'd like is a chance to talk about how we actually want dynamic linking to work, rather than just the shortest path to help whatever use case you're working on.

sbc100 commented 2 years ago

Yes I agree that its a shame the --export-dynamic does anything today with wasi-sdk. Without that flag I guess we could defer the decision about visibility=default since visibility would mean nothing to wasi-sdk. Is that right?

Are wasi-sdk users using that flag do you know? Is it too late to put it behind --experimental-pic?

If we can get to a place where visibility does not effect wasi-sdk would you be OK with making the llvm default match the emscripten use case? Then wasi-sdk could choose a different default if/when it want to start using visibility?

sunfishcode commented 2 years ago

Yes, people are using --export-dynamic and it's too late to put it behind --experimental-pic, at least, not without a good reason.

The experience of everyone I know who worked on ELF dynamic linking is that startup times for large applications with many dynamic libraries is slow. Startup times are something we're all pretty sensitive to in wasm land, both in Web and non-Web. Two of the big "lessons learned" were that one-level namespacing and -fvisibility=default are two of the big reasons for this slowness.

Do we want two-level namespacing and visibility=hidden in wasm? I don't know! But I haven't seen any discussion of the motivations for those choices. It's just, this is what ELF did (and folklore has it that these were dubious choices even then), and so it's the fastest path to get things ported to wasm, and that seems to be the extent of it.

sbc100 commented 2 years ago

Yes, IIUC in emscripten the decision thus far has been to go with compatibility with ELF by default. There exist options to narrow down the shared library exports, but by default you get all non-hidden symbols exported.

For the time being I think I will submit a patch to llvm to just change the default for emscripten and then other sub-targets such as wasi can then make their own decisions later/separately.

I think the two-level vs one-level discussion is largely separate from this one, right? i.e. I would hope that emscripten could switch to two-level namespaces without break as much existing code (IIUC, it mostly effects librarys that want to override symbols in the global namespace, LD_PRELOAD, etc).

kripken commented 2 years ago

@sunfishcode

So what I'd like is a chance to talk about how we actually want dynamic linking to work

To add to that, I think it would be great if we can design something that works in both toolchains (and hopefully others) so that we reduce differences across the ecosystem. That could help people shipping code using multiple toolchains.

@sbc100

For the time being I think I will submit a patch to llvm to just change the default for emscripten and then other sub-targets such as wasi can then make their own decisions later/separately.

I'm ok with such changes if they are useful for now. But I do think figuring out a shared story would be the right long-term plan, when the time is right for that (sounds like maybe not yet).

sbc100 commented 2 years ago

I'm ok with such changes if they are useful for now. But I do think figuring out a shared story would be the right long-term plan, when the time is right for that (sounds like maybe not yet).

Agreed, it would be great to have a shared story on the meaning and default of visibility but that might have to wait for wasi-sdk or some other toolchain to implement dynamic linking. For now, this llvm change just documents the status quo under emscripten: https://reviews.llvm.org/D113435

sunfishcode commented 2 years ago

You've also said it'd be great if Emscripten and wasi-sdk could share more. Yet the PR here is taking yet another step towards hard-coding Emscripten knowledge in LLVM and making the two different. It may be that, like the old goal of modularizing Emscripten, that there's a difference between what you'll say would be great and what you're actually working on.

We already do have a way for Emscripten to get -fvisibility=default, if encouraging the wasm ecosystem to grow up around that as a default is what you want to do. Is it necessary to embed that in LLVM too?

sbc100 commented 2 years ago

I don't think https://reviews.llvm.org/D113435 increases the divergence, I'm just trying to clarify/document the current divergence.

Having the defaults for wasm32-unknown-emscripten encoded is llvm rather than constantly overriding them in the python driver of emscripten I believe is a separate and useful goal which helps to simplify emscripten.

sunfishcode commented 2 years ago

We appear to have different priorities, and I don't know how to reconcile them.

D113435 being Emscripten-only means it won't directly impact my use cases, so I won't object to it.

sbc100 commented 2 years ago

Perhaps the main difference in priorities stems from the fact in emscripten we want to support the existing users of shared libraries that we already have today? And you would prefer to wait and design something more standards-based? Is that about right?

I still think we can work towards want you want and collectively decide that the better solution to WebAssemebly dynamic linking, and that could involve hiding symbols by default or not.

But given that emscripten's current default is -fvisitbility=default I do think its better to encode this in llvm (as opposed to overriding it all the time). We can still change this default if we decide its the wrong default. It might be easy, but that is already the case today.

dschuff commented 2 years ago

Regarding the actual desired design of dynamic linking, I actually do think we are getting far enough along in the implementation where we really should start actually asking and answering the design questions, rather than only extending support for the existing dylib system. I do think the work we have done and are doing now is helping us understand what some of the options and constraints are, and has been helpful (compared with doing a full greenfield design upfront); and on top of that, much of it is necessarily web-specific, such as how the loader and library-opening APIs have to work with web workers.

But I wouldn't want to be in a place where we've extended and evolved the SIDE_MODULE system so that it's "good enough" that it starts to get wide usage; then we really would be risking boxing in the non-emscripten use cases (leading to divergence that would be really hard to undo) and for that matter it would even get hard for us to replace that system with something else just in Emscripten. For another example, making the command-line UI more like ELF or Mac tools is good in principle, but if it keeps the same semantics as SIDE_MODULE then it will be that much harder to change later.

So maybe we should actually start discussing symbol visibility here, and start another thread on namespaces too?

dschuff commented 2 years ago

I think the two-level vs one-level discussion is largely separate from this one, right? i.e. I would hope that emscripten could switch to two-level namespaces without break as much existing code (IIUC, it mostly effects librarys that want to override symbols in the global namespace, LD_PRELOAD, etc).

"How hard it is to switch from where we are now" should probably also be something we start thinking about very soon, here and in the 2-level namespace discussion.