WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0

wasm-link? #2767

Open kripken opened 4 years ago

kripken commented 4 years ago

In the past we had wasm-merge and then removed it for lack of use cases and the belief that lld would generally be the linker for everything. However, the need for something like wasm-merge has come up since then more than once, and I think I have an idea for a new design and goal for it.

Goal

Do the same things conceptually that a wasm VM would do at the "link" stage, but as a toolchain tool at compile time. That is, instead of shipping multiple wasm files and linking them at runtime on the client, you can link them at compile time. Hence I suggest the name wasm-link.

edit: To clarify, what I mean by "what a wasm VM would do" is "what would happen if a wasm VM were given a set of modules and names, and instantiated the first, then instantiated the second with the first imported with its name, then instantiated the third with the first two imported, and so forth." That is, the most simple and naive linking possible of wasm modules, without any special things done in the middle by JS loader code or anything like that.

Concretely, that means:

(module ;; imported with some other module name
  (import "one" "foo" (func $foo))
  (func $call-to-foo
    (call $foo)
  )
)

== linked to =>

(module
  (func $internal_foo ..)
  (func $call-to-foo
    (call $internal_foo)
  )
)


* In the future interface types will give us a lot more to do here, like fuse lifting/lowering code etc.
* Note that if more than one input module has a memory or a table, multi-memory/multi-table are necessary, we need to handle duplicate internal names, etc.
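
To make the intended behavior concrete, here is a minimal Python sketch of that naive, in-order linking. It is a toy model, not Binaryen code: the dict-based module layout, the `link` function, and the `module$func` naming scheme are all assumptions made for illustration. Function bodies are modeled as plain lists of callee names.

```python
# Toy model only: every name here is hypothetical, nothing is a Binaryen API.
def link(modules):
    """Link (name, module) pairs in order. An import is resolved only
    against exports of modules that appeared earlier in the list."""
    exports = {}                       # (module name, export name) -> internal name
    out = {"imports": {}, "funcs": {}}
    for mod_name, mod in modules:
        # Prefix defined functions with the module name to avoid
        # collisions between duplicate internal names.
        rename = {f: f"{mod_name}${f}" for f in mod["funcs"]}
        for local, (src_mod, src_name) in mod["imports"].items():
            if (src_mod, src_name) in exports:
                # Provided by an earlier module: becomes an internal reference.
                rename[local] = exports[(src_mod, src_name)]
            else:
                # Not provided by any earlier module: stays an import.
                unique = f"{mod_name}${local}"
                rename[local] = unique
                out["imports"][unique] = (src_mod, src_name)
        for f, calls in mod["funcs"].items():
            out["funcs"][rename[f]] = [rename[c] for c in calls]
        for export_name, f in mod["exports"].items():
            exports[(mod_name, export_name)] = rename[f]
    return out


one = {"imports": {}, "funcs": {"foo": []}, "exports": {"foo": "foo"}}
two = {
    "imports": {"foo": ("one", "foo")},
    "funcs": {"call-to-foo": ["foo"]},
    "exports": {},
}
linked = link([("one", one), ("two", two)])
```

In this sketch the call in `call-to-foo` ends up pointing at the internal function from module `one`, and the import disappears from the linked output, mirroring the example above.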

## Use cases

* The specific use case I have myself atm is I want to explore replacing JS glue code in Emscripten with wasm code using reftypes. For now I'm experimenting with handwritten wat files, and I need a way to link those in. 
* A JS bundler does similar things to JS, and I remember plans to do the same for wasm files there. I'm not sure if that's been done in any of them or not (lack of multiple memories likely prevents it so far). I think it would be good to have a simple standalone tool in binaryen for this, that could perhaps be used by bundlers if they want.
* Linking code from different toolchains. Normally wasm object files and wasm-ld are what we want, but not all toolchains use wasm object files, like Go and AssemblyScript. And a single web page may contain code from multiple toolchains, so such bundling makes sense.
* Optimizing fused lifting/lowering code from interface types. I imagine there will be cases where the fused code doesn't just get optimized away by design, and there is actual work to be done. By linking at compile time we can do more optimizations (more complex ones, ones slower to run, etc.) than the VM would do at runtime.

tlively commented 4 years ago

SGTM. Agree this will be useful, especially as IT and multiple memories are implemented. Do we know how this might relate (or not) to @lukewagner's linking proposal?

> The specific use case I have myself atm is I want to explore replacing JS glue code in Emscripten with wasm code using reftypes. For now I'm experimenting with handwritten wat files, and I need a way to link those in.

This is somewhat separate, but have you considered using .s files written in the LLVM assembly format? It would be good to dogfood using that format if we could, because that's how we expect end users of LLVM-based toolchains to incorporate hand-written wasm at the moment. I would be happy to help get the instructions you need into the LLVM backend so they can be used in the .s format.

kripken commented 4 years ago

@tlively Yeah, @dschuff mentioned .s files to me offline earlier - it's one of the options I'll look into. It's slightly less convenient atm since I am building a project normally to a final wasm, and then modifying the output wasm file and nothing else. But .s files may end up better later.

sbc100 commented 4 years ago

Would it be useful (as well) to be able to merge two wasm files into one without actually linking any of their imports or exports? (such a tool might more logically be called wasm-merge I guess :)

This would require import/export name rewriting to avoid collisions, but it would be a "no-link" mode that maintains the trust boundary between the two modules. A runtime linker could then decide whether it really wants to connect the two modules or keep them completely isolated.
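
As a rough illustration of the name rewriting such a "no-link" mode would need, the Python sketch below merges export tables while prefixing each export with its module name. The `merge_exports` helper and the `module.export` naming scheme are assumptions for illustration only. The thread does not settle on a scheme.

```python
# Hypothetical sketch: merge export tables without linking anything.
def merge_exports(modules):
    """Combine export tables from (module name, exports) pairs, prefixing
    each export name so identical names from different modules stay distinct."""
    merged = {}
    for mod_name, exports in modules:
        for export_name, func in exports.items():
            # Record the owning module so a runtime linker could still
            # decide later whether to connect the modules.
            merged[f"{mod_name}.{export_name}"] = (mod_name, func)
    return merged


merged = merge_exports([
    ("a", {"run": "f0"}),
    ("b", {"run": "g0"}),
])
```

Both modules export `run` here, and both exports survive under distinct names.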

kripken commented 4 years ago

@sbc100 Hmm, then maybe there could be two tools, wasm-merge which just "concatenates" but nothing more, and wasm-link which takes a single wasm module and does internal linking?

One issue with that is that I'm not sure it's enough for the second tool, wasm-link, to take a single module as the argument. For example, if we are asked to merge two modules with the same export name, what would wasm-merge do? However it disambiguates that, wasm-link would need to be aware of the original modules' names and so forth (which it would if it did the concatenation itself). Likewise, if we merge modules with interface types, what would the output of wasm-merge look like for wasm-link to optimize?

So maybe it should just be a single tool, with an option of doing the internal linking or not.

sbc100 commented 4 years ago

Yes, I was thinking "internally-link" vs "no-linking" as two different modes of the same tool.

kripken commented 4 years ago

How about wasm-link --no-internal for the mode without internal linking?

tlively commented 4 years ago

I would prefer wasm-merge --link or wasm-link --no-link, which I think is more descriptive than --no-internal, but I don't feel strongly about it.

sbc100 commented 4 years ago

We could also have wasm-link and wasm-merge be the same binary like clang and clang++, but this discussion is really putting the shed before the bike.

LouisStAmour commented 4 years ago

Not sure if this is an active discussion, but I was hoping I'd be able to write a WASM Envoy module following https://github.com/proxy-wasm that would in turn be able to use (merge with?) wasm functions from https://www.openpolicyagent.org/docs/latest/wasm/ to avoid having to make extra network calls for each policy decision.

But I'm having trouble figuring out how :) The closest I can see might be SIDE_MODULE support if I want to try and keep dynamic linking of one WASM for another. But it's unclear which implementations support "side modules" and I should probably make the assumption that they're not supported by Envoy. If so, everything has to be merged into one module, and thus at build time of one wasm, I'd have to merge/wrap it in functionality to produce another wasm.

Both wasm files have different export function name prefixes, so from the outside this sounds like an easy thing to do, but ... it's unclear to me how often folks have done this or where to get started. It's a bit odd to me that WASM can support so many different languages as source targets, but seemingly not statically linking or calling wasm itself from such languages when producing wasm as an output. :)

sbc100 commented 4 years ago

Is there any reason not to just use the static linker (wasm-ld) in this case? The only limitation with the static linker is that the inputs have to be wasm object files. Is that not possible?

Can you not build your two programs as ar archives (.a files) or just object files (you can use wasm-ld --relocatable to combine many object files into one if that helps) and then use wasm-ld to link them together?

LouisStAmour commented 4 years ago

I guess I’ll have to investigate... I’m relatively new to uses of wasm outside the browser. In the case of OPA, the source for the .wasm it produces appears to be Go, so it’s unclear how compatible the .o file from Go would be and how to produce a wasm-compatible object file. I suppose I’ll have to experiment...

LouisStAmour commented 4 years ago

My use case will have to wait a bit to be practical it seems. After further investigation, https://github.com/envoyproxy/envoy-wasm is still under active development including a new ABI and remote deployment mechanisms with Istio also improving. I'll prototype without WASM on my end (using network calls instead) and revisit this later.

pannous commented 3 years ago

@sbc100 the static linker wasm-ld requires elaborate custom linking sections and did not produce the desired output for me.

While renumbering the functions, updating the call indices apparently can break: "R_WASM_FUNCTION_INDEX_LEB relocations may fail to be processed, in which case linking fails."

Generalized linking is also part of the wasm roadmap

sbc100 commented 3 years ago

> @sbc100 the static linker wasm-ld requires elaborate custom linking sections and did not produce the desired output for me.

If there are bugs in wasm-ld then we want to know about them. So far wasi-sdk, emscripten, and rust all use wasm-ld and there are no known major outstanding issues. Of course they all use llvm, so it is easy for them to produce the object file metadata. Perhaps you can describe what you are trying to do and why it's not working? For sure wasm-ld requires extra metadata, but that is because the wasm module system is not powerful enough to express everything that a static linker needs. For example, we want data segments to be statically relocatable, which requires relocation information. We also want lld to be fast, which means we don't want to pay the cost of disassembling all the instructions to find all the function call sites (another reason for requiring relocation information).
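
The point about relocation information can be illustrated with a small Python sketch. Wasm object files store relocatable function indices as 5-byte padded LEB128 values, so a linker can patch them in place at recorded offsets without decoding the instruction stream. The helper names, the flat list of offsets, and `index_map` below are hypothetical simplifications, not wasm-ld's actual data structures.

```python
# Hypothetical sketch of patching call targets via relocation offsets.
def encode_padded_leb128(value, width=5):
    """Encode an unsigned 32-bit integer as LEB128 padded to a fixed width."""
    assert 0 <= value < 2**32
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def decode_leb128(data, offset):
    """Decode an unsigned LEB128 value starting at the given offset."""
    result, shift = 0, 0
    while True:
        byte = data[offset]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
        offset += 1


def apply_relocations(code, relocation_offsets, index_map):
    """Rewrite each function index in place; no disassembly is needed
    because the offsets say exactly where the indices live."""
    patched = bytearray(code)
    for offset in relocation_offsets:
        old_index = decode_leb128(patched, offset)
        patched[offset:offset + 5] = encode_padded_leb128(index_map[old_index])
    return bytes(patched)


CALL = b"\x10"  # opcode of the wasm `call` instruction
code = CALL + encode_padded_leb128(0) + CALL + encode_padded_leb128(1)
patched = apply_relocations(code, [1, 7], {0: 5, 1: 6})
```

Because every index occupies exactly five bytes, renumbering functions never changes the size of the code, which is what makes in-place patching possible.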

> While renumbering the functions, updating the call indices apparently can break: "R_WASM_FUNCTION_INDEX_LEB relocations may fail to be processed, in which case linking fails."

Are you running into this specific issue regarding weak symbols?

> Generalized linking is also part of the wasm roadmap

pannous commented 3 years ago

Never mind, we resolved it by removing parameters from the linker:

clang --target=wasm32 -nostdlib -Wl,--export-all,--relocatable,--no-entry,--shared -o lib.wasm lib.c
=>
clang --target=wasm32 -nostdlib -Wl,--relocatable -o lib.wasm lib.c

Trying to link a minimal example https://github.com/pannous/test-lld-wasm now works.

abrown commented 3 years ago

I had a need for the tool this issue would provide and was discussing it over in the AssemblyScript repository: https://github.com/AssemblyScript/assemblyscript/issues/2045. I floated an idea for a static, naive linker (like wasm-link?) that would not require the relocation sections but @dcodeIO mentioned that adding Wasm object file support to Binaryen might be another option. Is there any strong preference here between:

srenatus commented 3 years ago

I think this might be relevant but wasn't mentioned before: https://github.com/bytecodealliance/witx-bindgen/tree/main/crates/wasmlink

kripken commented 3 years ago

@abrown

I think building wasm-link would be pretty straightforward. I don't have an urgent need for this myself so I'm not planning to work on it soon, but I'd be very happy to review a PR for it!

Adding object file support would be significantly more work, as the relocations require IR changes. I actually wrote some notes on this a while back, and I'm not sure where I posted them, but attached is a PDF. Binaryen Object File Support_.pdf

@srenatus Thanks for the link! Looks like that is focused on Module Linking and Interface Types, but perhaps it could be reused here - I'd expect we'd need to emit Module Linking logic for that linker to process, though, which might be more work than wasm-link itself. But it might be worth discussing with the devs there.

dbanks12 commented 1 year ago

Has there been any progress here? If I have one project/library and I want to compile it to later be ingested by a dependent project with wasm bindings (functions with "default" visibility) of both the library and dependent project exposed, what should my process be? Would I use wasm-ld as mentioned above?

Apologies if my terminology is poor, I am a wasm noob.

sbc100 commented 1 year ago

I think that sounds like a use case for normal emcc/wasm-ld style linking and use of wasm object files or libraries of wasm object files. Can your library be built as a library of object files?

dbanks12 commented 1 year ago

> I think that sounds like a use case for normal emcc/wasm-ld style linking and use of wasm object files or libraries of wasm object files. Can your library be built as a library of object files?

I am trying to build a library of object files, but struggling to do so. I am using wasi-sdk and cmake, and cannot figure out how to export wasm object files to be used by my dependent project.

Sorry, I am not sure where the right place is to have this conversation, but it might not be here. Please let me know if there is a better place for me to get help on this! Thanks.

pannous commented 1 year ago

Until such a tool resurfaces, you can try the steps here:

https://github.com/pannous/test-lld-wasm

If the object binaries contain a relocation section OR are internally relocatable (by having nop spacers around calls and loads) you can also try wasp main.wasm lib.wasm

sbc100 commented 1 year ago

> > I think that sounds like a use case for normal emcc/wasm-ld style linking and use of wasm object files or libraries of wasm object files. Can your library be built as a library of object files?
>
> I am trying to build a library of object files, but struggling to do so. I am using wasi-sdk and cmake, and cannot figure out how to export wasm object files to be used by my dependent project.
>
> Sorry, I am not sure where the right place is to have this conversation, but it might not be here. Please let me know if there is a better place for me to get help on this! Thanks.

Simply using cmake's normal static library construct should work fine. All static libraries are libraries of object files.

tonibofarull commented 1 year ago

Compile the library with wasi-sdk as well!

Example CMakeLists.txt

cmake_minimum_required(VERSION 2.8.12)

project(lib)

set(CMAKE_C_COMPILER    "/opt/wasi-sdk/bin/clang")
set(CMAKE_CXX_COMPILER  "/opt/wasi-sdk/bin/clang++")

add_library(lib lib.c)

And the following code for combining the library with main.c,

/opt/wasi-sdk/bin/ranlib lib/build/liblib.a

/opt/wasi-sdk/bin/clang \
    -Wl,--allow-undefined \
    -o main.wasm main.c lib/build/liblib.a \
    -I./lib

Running ranlib is needed, otherwise,

wasm-ld: error: lib/build/liblib.a: archive has no index; run ranlib to add one
clang-14: error: linker command failed with exit code 1 (use -v to see invocation)

Let me know if you need sample files.

sbc100 commented 1 year ago

You also need to override the AR and RANLIB tools, not just C/CXX_COMPILER. wasi-sdk has a toolchain file that does this so you shouldn't need to: https://github.com/WebAssembly/wasi-sdk/blob/cee312d6d0561f302d79f432135bd2662d17862d/wasi-sdk.cmake#L17-L22

tonibofarull commented 1 year ago

> You also need to override the AR and RANLIB tools, not just C/CXX_COMPILER. wasi-sdk has a toolchain file that does this so you shouldn't need to: https://github.com/WebAssembly/wasi-sdk/blob/cee312d6d0561f302d79f432135bd2662d17862d/wasi-sdk.cmake#L17-L22

Quite useful, thanks!