WebAssembly / tool-conventions

Conventions supporting interoperability between tools working with WebAssembly.
Artistic License 2.0

Linking.md: Use multiple data and code sections #138

Open sbc100 opened 4 years ago

sbc100 commented 4 years ago

I'd like to propose that we move towards using multiple data and code sections in the object format.

This matches llvm's internal ideas about what a section is. Today, if you iterate through the sections in an object, you will only see a single code section, even though we default to -ffunction-sections. This means the linker is then forced to break the monolithic code and data sections down into sub-sections.

There are bugs popping up due to the fact that we don't currently map llvm's concept of a section onto a wasm section: https://reviews.llvm.org/D74531

There is a wasm proposal out to make repeated sections a valid thing: https://github.com/WebAssembly/conditional-sections.

The fact that we can currently validate wasm object files with tools like wasm-validate is a feature I don't want to lose, so such tools would need to learn about conditional sections (or at least the multi-section part of it) before we would want to enable this by default.
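
To make the compatibility concern concrete, here is a minimal sketch of how a tool might walk the top-level sections of a wasm binary and notice a repeated non-custom section. It relies only on the standard binary layout (4-byte magic, 4-byte version, then id / LEB128 size / payload per section). The function names are hypothetical and do not come from wasm-validate or any other existing tool.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def list_sections(module):
    """Return [(section_id, payload_size), ...] for a wasm binary."""
    if module[:4] != b"\0asm":
        raise ValueError("not a wasm binary")
    pos = 8  # skip the 4-byte magic and the 4-byte version
    sections = []
    while pos < len(module):
        section_id = module[pos]
        size, pos = read_uleb128(module, pos + 1)
        sections.append((section_id, size))
        pos += size  # skip the payload without interpreting it
    return sections


def repeated_sections(module):
    """Ids of non-custom sections that occur more than once."""
    seen = set()
    repeated = []
    for section_id, _ in list_sections(module):
        if section_id == 0:  # custom sections may already repeat
            continue
        if section_id in seen and section_id not in repeated:
            repeated.append(section_id)
        seen.add(section_id)
    return repeated


# Hand-built module with two empty code sections (id 10, payload = count 0).
two_code_sections = b"\0asm" + b"\x01\x00\x00\x00" + b"\x0a\x01\x00" * 2
```

A tool that only understands today's format would reject `two_code_sections` because id 10 appears twice; a tool aware of the multi-section part of the proposal would need to accept it.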

dschuff commented 4 years ago

I think this makes sense. IIUC the current state is that object files can't be loaded without being relocated (i.e. they can't run correctly) but they do validate, right? We could preserve that property by just declaring that they use the conditional-sections proposal, and that any tool that wants to process them has to support that proposal (and of course those tools would still maintain "mvp" object file support). I think that also means we can do it as soon as the proposal is stable enough and supported by tools; we don't necessarily have to wait until all the browsers support it (as long as we're comfortable with "shipping" before stage 4, at risk of having to break compatibility or maintain extra hacks if things change).

aardappel commented 4 years ago

There are currently a lot of tools that will let you look at the contents of a .o even though they don't understand that it is different in some way from a regular .wasm; it would be a shame to have all those stop working. So we'd have to make an effort to fix all of them. We're not the authors of all of them :)

Also, I am not following what information is gained by putting a function in a code section by itself, since a code section carries no information other than.. its size? Seems to me the linking data referring to segments of a code section or to a whole code section would be entirely equivalent, what am I missing?

sbc100 commented 4 years ago

You are correct, it's very useful that many tools can inspect object files. Requiring those tools to be aware of the multi-sections thing is (as far as I can tell) the main/only downside to this change.

But I think it's worth it. Aside from binaryen and wabt, how many other object inspection tools are there out there? If it's only one or two then I'm certainly prepared to do the work on them too.

The benefits are mostly for consistency and simplicity of internal representation within llvm. There are two primary places I'm thinking about:

  1. Any tool that uses llvm's libObjectFile to iterate through sections. We expect each function to be in its own section since wasm is always -ffunction-sections. If I have 3 functions I expect to see 3 code sections in the objdump output.

  2. The linker works at the granularity of sections. We currently subdivide the data and code sections into subsections (that we call "chunks" in the current wasm-ld code) in order to work around this.
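
As a rough illustration of the subdivision described in point 2, the sketch below splits the payload of a single code section into one chunk per function body. It assumes only the standard code section layout (a LEB128 function count followed by size-prefixed bodies). It is not the wasm-ld implementation, and `split_code_section` is a hypothetical name.

```python
def read_uleb128(buf, pos):
    """Decode an unsigned LEB128 integer; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def split_code_section(payload):
    """Split a code section payload into per-function chunks.

    Returns a list of (offset, data) pairs, where offset is the position
    of the function's size field within the payload and data covers the
    size field plus the function body.
    """
    count, pos = read_uleb128(payload, 0)
    chunks = []
    for _ in range(count):
        start = pos
        body_size, pos = read_uleb128(payload, pos)
        end = pos + body_size
        chunks.append((start, payload[start:end]))
        pos = end
    return chunks


# Two trivial functions: no locals (0x00) followed by `end` (0x0b).
empty_func = b"\x02\x00\x0b"  # size = 2, then the 2-byte body
payload = b"\x02" + empty_func + empty_func
chunks = split_code_section(payload)
```

With one section per function, this splitting step would move from the linker into the object format itself, and each chunk would show up as a section to generic object tools.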

Also the motivating issue: https://reviews.llvm.org/D74531. Here clang is expecting the AST to live in its own "section", but in the current model data sections are not modeled as sections at all but as segments (sub-sections of the data section which llvm tools don't know about).

tlively commented 4 years ago

There is precedent for requiring tools to implement stage 3 proposals to read object files: all object files currently contain a data count section whether or not bulk memory is enabled for their contents. So I think requiring tools to implement a proposal to continue reading object files is acceptable, as long as that proposal is reasonably stable and we are confident that it will eventually be standardized. I would not say the conditional sections proposal is quite there yet.