WebAssembly / wabt

The WebAssembly Binary Toolkit
Apache License 2.0
6.89k stars 700 forks source link

wat2wasm fails with call_indirect and relocation #1199

Closed skalogryz closed 1 year ago

skalogryz commented 5 years ago

The following code fails to compile by wat2wasm (v1.12) if relocatable binary (-r) is requested. Is it expected? (exit code 0xC0000005) Non-relocatable binary compiles just fine.

(module 
  ;; (type (func (param i32)(param i32)(result i32))) ;; fails, even if present
  (func $add (param i32)(param i32)(result i32)
    get_local 0
    get_local 1
    i32.add
    return
  )
  (table 1 anyfunc)
  (elem (i32.const 0) $add)
  (func $run
    i32.const 4444
    i32.const 5555
    i32.const 0
    ;;call_indirect (type 0) ;; fails
    call_indirect (param i32)(param i32)(result i32) ;; fails
    drop
  )
  (export "run" (func $run))
)
binji commented 5 years ago

Looks like it's not supported. I added an error message for now, so at least it won't crash: https://github.com/WebAssembly/wabt/pull/1202.

skalogryz commented 5 years ago

ok, thanks... any plans for the implementation?

sbc100 commented 5 years ago

For sure we should implement this. Out of interest what is your use case for the -r flag.. I added while I was developing the wasm object format but I'm curious to know if its actually useful in wabt?

skalogryz commented 5 years ago

but I'm curious to know if its actually useful in wabt?

i'm adding wasm target to freepascal compiler (https://github.com/skalogryz/freepascal/tree/webasm) (https://lists.freepascal.org/pipermail/fpc-pascal/2019-September/056848.html)

The compiler internals make it hard to use tree-structure code of binaryen's wasm-as. That makes wat2wasm the best option so far.

the pascal is modular. Each compiled wasm object file needs to be linked in the end. (static) linking requires the use of -r flag. (The relocation information generated by the flag worked more or less well for lld, with a little help of an additional tool for some cases)

as for call_indirect specifically. It's (supposed to be) used to implement function pointer calls and virtual methods.

sbc100 commented 5 years ago

I see. So your compiler outputs .wat, then you use wat2wasm -r to make your object files, then you link your object files with lld.

Sounds like you are the first real user of wat2wasm -r and as such you may run into more difficulties down the line. For example, there is no way that I know of to include data symbols or data relocations in the .wat format.

@aardappel recently added support for his lobster language, and as part of the he made a language agnostive C++ library (a header-only library I believe) that can be used to directly generate relocatable wasm object files. @aardappel would this a good use for your library? If you use that you wouldn't need wabt at all I think.

skalogryz commented 5 years ago

For example, there is no way that I know of to include data symbols or data relocations in the .wat format.

so far, automatically generated symbols worked fine.

f you use that you wouldn't need wabt at all I think.

a disclaimer to use wabt? :)

sbc100 commented 5 years ago

Maybe you don't need data relocations in pascal? But this kind of relocation is currently not representable in wat format (at global scope):

int bar = 1;
int foo = &bar;

In a C toolchain this would generate 8 bytes of static data, the with a relocation (pointing to the symbol bar) in the second 4 bytes.

I'm happy to continue to support and improve -r support :) . I just worry that full support will require at least some amount of work on the wat format.

aardappel commented 5 years ago

@sbc100 looks like the FreePascal compiler is itself written in Pascal, so wholesale adoption of my C++ library is unlikely to work :)

@skalogryz that said, if you want to learn how to output linkable wasm object files that work with LLD, this C++ header is a small amount of self-contained code that writes such a binary, in particular see the function Finish at the end for the linking related code. More documentation on that file.

skalogryz commented 5 years ago

@aardappel thanks. I've actually already started a binary utility myself (https://github.com/skalogryz/wasmbin). As it was mentioned earlier there's no much over the symbol information through the .wat file, so I had to post process the data with the tool after the compilation. (that did help to prevent linking conflicts / errors)

FPC does allow support of internal object file writers. With wasm object format being relatively simple I was hoping to avoid going that path. (adding a compiler target is not the same as maintaining the target). It's more fun to use the new (wasm) features (i.e. threads, exceptions.. etc), rather than keeping implementing them (once they become the part of the standard) :)

skalogryz commented 5 years ago

But this kind of relocation is currently not representable in wat format (at global scope)

No, they don't. The problem is that effective address of bar is not relocatable. Yet the variable, that could hold the effective address is. (global $addrof_bar i32 (i32.const 0)) once linking is done, it's possible to "relocate" the effective address of 'bar' by changing the value of $addrof_bar constant (i.e. i32.const 0changes to i32.const 2342). (changing the value means - changing it in binary file). I wouldn't expect wat2wasm to do that though.

skalogryz commented 5 years ago

on the topic of call_indirect relocation.

per this document: https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md#merging-function-sections

There are currently two ways in which function indices are stored in the code section: ...

  1. Immediate argument of the i32.const instruction (taking the address of a function).

I don't think that follows WebAssembly text format convention. (the only argument of i32.const allowed is a numeric field and not function index). looking at the wat2wasm code, i didn't find any support for such feature either.

sbc100 commented 5 years ago

But this kind of relocation is currently not representable in wat format (at global scope)

No, they don't. The problem is that effective address of bar is not relocatable. Yet the variable, that could hold the effective address is. (global $addrof_bar i32 (i32.const 0)) once linking is done, it's possible to "relocate" the effective address of 'bar' by changing the value of $addrof_bar constant (i.e. i32.const 0changes to i32.const 2342). (changing the value means - changing it in binary file). I wouldn't expect wat2wasm to do that though.

But what you are describing is exactly what the linker is supposed to do. That is what relocations are for.

sbc100 commented 5 years ago

on the topic of call_indirect relocation.

per this document: https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md#merging-function-sections

There are currently two ways in which function indices are stored in the code section: ...

  1. Immediate argument of the i32.const instruction (taking the address of a function).

I don't think that follows WebAssembly text format convention. (the only argument of i32.const allowed is a numeric field and not function index). looking at the wat2wasm code, i didn't find any support for such feature either.

Right, this is why the -r flag isn't very useful today as a way to building linkable object files. llvm will generte such code correctly either from C/C++ source or you can write in the .s assembly format something like:

i32.const $foo

The assembler then creates a relocation after the i32.const and the linker resolves the relocation and puts foo in the wasm table to make it indirectly callable.

sbc100 commented 5 years ago

In short, the wat format as it stands today is not suitable as a general purpose assembly language. Its can't describe object files in the general case.

carlsmith commented 5 years ago

@sbc100 - I'm using WAT as a first-class assembly language for a fantasy console. There's a runtime with a memory-mapped API, but the user writes all of their code in WAT, and any of their modules can import whatever any other module exports. Do you foresee a problem with that?

sbc100 commented 5 years ago

The issue we are discussing here mostly refers to linking with other object files. if you are not using a linker then writing pure WAT is a lot more reasonable.

Once the linker gets involved you will, at a minimum, want to assign symbolic names to memory locations.

carlsmith commented 5 years ago

Thank you, @sbc100. Much appreciated.

binji commented 4 years ago

on the topic of call_indirect relocation. per this document: https://github.com/WebAssembly/tool-conventions/blob/master/Linking.md#merging-function-sections

There are currently two ways in which function indices are stored in the code section: ...

  1. Immediate argument of the i32.const instruction (taking the address of a function).

I don't think that follows WebAssembly text format convention. (the only argument of i32.const allowed is a numeric field and not function index). looking at the wat2wasm code, i didn't find any support for such feature either.

Right, this is why the -r flag isn't very useful today as a way to building linkable object files. llvm will generte such code correctly either from C/C++ source or you can write in the .s assembly format something like:

i32.const $foo

The assembler then creates a relocation after the i32.const and the linker resolves the relocation and puts foo in the wasm table to make it indirectly callable.

Agreed that the wat format doesn't support this well. However, since we have the -r flag currently, I wouldn't be opposed to adding functionality to the text format as long as it's not too invasive.

gwenya commented 4 years ago

Relocations for call_indirect (and func types in general) are being implemented in #1525 @binji I'd be interested in implementing something like the i32.const $foo syntax as well, but we probably also want the same for memory addresses of data sections in addition to function indices, but I'm not sure how those are relocated and how we can assign names to them.

binji commented 4 years ago

@sammax unfortunately, we can't really add text syntax like that, since wabt tries to match the spec (and future proposals). The best we can hope for is using annotations for this, perhaps something like:

i32.const (@reloc foo) 0

Memory addresses would require a similar annotation syntax. The wat-numeric-values proposal will help here too.

gwenya commented 4 years ago

Do you think a spec proposal to add named data segments and to resolve data segment names to their addresses would have any chance of being accepted?

Annotations would work too I guess, though I think we should keep them independent of relocations, since it would be useful to refer to memory locations by name even in non-relocated code.

binji commented 4 years ago

Maybe a proposal like that would be accepted. I think the big question is what is the expected output from a module using this text format. If we're saying that it creates relocation entries in the linking sectio, then I think that may be a problem, since the linking section format is not standardized (at least as far as the spec is concerned). We could say that it is like an annotation and can be interpreted as desired by the tools. But that seems a little unsatisfying too.

Annotations would work too I guess, though I think we should keep them independent of relocations, since it would be useful to refer to memory locations by name even in non-relocated code.

It's a little tricky to make that work, since data segments don't require a fixed offset. For example, you can write:

(import "env" "g" (global $g i32))
(data (global.get $g) "my data ...")

If we extend the syntax to allow named memory locations, then there's no way for that to be a simple i32.const. I suppose we could say that you can't refer to an address in that segment, though.

gwenya commented 4 years ago

I had been thinking that in normal mode it would work the same as in normal assembly files, i.e. just resolve to the address, and in relocatable mode it would additionally add the relocations. But that doesn't really work with data segments that use a non-const offset. Not allowing referring to such segments might work, or maybe not allowing naming such segments. We could also introduce some kind of pseudo instruction data.addr $name that gets replaced by the offset expression for that named segment, but I'm not really sure if that's a great idea. Annotations have the same problem, though maybe it would be more palatable to have an annotation that gets transformed into an expression than a pseudo-instruction. Doesn't this problem also exist for relocations though? How do offsets work when linking anyways? Are data segments with constant offsets relocated during linking or do they interfere with each other?

binji commented 4 years ago

We could also introduce some kind of pseudo instruction data.addr $name that gets replaced by the offset expression for that named segment, but I'm not really sure if that's a great idea.

Agreed, I don't think we'll want pseudo-instructions in the wat format. I guess I can image something like this though:

(data (i32.const 0)
  $label1
   "some data"
  $label2
  "some more data"
  ...
)
...
(i32.const data-addr=$label2)

Doesn't this problem also exist for relocations though? How do offsets work when linking anyways? Are data segments with constant offsets relocated during linking or do they interfere with each other?

I'm not sure about how this works, @sbc100 would know.

sbc100 commented 4 years ago

Doesn't this problem also exist for relocations though? How do offsets work when linking anyways? Are data segments with constant offsets relocated during linking or do they interfere with each other?

The linker assumes that it has ultimate control over the offset of all data segments. That is, I think it completely ignores the offsets specified in the input files.

Since the linker decides the data layout, all data references require relocations.

gwenya commented 4 years ago

@binji I like that syntax, I think we would also need table-elem= or something like it for table elements, and a way to name table elements. If you think it's a good idea, I can get on writing a proposal for it (once I figure out how that works).

The linker assumes that it has ultimate control over the offset of all data segments. That is, I think it completely ignores the offsets specified in the input files.

@sbc100 That is the case for both const and global offsets, but not for passive ones, right?

I think we could have labels/names for global-offset segments in a way that makes sense in non-relocatable mode too by making them resolve to an offset relative to the segment:

(data (global.get 0)
  $label1
   "some data"
  $label2
  "some more data"
  ...
)

In this example, $label1 would resolve to 0 and $label2 would resolve to 9, so that one could then write the following to get the address of "some more data":

(i32.add
  (global.get 0)
  (i32.const data-addr=$label2)))

This would only happen when the offset is a global, for const offsets the label would resolve to an absolute address calculated as <const offset of segment> + <offset of label within segment>.
I'm not completely happy about having different semantics based on whether the offset is const or global, but I can't think of a better way to solve it and I believe it provides enough value to be worth having it.

binji commented 4 years ago

@binji I like that syntax, I think we would also need table-elem= or something like it for table elements, and a way to name table elements. If you think it's a good idea, I can get on writing a proposal for it (once I figure out how that works).

Cool! Easiest way to start is to open an issue in WebAssembly/design with a little background, and a proposed syntax. Then we can discuss it a little bit there before bringing it to the group. This would be a similar kind of proposal to https://github.com/WebAssembly/design/issues/1348, since they both only affect the text format. So you could use that as inspiration.

gwenya commented 4 years ago

Cool! Easiest way to start is to open an issue in WebAssembly/design with a little background, and a proposed syntax. Then we can discuss it a little bit there before bringing it to the group. This would be a similar kind of proposal to WebAssembly/design#1348, since they both only affect the text format. So you could use that as inspiration.

Thanks, I'll take a look at that! I think I should be able to write something up some time this week.

keithw commented 1 year ago

Fixed in #1525