WebAssembly / binaryen

Optimizer and compiler/toolchain library for WebAssembly
Apache License 2.0
7.5k stars 745 forks source link

[wasm-as/wasm-dis]: Don't say the input/output is WebAssembly Text because it's not #4550

Open ThePuzzlemaker opened 2 years ago

ThePuzzlemaker commented 2 years ago

The s-expression language used by wasm-as and wasm-dis is non-compliant. It's not even just a subset, it's simultaneously a subset and a superset. So using it with compliant WAT code with certain extensions is impossible. So either we should not call it .wat/WebAssembly text, or make it compliant.

Examples

I will present a few examples to motivate this.

Source order

All memory segments must be before data segments, in source location

$ cat memory-before-data.wat
(module
  (data (i32.const 0) "Hello, world!\n")
  (memory 1))
$ wasm-as memory-before-data.wat
[parse exception: data but no memory (at 2:2)]Fatal: error in parsing input
$ wat2wasm memory-before-data.wat
$ cat memory-before-data.fixed.wat
(module
  (memory 1)
  (data (i32.const 0) "Hello, world!\n"))
$ wasm-as memory-before-data.fixed.wat
$ wat2wasm memory-before-data.fixed.wat

All globals must be defined before they are used, with respect to source location

$ cat global-before-use.wat
(module
  (func $foo (result i32) (global.get $global))
  (global $global i32 (i32.const 0)))
$ wasm-as global-before-use.wat
[parse exception: bad global.get name (at 2:26)]Fatal: error in parsing input
$ wat2wasm global-before-use.wat
$ cat global-before-use.fixed.wat
(module
  (global $global i32 (i32.const 0))
  (func $foo (result i32) (global.get $global)))
$ wasm-as global-before-use.fixed.wat
$ wat2wasm global-before-use.fixed.wat

Explanation

As defined in 6.6.13 of the WASM spec, "a module consists of a sequence of fields that can occur in any order" (180), excluding regarding composition of modules, where "all imports must occur before any regular definition of a function, table, memory or global, thereby maintaining the ordering of the respective index spaces" (181).

This is not an example of a syntactical limitation allowed as defined in section 7.2.1.

Folded instructions without correct number of arguments, but where those arguments are present on the stack

For example, consider the following function:

f(x) :=
    x = x + 1
    x + 5

While this is a bit contrived, I have come across this example in regular use. One may translate this imperative pseudocode into the following WAT:

$ cat folded-instrs.wat
(module
  (func $foo (param $x i32) (result i32)
    (local.tee $x (i32.add (local.get $x) (i32.const 1)))
    (i32.add (i32.const 5))))
$ wasm-as folded-instrs.wat
[parse exception: expected more elements in list (at 4:4)]Fatal: error in parsing input
$ wat2wasm folded-instrs.wat

There is no way to fix this within the input format of wasm-as, meaning using local.tee in any position but within another folded expression is not allowed. This also means that using multiple returns in a compliant way is impossible (contrived, but minimal example):

$ cat multiple-returns-folded.wat
(module
  (func $foo (result i32 i32)
    (i32.const 10)
    (i32.const 15))
  (func $bar (result i32)
    (i32.add (call $foo))))
$ wasm-as multiple-returns-folded.wat                                                               
[parse exception: expected more elements in list (at 6:4)]Fatal: error in parsing input
$ wat2wasm multiple-returns-folded.wat

Note the following: as defined in section 6.5.10, "folded instructions are solely syntactic sugar, no additional syntactic or type-based checking is implied". Also, instructions in the folded form are defined as such: '(' plaininstr foldedinstr* ')'. The usage of * in foldedinstr implies that the number of elements does not have to "align" with the instruction used, as no additional syntactic checking is to be done (174).

This is not an example of a syntactical limitation allowed as defined in section 7.2.1.

This means that the only way to do multiple returns is to use the tuple syntax, which is not a part of WebAssembly Text, which I would generally consider okay, as it would be a superset. However, as this is the only way (that I can find) to use multiple returns in this s-expression language (and is the way shown by wasm-dis), I consider this to be another example of noncompliance.

Flat form is not supported.

This one is somewhat self-explanatory. Only the s-expression format of WAT is supported by wasm-as/wasm-dis, not the "flat"/non-folded format that is also supported by tools such as wabt.

This is not as much of a problem as the previous examples, but again, it is not example of a syntactical limitation allowed as defined in section 7.2.1.

Notes

All section and page references are relative to the section and page numbers shown on the current (Release 1.1, Draft 2022-03-21) WebAssembly Specification PDF.

Conclusion

Yes, this is pedantic. But I find that this seems like a terrible decision to make, as binaryen and its tooling is used by several important tools, such as emscripten. While I don't know if wasm-as/wasm-dis are specifically used anywhere, it still seems counterintuitive to have a tool that seems "officially" developed by the WebAssembly group (as a repository in its GitHub organization) to not actually support compliant examples of WAT.

kripken commented 2 years ago

As history, Binaryen began identical to the wasm text format but then the wasm text format (and wasm itself) changed in various ways late in the 1.0 design process, like becoming a stack machine. So there was no point at which Binaryen made a decision here: this is a combination of wasm changes + intertia, basically. (We did decide not to add support for certain changes as they happened, but again, I'd say that was mostly inertia and not a proper design process.)

Note that all the differences should be documented here: https://github.com/WebAssembly/binaryen#binaryen-ir So I believe we already state in our docs that we don't support the full wasm text format? Maybe that could be more prominent.

Some of the issues you mention are simply bugs, like the source order stuff. It would be great to fix those!

I think the folded-but-on-the-stack issue you mention could also be fixed and we can consider it a bug, as supporting it would have no downsides. Again, it would be nice to fix that if someone has the time.

There are other differences that are not bugs, however, like Binaryen having an unreachable type, which allows more things to be written than wasm text supports. But I think that can be considered an extension. (And we don't want to remove it, as it allows convenient testing of Binaryen IR.)

ThePuzzlemaker commented 2 years ago

Yeah, I absolutely understand if we want it to support a superset of WASM. But having it somewhat a subset and a superset at the same time is kinda weird in my opinion. I'm wondering this, though: if you can convert to WASM <-> binaryen IR, why not convert from WAT <-> binaryen IR in the same/a similar way?

Also, while the limitations of binaryen IR are definitely documented, it's not clear that these limitations also refer to the text format, and thus the text format isn't "actually" WAT, more just an S-expr language to write binaryen IR-ish code (modulo webassembly)

kripken commented 2 years ago

We could convert wat to binaryen IR like we do for the binary format, but that causes changes. For example it does not perfectly roundtrip. We currently use the text format in order to test binaryen IR, where we need properties like roundtripping. So we'd need to support two text formats if we wanted to fully support the official text format.

That's a possibility, but in general we leave full spec compliance stuff to wabt, while here we focus on optimization.

If the docs aren't clear enough I definitely agree they are worth improving.