0xc block signature fallout

kripken commented 8 years ago

0xc has block signatures, which lead to some problems for binaryen, namely

We do many simple transformations like (X) => (block (Y) (X)) (replace a node with a block with some code that runs before the old node, then the old node after it). That's no longer valid in 0xc without looking outside, since the type of the block would depend on the scope it is in (see examples in that link). In other words, binaryen assumed we can type blocks in an encapsulated, local way, but wasm changed that. The outside context matters now.
In many cases issues with that can be fixed by dead code removal, as they involve dead code doing odd things. However, removing some dead code can break validation in 0xc, i.e., sometimes a wasm binary must contain dead code to be valid (see second example in that link).

So we must walk a fine line here, DCE but not DCE too much - that's the current status. I'm not sure yet how much of the test suite it passes, but looks like quite a lot, so maybe this is viable for now.

But to handle this more properly, my inclination is to add this to the growing list of divergences between Binaryen IR and wasm. We need to be able to do simple local transformations like that example. And it's good for us that type checking is local/encapsulated, since we intentionally have expressions stand on their own so that we can generate and optimize them in parallel - they don't need a surrounding context in order to be made sense of. But wasm needs this, so to emit valid wasm we'd need to handle this when emitting or reading the binary. But otherwise we'd just document this as the third significant divergence from wasm.

kripken commented 8 years ago

Note that the test suite coverage I mentioned was using spidermonkey. I just realized that since spidermonkey checks fewer errors than v8, the picture here might change once we can check on v8 as well, as the errors discussed above are exactly the kind on which the engines diverge.

dschuff commented 8 years ago

If we want to make binaryen's type semantics/rules different from wasm proper, then we should probably implement a strict wasm type checker and run it at whatever point we convert for output. Of course there are also still wasm-only tools in wabt; I believe wabt's type checker is strict (and if not then we should make it that way).

binji commented 8 years ago

@kripken: Maybe this is not so dire. You can always do the transformation, you just need to fix up the block signature later depending on its context. And even then, that's only true if you don't know the concrete type of X (i.e. if it is unreachable).

@dschuff: Yes, wabt is strict.

kripken commented 8 years ago

@binji: Yeah, question is when the fixup can be done. If it needs to be done after every operation that sounds bad, so one option is it's ok in binaryen IR to not care about it, and handling it when loading/saving wasm.

But then: @dschuff, I think that's a good question - do we allow loading and saving IR that isn't true wasm. This wasn't much of an issue before since we diverged by being a subset of wasm, but this divergence would make us a superset.

Not being able to save and load our IR - which a strict wasm-only checker would cause - sounds bad. So perhaps we should go ahead and do the full split from wasm, creating .byn files that differ from wasm files, etc., as suggested in #663. Maybe we've held off as long as we could on doing that?

ghost commented 8 years ago

It's an interesting discussion. Still seem to be some matters to settle. Might it help to note why a block end fall-through might be unreachable, that this can not be worked around, with an example:

(block $l0
  (block $l1
    (br_if $l1 ...)
    (br $l0)
    )
  ...
  )

Then if the block fall through needs to be type validated then it seems to create some challenges.

What is the state of the stack after the branch, in the unreachable code area? Has the branch just popped its consumed values and left the rest, or like the br_if decision not touched any?
Does the branch push an unreachable value onto the stack? If the block returns no values then does this need to be dropped?
We seem to still have the requirement that the block stack be empty after consuming the values for the fall-through, so do the remaining values need to be dropped?
If a block returned multiple values then does it need multiple unreachable values pushed on the stack.

@kripken there seem to be a lot of issues with the unreachable code and have you given any thought to the burden on binaryen of implementing the proposal in https://github.com/WebAssembly/design/issues/778

ghost commented 8 years ago

No use case for supporting the transform (X) => (block (Y) (X)) where the block label is not used comes to mind? It appears that in such cases the block is unnecessary and can be flattened into the parent block. If there are some examples then they might be interesting to consider?

Perhaps binaryen could add an internal block type for a block for which the tail is unreachable, and just ensure that it is flattened before emitting the wasm. If there is a real need for this then perhaps this needs a type in wasm too.

kripken commented 8 years ago

@JSStats: the main use case for that transform is when you need to add some code that runs first. you might not know if the parent is another block, and it might not be another block anyhow. so creating a new block is a simple and useful thing to do (and later optimizations might get rid of it, if possible).

kripken commented 8 years ago

Well, the current status is that I am not aware of asm2wasm problems due to this - we seem to work around the type system corner cases. Test suite passes and fuzzing looks ok. So I guess for now we can ignore it. But when the type system is formally written up, we should look into this more carefully, as I worry running on non-asm2wasm output may hit a problem. Leaving open for that.

WebAssembly / binaryen

0xc block signature fallout #758