WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0

More rationale on the choice of pre-order and control structures in the binary format #295

Closed ghost closed 8 years ago

ghost commented 9 years ago

This section: https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#serialized-ast could use some more rationale as to why a pre-order encoding is being used. It also says "Allows context-dependent index spaces (described above)"; it is not obvious what that refers to, nor do I see how it would depend on the encoding being pre-order.

In fact, I can see advantages to post-order representation:

  • Allows things like validation or building up a tree by using nothing but a temporary node stack, and an otherwise very linear parser. Pre-order can be equally efficient if you want, but it fits recursive parsing more naturally.
  • Closer fit to backends that use stack-machines in some way.

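To illustrate the first point, a post-order stream can be decoded with nothing but a value stack and a linear loop. This is a toy sketch with invented opcodes, not the actual encoding:

```javascript
// Decode a post-order stream into a tree using only a stack: operands
// appear before their operator, so each operator pops its children.
// Toy encoding: numbers are constants, "add" is a binary operator.
function decodePostOrder(stream) {
  const stack = [];
  for (const tok of stream) {
    if (typeof tok === "number") {
      stack.push({ op: "const", value: tok });
    } else if (tok === "add") {
      const rhs = stack.pop();
      const lhs = stack.pop();
      stack.push({ op: "add", lhs, rhs });
    }
  }
  return stack.pop(); // the root of the tree
}
```

For example, `decodePostOrder([1, 2, "add", 3, "add"])` yields the tree for `(1 + 2) + 3` without any recursion in the parser.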
Related to that: some more rationale on requiring backends to use a relooper-like algorithm to produce a limited set of control structures would help. I understand it results in the absolutely smallest binary, but the algorithm is not entirely trivial, and it feels less elegant to require it when most backends will be working from a branch/graph based representation, and V8 etc. presumably also need to convert this back to a more general representation.

What alternatives have been considered? If the top level of a block is a list of "statements", could a (conditional) branch node that refers to a statement index work also, at a small cost in binary size, but be more generic in what kind of structures it represents, being easier to generate and decode etc.?

jfbastien commented 9 years ago

Side note: I believe @LouisLaf had concerns in https://github.com/WebAssembly/design/issues/239#issuecomment-118993642, the surrounding discussion provides good context.

Agreed that we should clarify.

titzer commented 9 years ago

On Thu, Aug 13, 2015 at 3:10 AM, Wouter van Oortmerssen <notifications@github.com> wrote:

> This section: https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#serialized-ast could use some more rationale as to why a pre-order encoding is being used. It also says "Allows context-dependent index spaces (described above)"; it is not obvious what that refers to, nor do I see how it would depend on the encoding being pre-order.
>
> In fact, I can see advantages to post-order representation:
>
> • Allows things like validation or building up a tree by using nothing but a temporary node stack, and an otherwise very linear parser. Pre-order can be equally efficient if you want, but it fits recursive parsing more naturally.
> • Closer fit to backends that use stack-machines in some way.

This only makes sense if your parser for the bytecode only does interesting things at post-order time, i.e. when a syntax tree node is completed by having its sub-trees completed. If your parser wants to do interesting things at pre-order or in-order points, e.g. splitting an SSA-renaming environment, or generating machine code, then a post-order encoding will always imply a second pass. Concretely, suppose we were writing a baseline JIT and we had:

if (x) TS else FS

If we want to write a one-pass JIT, then we want to:

0.)
1.) gen code for x
2.) gen a branch_if_false to a label for FS (unresolved)
3.) gen code for TS and then a jump to the end of the if (unresolved)
4.) bind the label for FS, patching the branch generated in step 2
5.) gen code for FS and fallthru
6.) patch the label generated in step 3

step 0.) is essentially the pre-order visitation point. Steps 2) and 4) are in-order, and step 6) is the post-order point. If we have a post-order encoding of the tree, there is no way to make a one-pass baseline JIT work, since it would have to first scan past the bytecode for x, TS, and FS before reaching the if, keeping a stack of them (or recursing). Only when you reach the if do you know in which order to generate code and that there should be branches in between. But worse, it's still too early, since this if could clearly be nested in some other structure which is on the stack. You essentially have to wait until you construct a full AST and then visit it again.
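The six steps above can be sketched as a single-pass emitter that backpatches branch targets as it goes. This is a toy illustration with an invented instruction list, not real wasm output:

```javascript
// One-pass codegen for "if (x) TS else FS" from a pre-order walk:
// emit branches with unresolved (null) targets, then patch them once
// the target offset is known. "brf" = branch_if_false, "jmp" = jump;
// each is followed by its target index in the code array.
function compileIf(code, genCond, genThen, genElse) {
  genCond(code);                 // 1) gen code for x
  code.push("brf", null);        // 2) branch to FS, target unresolved
  const brf = code.length - 1;
  genThen(code);                 // 3) gen code for TS...
  code.push("jmp", null);        //    ...then jump to end, unresolved
  const jmp = code.length - 1;
  code[brf] = code.length;       // 4) bind FS label, patch step-2 branch
  genElse(code);                 // 5) gen code for FS, falls through
  code[jmp] = code.length;       // 6) patch the step-3 jump to the end
}
```

Running `compileIf(code, c => c.push("load_x"), c => c.push("ts"), c => c.push("fs"))` on an empty array produces `["load_x", "brf", 6, "ts", "jmp", 7, "fs"]`, with both targets resolved in one pass. With a post-order input, none of this is possible until the whole subtree has been buffered.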

> Related to that: some more rationale on requiring backends to use a relooper-like algorithm to produce a limited set of control structures would help. I understand it results in the absolutely smallest binary, but the algorithm is not entirely trivial, and it feels less elegant to require it when most backends will be working from a branch/graph based representation, and V8 etc. presumably also need to convert this back to a more general representation.
>
> What alternatives have been considered? If the top level of a block is a list of "statements", could a (conditional) branch node that refers to a statement index work also, at a small cost in binary size, but be more generic in what kind of structures it represents, being easier to generate and decode etc.?

There was actually a lot of discussion about this and the benefits of structured control flow seem to be highly worth preserving. It's not primarily about encoding size. The encoding size might be more rightly considered a happy accident.

One important point is that most engines (V8 included) don't properly support irreducible control flow graphs, so unrestricted gotos would have to either be policed or internally processed before making it to the optimizing compiler. Alternatively, the optimizing compilers in the major browser engines would have to be upgraded to handle irreducible control flow specifically for wasm. Side note: JVMs generally don't optimize bytecode methods with irreducible graphs either; they are stuck in the interpreter.
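For context, one classic way to define reducibility is T1/T2 interval reduction: delete self-loops (T1) and merge any node that has a unique predecessor into that predecessor (T2); a graph is reducible iff this collapses it to a single node. A sketch, with an invented CFG representation:

```javascript
// Reducibility via T1/T2 reduction. succ maps each node name to an
// array of successor names; entry is the entry node.
function isReducible(succ, entry) {
  const g = new Map(Object.entries(succ).map(([u, vs]) => [u, new Set(vs)]));
  let changed = true;
  while (changed) {
    changed = false;
    for (const [u, vs] of g) {
      if (vs.delete(u)) changed = true;          // T1: drop a self-loop
    }
    for (const v of [...g.keys()]) {
      if (v === entry || !g.has(v)) continue;
      const preds = [...g].filter(([, vs]) => vs.has(v)).map(([u]) => u);
      if (preds.length === 1) {                  // T2: merge v into its unique pred
        const u = preds[0];
        for (const w of g.get(v)) if (w !== v) g.get(u).add(w);
        g.get(u).delete(v);
        g.delete(v);
        changed = true;
      }
    }
  }
  return g.size === 1;
}
```

A diamond (if/else) like `{A:["B","C"], B:["D"], C:["D"], D:[]}` reduces to one node, while two blocks that branch into each other from the entry, `{A:["B","C"], B:["C"], C:["B"]}`, get stuck: that loop has two entries, which is exactly the shape structured control flow cannot express directly.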

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/295.

ghost commented 9 years ago

@titzer: thanks for the detailed reply. I guess it comes down to which use cases are deemed most important. A statement of the form "while pre-order and post-order are similar from a binary encoding perspective, pre-order works best for A, B and C, while post-order would have advantages for X, Y and Z; because of this, we chose pre-order" would be good.

I see your point about the control structures. Slightly clumsier to generate, but easier to work with for the wasm implementation; I agree that that's a win. I'm curious what effect this will have on future programming language support. Also, does it mean that it is possible to write really convoluted gotos in C++ that cannot be translated? How will this "error" percolate up to the programmer?

AndrewScheidecker commented 9 years ago

@gwvo: "Allows context-dependent index spaces (described above)" — I think what that is saying is that with pre-order serialization, the encoding of opcodes etc. can depend on the context. For example, the polyfill prototype partitions opcodes by result type, and requires knowledge of the result type to decode a subtree. That knowledge is nearly always implicit, so the encoding rarely needs to store it explicitly.

It's clever, but I can't comment on whether it saves a substantial amount of space.
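As a rough illustration of the idea (tables and names invented here, not the actual polyfill encoding), the same opcode byte can decode to different operators depending on the result type the parent node expects:

```javascript
// Context-dependent opcode spaces: byte 0 means one thing when an i32
// is expected and another when an f64 is expected, so each type gets
// its own dense single-byte index space.
const OPS = {
  int32:   { 0: "i32.add", 1: "i32.sub", 2: "i32.const" },
  float64: { 0: "f64.add", 1: "f64.sub", 2: "f64.const" },
};

function decodeOp(byte, expectedType) {
  const op = OPS[expectedType][byte];
  if (op === undefined) throw new Error("bad opcode " + byte);
  return op;
}
```

The pre-order connection: the parent is decoded first, so the expected type of each child is already known when its opcode byte is read; a post-order stream would have to carry that context some other way.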

"Also, does it mean that it is possible to write really convoluted gotos in C++ that cannot be translated?"

Emscripten apparently translates arbitrary control flow into a weird loop+switch construct. It's awkward, but it shows that you can translate arbitrary control flow to JavaScript. (see http://davideglintine-new.googlecode.com/hg/docs/paper.pdf)

function run() {
    var label = 0;
    while (1) switch (label) {
        case 0: label = 1; break;
        case 1: label = 2; break;
        case 2: return;
    }
}

I think WebAssembly can expose the same functionality in a way that can be directly translated to JavaScript, but can also just as easily generate more efficient code for VMs that support arbitrary control flow. You just need a node that means "var label = 0; while(1) switch(label) {}", and a node that means "label = literal int; break;"
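A sketch of how such a pair of nodes can express an arbitrary branch structure (block numbering and block contents invented): every block ends by assigning the next label, so any edge in the CFG, structured or not, becomes one assignment plus `break`:

```javascript
// Dispatch-loop encoding of a small CFG:
//   A -> B (if input > 0) or C; B loops on itself until input is 0; both
//   paths join at D. Returns the trace of blocks executed.
function run(input) {
  const trace = [];
  let label = 0;
  while (1) switch (label) {
    case 0: trace.push("A"); label = input > 0 ? 1 : 2; break;
    case 1: trace.push("B"); input -= 1; label = input > 0 ? 1 : 3; break;
    case 2: trace.push("C"); label = 3; break;
    case 3: trace.push("D"); return trace;
  }
}
```

For instance, `run(2)` visits A, B, B, D and `run(0)` visits A, C, D; a VM that supports gotos natively can lower each `label = n; break;` pair to a direct jump.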

lukewagner commented 9 years ago

FWIW: while I initially proposed (and implemented, in polyfill-prototype-1) context-dependent index spaces as a way to reduce contention for single-byte opcode encodings when there are many ops, more thinking about what was achievable through macros and module-local index spaces caused me to change my mind. I removed the section describing it from BinaryEncoding.md, but not the reference you've pointed out, which I can nix now.

ghost commented 9 years ago

@AndrewScheidecker: thanks for the explanation. Though I personally think remapping opcode assignments is a bit overly flexible, I can see the use of it, and I'm fine with it. It could be clearer, though.

As for the loop + switch construct, I presume it gets used only in "desperate" cases, because it clearly takes up more space than the equivalent gotos.

AndrewScheidecker commented 9 years ago

The loop+switch construct would indeed be larger than the equivalent gotos, but I believe it's rarely needed. I just looked at a small Emscripten-compiled program I was using for testing, and only 9 of its 679 "do"s used this switch pattern.

The benefit that such a specialized construct provides is that it enables a purely local transform of the construct into your choice of JavaScript, a CFG, or bytecode with gotos. IMO it's worth frontloading such analysis at compile time instead of making the runtime do it, even if it makes the binaries a little larger.


sunfishcode commented 9 years ago

Also note that this is an area where WebAssembly is likely to grow beyond the initial feature set in the MVP, so loop+switch is not necessarily the end of the story. FutureFeatures has a brief section mentioning some options.

ghost commented 8 years ago

@lukewagner Noting your comment based on experience with polyfill-prototype-1 and support for macros, has it been decided that there will be no 'context-dependent index spaces'? I would support such a position, as I think context-dependent indices would make it much more difficult to build tools that work with the binary code.

I think that the use of what I would call 'vectors' in polyfill-prototype-1, where the number of elements in a vector (or list) is emitted before the elements themselves, would also complicate tools that manipulate the binary code. For example, adding or removing elements could not be streamed; the entire vector would need to be re-generated and the new length determined before any of it could be emitted.
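To illustrate the streaming problem (helper invented for this sketch; the format does use LEB128-style varints for integers): with a varint length prefix, adding elements can change the byte count of the prefix itself, shifting everything after it:

```javascript
// Unsigned LEB128: 7 bits per byte, high bit set on all but the last byte.
function leb128(n) {
  const out = [];
  do {
    let b = n & 0x7f;
    n >>>= 7;
    if (n) b |= 0x80;   // continuation bit
    out.push(b);
  } while (n);
  return out;
}

// Length-prefixed vector of single-byte elements.
function encodeVector(elems) {
  return [...leb128(elems.length), ...elems];
}
```

A 127-element vector has a 1-byte prefix, but a 128-element vector needs 2 bytes, so appending a single element to the former re-positions every byte that follows — a tool can't patch the vector in place or emit it before knowing the final count.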

I see that the binary design precludes tagging, but if tagging did not increase the size too much then it might help solve a range of challenges. There appear to be the following objects in polyfill-prototype-1: uint32, int32, null-terminated string, f32, f64, and nodes. If someone could come up with a relatively efficient way to tag these then it might still deserve consideration.

A lot of effort seems to have been put into the encoding efficiency in polyfill-prototype-1, and in a number of places I wondered if it was really worth the extra complexity. It might be cleaner if some of the encoding compression were layered and shared. If layering helps, variable bit-length fields might also be less complex and might do better.

I see that a polyfill is inbound in Emscripten, but it does not appear to reflect the current binary design. Are there any plans to re-do it, or is the design being worked out in the spec?

Will the polyfill support all existing asm.js code, even styles not supported in wasm such as globals, so that it can be used on existing asm.js code? Or would this be better done with a separate polyfill?

lukewagner commented 8 years ago

> Noting your comment based on experience with polyfill-prototype-1 and support for macros, has it been decided that there will be no 'context-dependent index spaces'?

I think there are still split opinions on this, but I am currently in favor of removing. In #287 and IRL we've agreed to postpone these kinds of purely binary encoding choices until after we have "finished" the definition of the AST and semantics and are able to generate realistic-sized ASTs (using the S-Expr temporary text format) which we can then use to do apples-to-apples A/B tests for these types of questions.

> A lot of effort seems to have been put into the encoding efficiency in polyfill-prototype-1. In a number of places I was wondering if it was really worth the extra complexity. It might be cleaner if some of the encoding compression were layered and shared.

Agreed that layering can reduce the complexity in the base layer; this is reflected in the current binary encoding design (where only the "raw" layer is implemented by a browser). As noted in the README.md and the name, the polyfill-prototype was only intended as a proof of concept to demonstrate two points: (1) a polyfill was possible and effective, and (2) a binary format could show significant size wins over text, even after gzip/lzma (both points were in doubt beforehand). I should add that the design started with a pure/simple pre-order encoding and each subsequent complication was only added after showing significant (>1%) wins after gzip. But the prototype doesn't consider the use of a user-land "layer 1" which allows similar wins without the native decoder complexity. (Indeed, I think a polyfill could actually be folded into the layer 1 decoding, blurring the line between the two and allowing us much greater freedom to decouple asm.js-polyfill concerns from wasm.)
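A toy sketch of what such a user-land "layer 1" could look like (the macro encoding here is entirely invented): it expands compact macro references into the raw stream before the native decoder ever sees them, so the size win lives outside the engine:

```javascript
// Layer 1 macro expansion: macros is an array of token sequences; in the
// body, a negative number -k is a reference to macros[k - 1], anything
// else passes through. Macros may reference other macros.
function expandLayer1(macros, body) {
  const out = [];
  for (const tok of body) {
    if (tok < 0) out.push(...expandLayer1(macros, macros[-tok - 1]));
    else out.push(tok);
  }
  return out;
}
```

For example, `expandLayer1([[1, 2]], [0, -1, 3])` expands the `-1` reference in place, producing the raw stream `[0, 1, 2, 3]` that the base decoder consumes.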

> I see that a polyfill is inbound in Emscripten, but it does not appear to reflect the current binary design. Are there any plans to re-do it, or is the design being worked out in the spec?

That feature will not be advertised as "WebAssembly", but just as a "size optimization" flag that addresses real user requests we are getting today. Likely we'd also want to throw in user-mode lzham compression on top (for another ~25% win), which avoids the need to configure the server for HTTP Content-Encoding.

flagxor commented 8 years ago

As we've moved to a post-order format, and other fine-grained issues subsume most of this, closing.