WebAssembly / design

WebAssembly Design Documents
http://webassembly.org
Apache License 2.0
11.41k stars 695 forks source link

Eliminate `continue` statement #310

Closed rossberg closed 9 years ago

rossberg commented 9 years ago

In a language with break-from-block, the continue statement is redundant. Any loop of the form

l: while (C) { ... continue l; ... }

can equivalently be rewritten to

l: while (C) l2: { ... break l2; ... }

While a separate continue statement is useful in user-facing syntax, I see no benefit in having it in Wasm. In particular, since labels are implicit, so there isn't even any encoding overhead in the second form.

I propose removing continue.

kg commented 9 years ago

Labels are implicit? That's interesting, when did we decide on that? If that's the case, I agree that continue x is redundant and unnecessary.

rossberg commented 9 years ago

At least that was my understanding wrt to the binary encoding. It's what the V8 prototype does -- break and continue just take a relative depth parameter.

sunfishcode commented 9 years ago

This does look like it's an improvement over the current design. It slightly obscures backedges in the case of forever loops, but makes do_while control flow simpler, which is nice.

That said, I also have an open proposal for a much broader redesign of the control constructs in #299.

titzer commented 9 years ago

I'm not sure we're gaining anything by reducing the number of control flow constructs here. On the one hand we're adding more forms of comparisons, more conversions, and more integer doodads, and then we're taking away common control flow patterns. This is not serving producers well, and it is not really making implementations any simpler, since the surface area of all those other operations completely dwarfs implementing control flow.

No one even has a completely functional pipeline and we're changing fundamentals with zero data based on taste.

On Mon, Aug 24, 2015 at 8:23 PM, Dan Gohman notifications@github.com wrote:

This does look like it's an improvement over the current design. It slightly obscures backedges in the case of forever loops, but makes do_while control flow simpler, which is nice.

That said, I also have an open proposal for a much broader redesign of the control constructs in #299 https://github.com/WebAssembly/design/pull/299.

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-134325353.

kg commented 9 years ago

Would it help if I put a wasm backend in my personal compiler so we have a test corpus to play with until LLVM is ready?

titzer commented 9 years ago

On Mon, Aug 24, 2015 at 9:05 PM, Katelyn Gadd notifications@github.com wrote:

Would it help if I put a wasm backend in my personal compiler so we have a test corpus to play with until LLVM is ready?

Yeah, let's see what comes out :-)

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-134341914.

rossberg commented 9 years ago

@titzer, this is not taking away any pattern. A producer just literally replaces continue n with break n-1, that's all.

titzer commented 9 years ago

No, they need to have another block inside the loop order to do so.

On Mon, Aug 24, 2015 at 9:40 PM, rossberg-chromium <notifications@github.com

wrote:

@titzer https://github.com/titzer, this is not taking away any pattern. A producer just literally replaces continue n with break n-1, that's all.

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-134351002.

rossberg commented 9 years ago

@titzer, in asm.js break can exit any statement, not just blocks or loops. But even if Wasm is supposed to be more restricted (should it?), isn't that a triviality?

titzer commented 9 years ago

In v8-native, a loop is just like a block in that it contains multiple statements/expressions. IIUC you are proposing that loop has only a single statement/expression and that to encode continue you make a loop with a block as its subexpression and then break the subblock, correct?

If so, I think that's a lot of aggravation for no gain anywhere.

On Mon, Aug 24, 2015 at 10:14 PM, rossberg-chromium < notifications@github.com> wrote:

@titzer https://github.com/titzer, in asm.js break can exit any statement, not just blocks or loops. But even if Wasm is supposed to be more restricted (should it?), isn't that a triviality?

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-134363514.

rossberg commented 9 years ago

I see. I hadn't noticed that about v8-native. The AST semantics doesn't say anything to the contrary, so I have been assuming the obvious C statement structure so far. Has this actually been discussed?

AndrewScheidecker commented 9 years ago

I think the primary benefit to eliminating the continue operation is that it simplifies the AST context; e.g. the context to mimic the JavaScript break and continue semantics as the binary encoding from the polyfill prototype requires is annoying. Another way to simplify the context (and eliminate continue) without requiring extra label nodes is to give loop nodes two branch targets: one for break, and one for continue.

e.g. this WebAssembly: (loop $break $continue ... (branch $continue) ... (branch $break) ...) could translate to this JavaScript: a: while(true) { ... continue a; ... break a; ... } and can also be easily translated to lower level branching operations.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name; that would mean only consumers of the binary format need to understand indexing. With rules like loops introducing two labels, or separate index spaces for locals by type, that's potentially a lot of complexity that could be isolated to the binary format. It makes sense to keep the indexing in the text format for prototyping, but once the reference interpreter also implements a binary format, it would be nice to get rid of it.

kg commented 9 years ago

Another way to simplify the context (and eliminate continue) without requiring extra label nodes is to give loop nodes two branch targets: one for break, and one for continue.

This would address my concerns. I like it.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name; that would mean only consumers of the binary format need to understand indexing.

That would be great, since I basically don't understand how the numeric indexes work right now despite trying to :-)

titzer commented 9 years ago

On Mon, Aug 31, 2015 at 6:11 PM, Andrew Scheidecker < notifications@github.com> wrote:

I think the primary benefit to eliminating the continue operation is that it simplifies the AST context; e.g. the context to mimic the JavaScript break and continue semantics as the binary encoding from the polyfill prototype requires is annoying. Another way to simplify the context (and eliminate continue) without requiring extra label nodes is to give loop nodes two branch targets: one for break, and one for continue.

e.g. this WebAssembly: (loop $break $continue ... (branch $continue) ... (branch $break) ...) could translate to this JavaScript: a: while(true) { ... continue a; ... break a; ... } and can also be easily translated to lower level branching operations.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name; that would mean only consumers of the binary format need to understand indexing. With rules like loops introducing two labels, or separate index spaces for locals by type, that's potentially a lot of complexity that could be isolated to the binary format. It makes sense to keep the indexing in the text format for prototyping, but once the reference interpreter also implements a binary format, it would be nice to get rid of it.

I'm all for using labels in the text format as opposed to indices for break/continue, but I still think the continue statement is useful in and of itself. FWIW v8-native basically does this. It maintains a stack of ifs/loops which it uses to find the target for either a break or a continue.

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-136417393.

AndrewScheidecker commented 9 years ago

FWIW v8-native basically does this. It maintains a stack of ifs/loops which it uses to find the target for either a break or a continue.

The main change to v8-native to implement this would be that each label on the stack only has a single target instead of a break target and a possibly null continue target. It also eliminates an edge case to verify that the stack index parameter of a continue operation has a non-null continue target.

qwertie commented 9 years ago

(loop $break $continue ... (branch $continue) ... (branch $break) ...) could translate to this JavaScript: a: while(true) { ... continue a; ... break a; ... }

sgtm.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name...

I think it would be unfortunate if the text format were to diverge too much from the binary format - people learning about webassembly will naturally assume that the text format is conceptually the same as the binary one. If the text format supports both names and indices, documentation will naturally introduce both and explain how one is syntactic sugar for the other. Such information is likely to be left out, or not explained as well, if the text format is not allowed to reflect the binary format.

titzer commented 9 years ago

On Mon, Aug 31, 2015 at 11:29 PM, David Piepgrass notifications@github.com wrote:

(loop $break $continue ... (branch $continue) ... (branch $break) ...) could translate to this JavaScript: a: while(true) { ... continue a; ... break a; ... }

sgtm.

Incidentally, I also think the text format should eventually require referencing labels, locals, and globals by name...

I think it would be unfortunate if the text format were to diverge too much from the binary format - people learning about webassembly will naturally assume that the text format is conceptually the same as the binary one. If the text format supports both names and indices, documentation will naturally introduce both and explain how one is syntactic sugar for the other. Such information is likely to be left out, or not explained as well, if the text format is not allowed to reflect the binary format.

I agree, and I also think that webassembly shouldn't be unnecesarily cumbersome because we're trying to be too minimal with control structures. I think continue is very, very common and we wouldn't save much, if anything, by leaving it out.

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-136506906.

AndrewScheidecker commented 9 years ago

I don't feel too strongly one way or the other about continue. Just wanted to point out you can do it without requiring extra label nodes.

I think it would be unfortunate if the text format were to diverge too much from the binary format - people learning about webassembly will naturally assume that the text format is conceptually the same as the binary one.

I agree in general, but the text format already lets you name things where I assume the binary format will be all indices: non-exported functions, globals, locals, parameters, labels, etc. IMO that's justifiable because it makes it much easier for a human to read (the whole point of the text format) without changing the semantics. I don't see why anybody other than people writing an encoder or decoder need to even know that the binary format uses indices. How many people with a functional knowledge of x86 assembly know how it's encoded?

qwertie commented 9 years ago

I agree in general, but the text format already lets you name things where I assume the binary format will be all indices: non-exported functions, globals, locals, parameters, labels, etc.

You have a point there... but just to be clear, I didn't immediately find the issue id, but I remember we were talking about (and I advocate) having a place for names (of parameters, locals, etc.) to be stored in the binary format, so that text and binary will round-trip and other reasons (e.g. having a good debugging experience in the absence of source code.) The names could be stripped to save space, which makes me think the text format needs a way to represent indices in that scenario.

titzer commented 9 years ago

On Tue, Sep 1, 2015 at 5:45 PM, David Piepgrass notifications@github.com wrote:

I agree in general, but the text format already lets you name things where I assume the binary format will be all indices: non-exported functions, globals, locals, parameters, labels, etc.

You have a point there... but just to be clear, I didn't immediately find the issue id, but I remember we were talking about (and I advocate) having a place for names (of parameters, locals, etc.) to be stored in the binary format, so that text and binary will round-trip and other reasons (e.g. having a good debugging experience in the absence of source code.) The names could be stripped to save space, which makes me think the text format needs a way to represent indices in that scenario.

Names are good, so I support having space somewhere for storing them. But I would advocate that we don't specific a standard format for them, and leave the interpretation of that section up to a higher level. Other than dynamic linking, names shouldn't have a semantic impact on the program. In particular, names shouldn't be required for the engine to produce stack traces, IMO.

Engines should produce low-level stack traces that should be interpreted by a higher level to produce a source-level trace. That's a screwup in the JVM; exceptions have line numbers, not bytecode offsets, so you need to have a standard bytecode offset -> line number table.

— Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/issues/310#issuecomment-136765830.

kg commented 9 years ago

The names could be stripped to save space, which makes me think the text format needs a way to represent indices in that scenario.

The consensus there was that stripped names would be bare symbols, i.e. @7 instead of just raw indices. In theory a disassembler could (should) assign them better names, as most decompilers do, i.e. @loop3, @func9, @local7 or even @i32_7. The only important thing there is round-tripping the index relationships.

And as Ben says, the names are a human concern, not something that affects the engine. They just live in a table and are used for human-facing stuff like a debugger or a disassembler.

titzer commented 9 years ago

Polling this issue to see if there is a consensus: text format aside, continue stays or goes?

rossberg commented 9 years ago

If I understand the recent resolution of the control structure discussion correctly (not sure about that), then continue is merely a synonym for (certain uses of) break.

sunfishcode commented 9 years ago

Yes, I think we all agree now that we don't want continue as a distinct operator.

titzer commented 9 years ago

I think so. There is a distinction at the v8-native binary level since break targets the end label of a loop/block and continue targets the begin label. So, encoding issue?

rossberg commented 9 years ago

Yes, I'd argue that's an encoding issue.

titzer commented 9 years ago

I see how we can do this with a v8-native change; just push two labels onto the stack for loop and then break and br_if can pick which one based on the nesting depth.

Closing. Remove continue.