dart-lang / language

Design of the Dart language
Other
2.65k stars 202 forks source link

Should we deprecate and remove support for switch case labels? #3441

Open munificent opened 11 months ago

munificent commented 11 months ago

The language currently allows labels on switch cases. Then you can use a continue followed by a label name to jump from one case to any other chosen case in the switch. This lets you implement state machines or other arbitrarily complex unstructured control flow bounded by the entire switch statement.

At one level, this feature is neat: It gives you a direct way to represent freeform control flow. Sort of like a bounded goto. If you want to encode a state machine directly in Dart without any trampolining and without wrapping the switch in a loop, you can.

It can also be useful for users writing tools that generate Dart code. If you are taking in some input that supports unstructured control flow and compiling it to Dart, then it may be much easier to write that compiler if you're able to have it generate a switch with labeled cases. This in contrast to, say WebAssembly, which features only structured control flow. In order to compile arbitrary C and C++ to WebAssembly, Alon Zakai and the Emscripten project had to devise a complex algorithm to tease some structured control flow out of a given arbitrary control flow graph.

However, that very expressiveness leads to a high implementation cost on the Dart team. I have heard anecdotally from many team members over the years that compiling switch case labels and labeled continues requires a disproportionate amount of effort for the amount of use users get out of them. Much of this is alreay sunk cost, but if we assume that Dart has a long prosperous future, we can assume that there will be future Dart compilers, analyzers, and other new or rewritten tools that will have to continue to pay this cost.

In particular, we seem to continually evolve Dart's use of control flow analysis for features like type promotion. Every new flow analysis ends up running into switch case labels and has to deal with them.

Does this feature carry its weight? There are two questions here:

  1. If we had a time machine, would we go back and never add them to the language in the first place? I'm fairly certain that the answer is "yes".
  2. Should we deal with the extra effort and user pain of removing them now? That's what this issue is asking.

To evaluate that, the main factors I can think of are:

  1. How much engineering effort would it take to deprecate and remove the feature?
  2. How much user pain and migration would be caused by the feature no longer being supported?
  3. What engineering resources would be freed up by no longer having to support this feature going forward?
  4. What user benefit is gained from not having to learn or encounter this feature?

I believe the answer to 4 is "almost none". I think almost all Dart users already live in a world where this feature doesn't exist because they are blissfully unaware of it.

For 1 and 3, we can and should ask the implementation teams.

For 2, we can get some data on this by looking at code in the wild to see how often the feature is used. I scraped 2,000 pub packages and of the 805 switch statements, I found exactly one that used labels:

      switch (widget.placement) {
        case ToolbarPopupPlacement.start:
          if (directionality == TextDirection.rtl) {
            return ToolbarPopupPlacement.end;
          }
          continue next;
        case ToolbarPopupPlacement.end:
          if (directionality == TextDirection.rtl) {
            return ToolbarPopupPlacement.start;
          }
          continue next;
        next:
        default:
          return widget.placement;
      }

You will note that this code would be exactly the same length if the continue next; lines were simply replaced with return widget.placement;. I wouldn't consider this a very compelling use.

modulovalue commented 11 months ago

Here's another example: I'm using switch case labels to implement a sort-merge join:

Iterable<T> Function<T>({
  required int Function(L, R) outer_inner_order,
  required T Function(L, R) emit,
}) already_sorted_sort_merge_join<L, R>({
  required final Iterator<L> outer,
  required final Iterator<R> inner,
}) {
  return <T>({
    required final int Function(L, R) outer_inner_order,
    required final T Function(L, R) emit,
  }) sync* {
    final iterator_outer = outer;
    final iterator_inner = inner;
    if (iterator_outer.moveNext() && iterator_inner.moveNext()) {
      final buffer = <R>[];
      statemachine:
      switch (0) {
        base_router:
        case 0:
          final outer_eq_inner = outer_inner_order(
            iterator_outer.current,
            iterator_inner.current,
          );
          if (outer_eq_inner < 0) {
            if (iterator_outer.moveNext()) {
              continue base_router;
            } else {
              break statemachine;
            }
          } else if (outer_eq_inner > 0) {
            if (iterator_inner.moveNext()) {
              continue base_router;
            } else {
              break statemachine;
            }
          } else {
            continue fill_equals;
          }
        fill_equals:
        case 1:
          final cur = iterator_inner.current;
          yield emit(
            iterator_outer.current,
            cur,
          );
          buffer.add(cur);
          if (iterator_inner.moveNext()) {
            final outer_eq_inner = outer_inner_order(
              iterator_outer.current,
              iterator_inner.current,
            );
            if (outer_eq_inner < 0) {
              if (iterator_outer.moveNext()) {
                continue route_buffer;
              } else {
                break statemachine;
              }
            } else if (outer_eq_inner > 0) {
              throw Exception(
                "Unsorted violation",
              );
            } else {
              continue fill_equals;
            }
          } else {
            if (iterator_outer.moveNext()) {
              for (final buffer_value in buffer) {
                final outer_eq_inner = outer_inner_order(
                  iterator_outer.current,
                  buffer_value,
                );
                if (outer_eq_inner < 0) {
                  throw Exception(
                    "Unsorted violation.",
                  );
                } else if (outer_eq_inner > 0) {
                  buffer.clear();
                  break statemachine;
                } else {
                  yield emit(
                    iterator_outer.current,
                    buffer_value,
                  );
                }
              }
              break statemachine;
            } else {
              break statemachine;
            }
          }
        route_buffer:
        case 2:
          final current = iterator_outer.current;
          for (final buffer_value in buffer) {
            final outer_eq_inner = outer_inner_order(
              current,
              buffer_value,
            );
            if (outer_eq_inner < 0) {
              throw Exception(
                "Unsorted violation",
              );
            } else if (outer_eq_inner > 0) {
              buffer.clear();
              continue base_router;
            } else {
              yield emit(
                current,
                buffer_value,
              );
            }
          }
          if (iterator_outer.moveNext()) {
            continue route_buffer;
          } else {
            break statemachine;
          }
      }
    }
  };
}
eernstg commented 11 months ago

One option that we could consider would be to say that if there is any continue <identifier>; in a switch statement whose target is a case then flow analysis and promotion is turned off (for that switch statement, or perhaps even for the entire function body—whatever it takes to avoid those implementation difficulties).

The point is that state machines are likely to be written by a code generator, and they don't mind writing code which is a bit less convenient and a bit more verbose. We could even emit a warning about this fact whenever such switch statements are encountered, in order to send the signal to developers that this is a specialized mechanism that they should only use if they know what they are doing, and they are willing to pay the price in terms of convenience and conciseness.

rrousselGit commented 11 months ago

I didn't know this was a thing, but I certainly wish I knew earlier. To be specific, I've on numerous occasions wanted a way to "continue" a switch.

Such that we could write:

  switch (value) {
    case int():
       print('int');
       continue;
    case String():
      print('String');
    case num():
       print('num');
  }

And for ints, this would print both int and num. But for doubles, this would only print num

Unfortunately it looks like that continue label isn't quite that. Looks like it forcibly enters a specific case, when I would've thought it'd be "resume matching against cases from here". Kind of like the opposite of break, where we don't break and keep going.

It's probably fine to remove it. But I do think there's likely a use for something similar

goderbauer commented 10 months ago

There appear to be exactly two switches in the Flutter code base that use labels:

This is not meant as a statement for or against removing it, just a data point from the wild.

water-mizuu commented 10 months ago

What if labels are only allowed in patterns that do not bind any variable?

lrhn commented 10 months ago

Patterns are not a problem, a label counts as a case which binds no variables already, so the case body won't get access to any variables.

(We could improve that if we wanted to, by letting a label count as binding the (intersection of the) variables of all the cases which has a continue to that label, as if the case had the same patterns as every one of the cases that continues to it. (That'll need a fixed point computation if the continues have cycles.)

Personally I like the idea of being able to inline a state machine, but I have never actually done so. One reason is that the states are not just states, they require a case value/pattern as well, even if I'll never want to hit them on the initial entry. And if I write a state machine, I want different states to have access to different variables, so I end up writing nested code instead.

Maybe not trying to do everything on one syntax would be better. We could have a state switch which allows jumping around freely, but with much more simplistic type flow analysis, and the strictly hierarchical switch which cannot contribute to other cases. The former could be, well, not a switch, just a delimited piece of code where you could goto arbitrary, or special, labels.

munificent commented 10 months ago

@modulovalue, I would write your example as:

enum _State {
  base_router,
  fill_equals,
  route_buffer,
  done
}

Iterable<T> Function<T>({
  required int Function(L, R) outer_inner_order,
  required T Function(L, R) emit,
}) already_sorted_sort_merge_join<L, R>({
  required final Iterator<L> outer,
  required final Iterator<R> inner,
}) {
  return <T>({
    required final int Function(L, R) outer_inner_order,
    required final T Function(L, R) emit,
  }) sync* {
    final iterator_outer = outer;
    final iterator_inner = inner;
    if (iterator_outer.moveNext() && iterator_inner.moveNext()) {
      final buffer = <R>[];

      var state = _State.base_router;
      while (state != _State.done) {
        switch (state) {
          case _State.base_router:
            final outer_eq_inner = outer_inner_order(
              iterator_outer.current,
              iterator_inner.current,
            );
            if (outer_eq_inner < 0) {
              if (!iterator_outer.moveNext()) {
                state = _State.done;
              }
            } else if (outer_eq_inner > 0) {
              if (!iterator_inner.moveNext()) {
                state = _State.done;
              }
            } else {
              state = _State.fill_equals;
            }
          case _State.fill_equals:
            final cur = iterator_inner.current;
            yield emit(
              iterator_outer.current,
              cur,
            );
            buffer.add(cur);
            if (iterator_inner.moveNext()) {
              final outer_eq_inner = outer_inner_order(
                iterator_outer.current,
                iterator_inner.current,
              );
              if (outer_eq_inner < 0) {
                if (iterator_outer.moveNext()) {
                  state = _State.route_buffer;
                } else {
                  state = _State.done;
                }
              } else if (outer_eq_inner > 0) {
                throw Exception(
                  "Unsorted violation",
                );
              }
            } else {
              if (iterator_outer.moveNext()) {
                for (final buffer_value in buffer) {
                  final outer_eq_inner = outer_inner_order(
                    iterator_outer.current,
                    buffer_value,
                  );
                  if (outer_eq_inner < 0) {
                    throw Exception(
                      "Unsorted violation.",
                    );
                  } else if (outer_eq_inner > 0) {
                    buffer.clear();
                    state = _State.done;
                    break;
                  } else {
                    yield emit(
                      iterator_outer.current,
                      buffer_value,
                    );
                  }
                }
              }

              state = _State.done;
            }
          case _State.route_buffer:
            final current = iterator_outer.current;
            for (final buffer_value in buffer) {
              final outer_eq_inner = outer_inner_order(
                current,
                buffer_value,
              );
              if (outer_eq_inner < 0) {
                throw Exception(
                  "Unsorted violation",
                );
              } else if (outer_eq_inner > 0) {
                buffer.clear();
                state = _State.base_router;
                break;
              } else {
                yield emit(
                  current,
                  buffer_value,
                );
              }
            }
            if (state != _State.base_router && !iterator_outer.moveNext()) {
              state = _State.done;
            }
        }
      }
    }
  };
}
modulovalue commented 10 months ago

That would probably work, but I have to say that I do prefer the label-based solution more. In my opinion, being able to use a statement that immediately exits the case makes it easier to reason about the behavior of the statemachine.

            final outer_eq_inner = outer_inner_order(
              iterator_outer.current,
              iterator_inner.current,
            );
            if (outer_eq_inner < 0) {
              if (!iterator_outer.moveNext()) {
                state = _State.done;
                // Are we really transitioning to the next state or is there something else happening below before we do that?
              }
            } else if (outer_eq_inner > 0) {
              if (!iterator_inner.moveNext()) {
                state = _State.done;
                // Are we really transitioning to the next state or is there something else happening below before we do that?
              }
            } else {
              state = _State.fill_equals;
              // Are we really transitioning to the next state or is there something else happening below before we do that?
            }

@munificent what do you think about the following approach for simulating the old behavior without using switch labels? Would that be supported?

void main() {
  next: for (int state = 0;;) {
    switch (state) {
      case 0:
        print("Do state 0 things");
        state = 1;
        continue next; // Go to state 1.
      case 1:
        print("Do state 1 things");
        state = 2;
        continue next; // Go to state 2.
      case 2:
        print("Do state 2 things");
        state = -1;
        break next; // Stop the statemachine.
    }
  }
  print("The statemachine terminated.");
}
munificent commented 6 months ago

Yeah, that works fine too.

modulovalue commented 2 days ago

Note that Zig has recently added support for a feature that is very similar to switch case labels in Dart.

See:

Their new feature goes a little further than what we have in Dart. Their continue label supports passing in an argument which is used to jump to the matching case, instead of them only being able to go to a labelled case, like in Dart.

The release notes discuss the difference between the alternative that I suggested here and the new feature. They seem to suggest that a dedicated language feature makes it easier to generate optimized code that helps the CPU in predicting branches.

Personally, I would welcome any improvements to this feature in Dart, as I'm using it to improve performance without having to sacrifice readability.

Quote:

Code Generation Properties

This language construct is designed to generate code which aids the CPU in predicting branches between cases of the switch, allowing for increased performance in incredibly hot loops, particularly those dispatching instructions, evaluating FSAs, or performing similar case-based evaluations. To achieve this, the generated code may be different to what one would intuitively expect.

If the operand to continue is comptime-known, then it can be translated to an unconditional branch to the relevant case. Such a branch is perfectly predicted, and hence typically very fast to execute.

If the operand is runtime-known, then each continue can become a separate conditional branch (ideally via a shared jump table) back to the same set of potential branch targets. The advantage of this pattern is that it aids the CPU's branch predictor by providing different branch instructions which can be associated with distinct prediction data. For instance, when evaluating an FSA, if case a is very likely to be followed by case b, while case c is very likely to be followed by case d, then the branch predictor can use the direct jumps between switch cases to predict the control flow more accurately, whereas a loop-based lowering causes the state dispatches to be "collapsed" into a single indirect branch or similar, hindering branch prediction.

This lowering can inflate code size compared to a simple "switch in a loop" lowering, and any Zig implementation is, of course, free to lower this syntax however it wishes provided the language semantics are obeyed. However, the official ZSF compiler implementation will attempt to match the lowering described above, particularly in the ReleaseFast build mode.