Relax the spec rule about `initializerExpression` not deriving a function literal?

dart-lang / language

Design of the Dart language

Other

2.68k stars 205 forks source link

Relax the spec rule about `initializerExpression` not deriving a function literal? #4163

Open eernstg opened 1 week ago

eernstg commented 1 week ago

The discussion in dart-lang/sdk#54284 showed that the following declarations are accepted by the analyzer as well as the common front end (808fa4ca8b5fbc3d79ade02f607b8533c9714029):

class Foo1 {
  final Function a;
  final int b;
  Foo1()
      : a = <T>() {},
        b = 0;
}

class Foo2 {
  final Function a;
  Foo2() : a = <T>() => 0;
}

void main() {}

However, the language specification has a rule which says that <initializerExpression> does not derive a <functionExpression>. This should cause the above constructor declarations to be syntax errors.

It could be argued, though, that this is benign: The rule against deriving an initializer expression into a function expression is motivated by a readability issue. In particular, it is in some situations impossible to decide whether a given function body is the body of a function literal which is used to initialize an instance variable, or it's the body of the enclosing constructor declaration, until the very end of that body has been reached: If there's a semicolon then it was the function literal body, if there is no semicolon then it was the constructor body. That's a potentially large amount of code to scan, just to see if there is a semicolon (and nobody will do it unless they are syntax nerds ;-).

However, this ambiguity only exists because the formal parameter list may look like an expression (() could be a formal parameter list or the empty record, (x) could be a parameter list or a parenthesized expression, and (x, y) could be a parameter list or a record literal). This is not true when the function literal is generic: <T>(...) is not an expression (for now, at least ;-).

So we could update the spec to say that <initializerExpression> cannot derive a non-generic function expression.

This spec change would not give rise to any implementation effort because the tools already behave in this way.

@dart-lang/language-team, WDYT?

lrhn commented 1 week ago

cannot derive a non-generic function expression.

That sounds like an even harder exception to remember than "initializer-list expressions can not be function literals".

stereotype441 commented 1 week ago

cannot derive a non-generic function expression.

That sounds like an even harder exception to remember than "initializer-list expressions can not be function literals".

Agreed. I'm guessing this isn't a common coding pattern, so assuming that's true, I would rather make a breaking change to the language that causes the above examples to become errors. I'd be glad to drive this through the breaking change process, and update the implementations, if there's general agreement that it's a good idea.

(If it turns out that I'm wrong, and it is a common coding pattern, then we will discover that when we try to push out the change, and we can re-evaluate.)

eernstg commented 1 week ago

OK, I'm fine with any decision. However, let me reiterate a couple of points about this topic, just to make sure we have them on the table.

It could be argued that a function literal with a big body in an initializer list is really confusing even in the case where an initial <T> will technically disambiguate the situation:

class A {
  var noise1, noise2;
  Function f;

  A(): noise1 = "Some long thing", noise2 = [1, 2, 42], f = <X>(X x) {
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
  };
}

However, we will still be able to use this variant:

class A {
  var noise1, noise2;
  Function f;

  A(): noise1 = "Some long thing", noise2 = [1, 2, 42], f = (<X>(X x) {
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
    // and then a lot of code showing that we're working really hard.
  });
}

So how much of a solution do we actually have for this problem? Would it be sufficient to rely on the good taste of developers and just lift the restriction? (the good taste is needed for every line of code out there, anyway ;-).

stereotype441 commented 1 week ago

@eernstg For me personally it's more than just a readability issue. It's also a parser complexity issue.

When the parser is looking for an expression, and it sees an (, it has to decide, before it can consume any more tokens, whether to parse the text inside the parens as function parameters or expressions. Today, it can do that with a pretty simple heuristic:

If mayParseFunctionExpressions is false (meaning, roughly, that the parser is inside a constructor initializer list), then function literals aren't allowed here, so the text inside the parens should be parsed as expressions.
Otherwise, if the token following the matching ) is =>, {, or a keyword/identifier*, then the text inside the parens should be parsed as function parameters.
- *The "keyword/identifier" case handles the tokens async and sync, as well as providing good error recovery in the case where the user has started typing async or sync but hasn't finished yet, e.g. () asy {}.
Otherwise, the text inside the parens should be parsed as expressions.

If we allow function expressions inside initializer lists, then this heuristic would have to get more complicated. We already have a lot of parser complexity in this area, so I think there's a relatively high risk of bugs. I would certainly want to think pretty carefully when designing the new heuristic, and I would want to do a lot of testing to make sure it gets all the corner cases right. It would be a nontrivial amount of effort. Personally, I don't think that effort would really buy us much in terms of user satisfaction.

I suppose you could argue that the alternative, where we prohibit generic function literals when mayParseFunctionExpressions is false, doesn't buy us much in terms of user satisfaction either. But it would be dead simple to implement and test, and it would still have the advantage of bringing the implementations more in line with the spec (which is a good thing).

eernstg commented 6 days ago

OK, so we have the initial suggestion:

So we could update the spec to say that <initializerExpression> cannot derive a non-generic function expression. This spec change would not give rise to any implementation effort because the tools already behave in this way.

And the new one you mentioned, @stereotype441:

Adjust the implementation such that the error is reported also for generic function literals; do not change the spec.

I'd be happy about the either of those, with a slight preference for the second.

leafpetersen commented 5 days ago

Fine with either, slight preference for the latter (assuming no meaningful breakage).

munificent commented 23 hours ago

I'm guessing this isn't a common coding pattern, so assuming that's true,

I checked. From a large corpus of pub packages and itsallwidgets.com (48,509,868 lines in 229,473 files):

Of all constructor initializers, none of their right-hand sides were function expressions:

-- Initializer (162824 total) --
 162824 (100.000%): Other  =====================================================

Looking at function expressions more generally:

-- Function expression (798844 total) --
 798609 ( 99.971%): Non-generic  ===============================================
    235 (  0.029%): Generic      =

So generic lambdas are exceedingly rare. Almost all of them are in a small number of packages:

-- Package (235 total) --
     63 ( 26.809%): dart_mappable-4.3.0          =========
     31 ( 13.191%): riverpod-2.6.0               =====
     23 (  9.787%): reflection_factory-2.2.5     ====
     23 (  9.787%): reflection_factory-2.4.2     ====
     14 (  5.957%): zycloud_widget-1.1.2         ==
     13 (  5.532%): shelf_easy-3.2.2             ==
      8 (  3.404%): _fe_analyzer_shared-76.0.0   ==
      6 (  2.553%): bones_api-1.5.10             =
      6 (  2.553%): bones_api-1.7.9              =
      6 (  2.553%): bones_api-1.7.24             =
...

If we allow function expressions inside initializer lists, then this heuristic would have to get more complicated. We already have a lot of parser complexity in this area, so I think there's a relatively high risk of bugs. I would certainly want to think pretty carefully when designing the new heuristic, and I would want to do a lot of testing to make sure it gets all the corner cases right. It would be a nontrivial amount of effort. Personally, I don't think that effort would really buy us much in terms of user satisfaction.

I recall the old language team stressing about exactly this parser complexity when they first added function expressions to the language and discovered the annoying corner case around constructor initializers. So, yeah, I'd definitely want to err on the side of keeping things grammatically simple.

I'd vote just not allowing function expressions in constructor initializers, generic or not.

eernstg commented 12 hours ago

I'd vote just not allowing function expressions in constructor initializers, generic or not.

I think this is the outcome.