dart-lang / language

Design of the Dart language
Other
2.66k stars 205 forks source link

[Field promotion] Final expressions #3327

Open leafpetersen opened 1 year ago

leafpetersen commented 1 year ago

Field reads cannot in general be promoted based on null checks or instance checks because stability of the result cannot be guaranteed. Older overview summary here. Various proposals have been floated to resolve this based on either providing a way for APIs to opt into promotion (e.g. stable getters) or by providing better ways to bind a local variable (search for the label field-promotion for some discussion issues).

This issue sketches out an approach based on introducing an implicit hidden final variable to cache the value of a field read within the scope of a promotion. We allow a promotable expression (for the purposes of this discussion, we can consider the set of syntactic expressions which are currently candidates for private field promotion) to be prefixed by final to indicate that within the "scope" (loosely speaking) of the expression, subsequent references to the expression should re-use the value of the initial evaluation of the expression, rather than re-computing the value. This guarantees stability of the expression, making promotion sound. As with private field promotion, assignments to the root of the path would need to be accounted for appropriately. Examples:

class A {
  int? x;
  A(this.x);
  void test(A other) {
     if (final other.x != null) {
        other.x.isEven; // Valid, `other.x` is promoted.
        if (final x != null) {
           print(x + other.x); // Valid, this.x and other.x are both promoted
       }
     }
    if (x != null) {
        x.isEven; // Valid implicit `this.x` is promoted
    }
  }
}

class Bad {
  int _x = 0;
  int get x => _x++;
  void test(Bad other) {
      other._x = 0;
      int cache = other.x; // assigns 0 to cache
      if (final other.x == 1) {
         print(cache); // prints 0
         print(other.x); // prints 1
         other._x = 99; // 
         print(other.x) // prints 1
     } 
    print(other.x); // We are out of the "scope" of the final expression prints 99
  }

An advantage of this approach is that it applies to any field or getter, without any requirements for the API designed to make a choice. It does so without requiring the user to invent, bind, and use a new local variable.

A disadvantage of this approach is that the "scope" of the hidden variable may not be obvious. This doesn't matter for "well-behaved" getters, but as the Bad example shows, if a getter actually returns different values on different reads, then which invocations are cached becomes relevant, and is not clearly visible in the syntax.

One approach to mitigating this would be to only allow final expressions to be "bound" in a limited set of locations in which scope is clear (similar to the way we handle if case expressions now. So then we could say that if (final e <predicate>) <block> caches the value of e within <block>, where <predicate> is one of a limited set of binary operations (e.g. != null and is T ). This makes it clear what the scope of the "finalization" is. Example:

void test(Bad bad) {
  if (final bad.x != null) {
     bad.x.isEven; // promotion is valid, value is the same as the last read
  } else {
     throw "Unreachable";
 }
 // bad.x.isEven; // Static error, promotion is disabled
 print(bad.x); // New value is read, rather than re-using the value cached in the scope of the `if`.
}

There is still some possibility for confusion around assignments to the root of the path. For example, the following code might be confusing:

void test(Bad bad) {
  if (final bad.x != null) {
     print(bad.x);
     bad = Bad();
     print(bad.x);
  } 
}

It seems wrong to allow the second call to bad.x in the body of the if to use the cached value. We could say that this code is valid, but that the "finality" ends at the point of the assignment, but perhaps it is better to say that within the scope of a final expression, assignments to the root variable of the path are statically disallowed, and hence the code above becomes an error.

With the adjustments above, we end up with something that essentially behaves as if if (final e != null) <block> were shorthand for if (e case final freshVar?) <block2> where block2 is block with all uses of e replaced with freshVar.

cc @dart-lang/language-team

Reprevise commented 1 year ago

I'm personally not a fan of this syntax. I find it pretty hard to read and unclear as to what exactly is going on here. It seems like there's a hidden variable being declared (though that's not really what's happening). That's why I like the if-case syntax because you have to explicitly declare a new variable inside the if parenthesis and it's more clear that the getter's value is being "held" because it's being stored in r below.

class Bad {
  bool? get random => math.Random().nextBool() ? true : null;

  void test(Bad other) {
    if (other.random case final bool r) {
      print(r); // always prints "true"
    }

    print("${other.random}"); // could print null or "true"
  }
}
natebosch commented 1 year ago

I think it would be confusing to have expressions which look like property reads treated as a local variable. I think the if (other.x case final otherX?) version will be easier to read than making other.x sometimes be a variable name.

lrhn commented 1 year ago

My summary would be that:

... EDIT: that I completely misread the feature, so everything below not really useful. Snip.

Reprevise commented 1 year ago

One other minor point against this syntax is that it requires you to write out the whole expression again like so:

if (final x.foo.bar.baz is String) {
  print(x.foo.bar.baz.codeUnitAt(0));
}

// When compared to
if (x.foo.bar.baz case final String baz) {
  print(baz.codeUnitAt(0));
}
leafpetersen commented 1 year ago

@natebosch

I think it would be confusing to have expressions which look like property reads treated as a local variable. I think the if (other.x case final otherX?) version will be easier to read than making other.x sometimes be a variable name.

Well, yes, but... we have that syntax and we still have people asking for field promotion. So manifestly it is not so.

@Reprevise

One other minor point against this syntax is that it requires you to write out the whole expression again like so:

This is not a point against it. This is explicitly what people are asking for when they ask for field promotion (which they are asking for). It is true that for extremely long paths they might choose to bind a variable, but that doesn't change the fact that empirically, users are requesting the ability to promote paths, are happy for the limited cases that we are providing, and are asking us to provide support for more cases.

@lrhn

I got thoroughly lost in where you were going in your comment, but strictly looking at your summary:

My summary would be that:

  • If someone does a potentially promoting check on a "suitable" expression,

Sort of?

  • and they later, still dominated by the check, do "an operation" on the same expression that is "only allowed" if the value was promoted,
  • but we can't actully guarantee that the value of the expression doesn't change (aka: we can't promote using normal rules),
  • then we cache the value of the first expression read, and reuse the same value at the later use.

I don't understand all of your conditionality here. Delete the lines starting "and they" and "but we", and I agree with the summary. All of the rest is irrelevant to this proposal.

We don't need to cache if we can already soundly promote.

But we can't. That's the whole point!

We don't need to cache if a later operation doesn't require the promotion.

We might want to for other reasons, but yes, this is primarily aimed at promotion.

So only in the case where we would otherwise get an error.

I have no idea what you're trying to say here.

Do we allow this for any expression, or only for some expressions?

I answered this in my initial comment.

How do we decide if the promotion is "needed", whether there is only an error if we don't promote?

We don't. I'm not sure where this is coming from? Nothing in the proposal says anything about promotion being needed.

I'm really lost with the rest of your comment. I feel like maybe we got some wires crossed about what the proposed feature is?

lrhn commented 1 year ago

I feel like maybe we got some wires crossed about what the proposed feature is?

Ack, I completely missed the leading final and thought it was implicitly turning expressions into references to the implicit variable, if needed. No excuse, sloppy reading on my part. I'll remove my unrelated ramblings.

With the final expr != null test allowing expr later to be considered referring to the same value, this feature is closer to if-variables or declaration-promotion, except that it doesn't introduce a new named variable. It introduces a variable with no known name, and then treats the entire expr.id expression as denoting that variable if it occurs again in scope of the test.

The syntax precisely matches part of if-variables: if (final expr.id != null).../if (final expr.id is Foo)... in if variables would introduce a new variable implicitly named id, and perform the promotion on that. Declaration promotion would do that explicitly if ((final id = expr.id) != null).../if ((final id = expr.id) is Foo)....

So, this is a way to implicitly introduce a variable, and not give it a name. (Or saying that the "name" of a variable can be an entire expression, and the entire expression denotes that variable. Using plain identifier expression is just the simple case.)

It definitely works, in that it can be given a consistent and sound semantics

As you point out, it gets tricky to read if expr.id seems like it shouldn't mean the same thing any more. Or if it looks like it should have side-effects. A if (final o[++x] != null) o[++x].toRadixString(2); would be very surprising if the second o[++x] was just a variable name with no side-effect.

Restrict it to any getter on a stable receiver (in the field-promotion sense, where a final field getter would be promotable) seems reasonable. The final getter can still have side-effects. If we know that it cannot, the expression would likely be field-promotable already. I think it's OK to assume that we cache the value of that one call, but not as reasonable to assume that we cache an entire chain of member accesses up to it.

That brings it very close to if-variables with a longer name. If e is stable and has no side-effects, then if-variables would make if (final e.id != null) introduce a variable named id for that, this feature introduces a variable named e.id, but we know that e is basically constant and has no side-effects, so if we squint, we can see it as just a name. That would be much harder if e looks like it does things.

The one case where a stable receiver can change, is when it's only locally stable, because the root expression is a local variable. Field promotion can promote that, if it can see that the variable doesn't change between test and use. The safest is probably what you say, to disallow any occurrence of e.id at a point where a similar e.promotableField would no longer promote, because the root variable might have changed. It doesn't end the scope of the e.id variable, it just demotes it to unusable, effectively uninitialized, since the assigned value stopped being correct.

The bigger question then becomes where this test can occur. Is it only as the condition of an if expression, like if-variables. Because that was one of my main concerns there.

Can it be any expression? Only in conditions, or sub-condition? Can I do:

if (final e.x is Foo && final e.x.foo != null) {
   print("${e.x.bar} : ${e.x.foo.isValid});
}

Even the following can make sense:

var exists = (final e.x is Foo) && e.x.isNotEmpty;

The usual options are:

I dislike the last option for not being orthogonal enough, and probably have a hard time selling the first, so something like the middle one would be what I'd go with.

But I also think that a (slightly generalized) if-variable, or declaration-assignment, would be a generally better feature. I cringe when I see things like:

if (list[i] != null) foo(list[i]!);

and it's not just because of the !. There should either just be a local variable, not a second member access, or the bail-out should itself be in-line so we don't need to check first and use later.

What I really would want to write in this case is foo(?list[i]);, which evaluates list[i], and if that is null, it bails out of the entire expression (or up to the nearest null-aware operator or an explicit limiting scope).

That's a null check, a bail-out type assertion could be foo(list[i] as? int) or using an inline as-or-Null, foo(?list[i].as<int>).

A pipe operator could do list[i]?->foo();, where x->foo() means foo(x), and this is the null-aware version of that.

Or we can introduce a variable, and give users a good, short, inline syntax for that variable, rather than trying to make them repeat an entire expression, which still only works for some expressions (not for list[i] or map[k], the latter of which I know I see occasionally.)

I'm warming up to if-variables, if we can allow them in all condition positions.