dart-lang / language

Design of the Dart language

Declaration expressions and declaration promotion #1420

Open lrhn opened 3 years ago

lrhn commented 3 years ago

Edit 2021-11-24 - Restructure, move "maybe we can also" parts into separate section.


This is inspired by #1091, #1201 and #1210, but is slightly different in approach/scope.

We have an issue with promotion of instance members (or any non-local variable in general). The #1091 approach is to do a binding at the promotion point. The #1201 approach is to introduce new variables in tests (but do it implicitly in some cases), and #1210 introduces new variables with a new syntax (and also potentially implicitly).

This is a proposal for two features which take some ideas from #1201 and #1210, but do not introduce any names implicitly, and where the binding can be used independently of the need to promote the variable.

Feature: Assignment Promotion

(Note: discussed in https://github.com/dart-lang/language/issues/1844)

First, allow a local variable assignment of the form id assignmentOp expression (potentially parenthesized) to act like the variable itself, when used in a test. We currently allow if (x != null) ... to promote x. This change would also allow if ((x = something) != null) ... to promote x.

It only affects the left-most variable of an assignment, so if ((x = y = something) != null) ... will only promote x, not both x and y (although that's also an option if we really want it - treat both x and the assigned expression as being tested).

If you do if ((x += 1) != null) ... that still works, but there aren't that many operators which return a nullable result, so the usefulness is limited.

This is a very small feature, but it allows you to do “field promotion” as:

int? c;
if ((c = this.capacity) != null) { ... use(c)... }
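
For comparison, here is a minimal sketch of how the same "field promotion" is usually written in current Dart, where a null check promotes a local variable but not a public, non-final instance field. The `Buffer` class and its `describe` method are hypothetical names used only for illustration.

```dart
// Hypothetical class for illustration; not part of the proposal.
class Buffer {
  int? capacity;
  Buffer(this.capacity);

  String describe() {
    // Copy the field into a local first, then test the local.
    final c = capacity;
    if (c != null) {
      // `c` is promoted to `int` here, so `int` members are available.
      return 'capacity: ${c.toRadixString(16)}';
    }
    return 'no capacity';
  }
}

void main() {
  print(Buffer(255).describe()); // capacity: ff
  print(Buffer(null).describe()); // no capacity
}
```

The proposed assignment promotion would fold the copy and the test into one expression; the sketch above needs a separate declaration statement before the test.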

Feature: Variable declaration expression

Allow var x = e and final x = e as expressions. No types, only var and final. Must have an “initializer expression”.

The expression introduces a new variable named x in the current block scope, and it’s a compile-time error to refer to that variable prior to its declaration.

For statement-level declarations, the “prior to declaration” is anything prior to the source location of the declaration. For a declaration with an initializer expression, that’s equivalent to hoisting the declaration to the top of the current scope block, but keeping the assignment at the original declaration point, and saying you must not refer to the variable where it’s not definitely assigned.

These expression variables, which must have initializer expressions, have the same behavior:

For type inference, the context type of the declaration becomes the context type of the RHS, the static type of the RHS becomes the declared type of the variable.

The construct is an <expression>:

<expression> ::= (`var'|`final') <identifier> `=' <expression> | …

An <expressionStatement> expression also cannot start with final or var, just like it currently cannot start with {.

The new constructs, being <expression>s, need to be parenthesized in most places, including before an is check, but they can also contain any expression as the RHS, including a cascade.

Usage

This feature can introduce a variable at the first point where you need its value, rather than having to go back and declare it further up, even though that’s effectively what it does.

If you have an expression where a sub-expression is repeated twice, you can name it the first time, and then refer to that name the second time:

foo(v.property.name, v.property.value);

Can become:

foo((var p = v.property).name, p.value);

The variable declaration works anywhere there is a surrounding scope (which is any expression), even if you can’t normally have statement-level variable declarations there.

As opposed to let x = e1 in e2-like constructs, where the scope is only e2, the variable uses the same scoping as other local variables, which is “until the end of the current scope”. Because of that, it can also be used where the expressions do not share a common (or at least not close) parent expression, like lists (potentially deeply nested inside another expression):

var list = [1, 2, var x = compute(), 3, x];
fooWithArgumentList(1, 2, var x = compute(), 3, x);

or where we can’t technically have variable declarations, like initializer lists:

class C<T> {
  final StreamController<T> controller;
  final Stream<T> stream;
  C() : controller = (var c = StreamController()), stream = c.stream;
}

(which would currently be implemented using an extra helper constructor), or => function bodies:

BTree<T> buildDag<T>(int depth, T leafValue) => 
   depth == 0 
      ? BTree.leaf(leafValue) 
      : BTree.node(var dag = buildDag(depth - 1, leafValue), dag);

(where we would currently use a {} body to be able to declare the variable up-front.)
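
As a point of reference, the two current workarounds mentioned above can be sketched in today's Dart as follows. The `Node` class is a simplified, hypothetical stand-in for `BTree`, and the private constructor name `C._` is an assumption chosen for the example.

```dart
import 'dart:async';

class C<T> {
  final StreamController<T> controller;
  final Stream<T> stream;

  // The public constructor creates the shared value and forwards it.
  C() : this._(StreamController<T>());

  // The helper constructor names the value (as a parameter) and uses it twice.
  C._(this.controller) : stream = controller.stream;
}

// Simplified stand-in for BTree: a leaf has no children.
class Node {
  final Node? left, right;
  Node.leaf()
      : left = null,
        right = null;
  Node.node(this.left, this.right);
}

// A block body is needed only so that `dag` can be declared up-front.
Node buildDag(int depth) {
  if (depth == 0) return Node.leaf();
  final dag = buildDag(depth - 1);
  return Node.node(dag, dag);
}

void main() {
  final c = C<int>();
  print(c.controller.hasListener); // false
  final root = buildDag(3);
  print(identical(root.left, root.right)); // true
}
```

Both forms work, but each one moves the naming of the shared value away from the place where it is first needed, which is the gap the declaration expression is meant to close.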

Promotion

The var id = e or final id = e counts as an assignment for assignment promotion (above), so if ((var c = this.capacity) != null) { ... } would both read capacity into a local variable, then promote that local variable to non-null if possible.

This is the proposed solution to promoting non-local-variable expressions: Introduce a new local variable with the same value, then promote that, and do it in a single expression. Only, unlike #1191, the binding feature is generally useful and not restricted to checks or promotion, and it doesn't clash with a potential pattern syntax for is checks. The binding is not linked to the test, the features are orthogonal, and should therefore also work if we introduce more tests (like pattern matching) in the future.

Alternatives and similar features

There is no implicit assignment of a name to an expression, unlike #1201 and #1210. I personally found those hard to read. We can introduce those as well, so, for example, var foo.bar.fieldName as an expression would be equivalent to (var fieldName = foo.bar.fieldName). Basically: var selector.chain.last, not followed by = and in an expression position, is equivalent to var last = selector.chain.last (and similar for final).

Since this is based on the same logic as the current definite-assignment analysis, it can make variables available in only some continuations of the declaration.

For example:

if ((var y = this.y) != null && (var x = this.x) != null) {
  // `y` definitely assigned here, and not null.
  // `x` definitely assigned here, and not null
} else {
  // `y` definitely assigned here, may be null.
  // `x` not definitely assigned here.
}
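
The definite-assignment behavior this relies on can already be observed in current Dart with variables declared ahead of the test. The following is a sketch under that assumption; `check` and its two callback parameters are hypothetical stand-ins for the `this.y` and `this.x` reads above.

```dart
// `readY` and `readX` are hypothetical stand-ins for reading `this.y`/`this.x`.
String check(int? Function() readY, int? Function() readX) {
  final int? y;
  final int? x;
  if ((y = readY()) != null && (x = readX()) != null) {
    // Both `y` and `x` are definitely assigned here.
    return 'both set: y=$y, x=$x';
  } else {
    // `y` is definitely assigned here, `x` is not:
    // reading `x` on this branch would be a compile-time error.
    return 'y=$y, x possibly unassigned';
  }
}

void main() {
  print(check(() => 1, () => 2)); // both set: y=1, x=2
  print(check(() => null, () => 2)); // y=null, x possibly unassigned
}
```

Note that current Dart tracks only assignment here, not promotion: `y` and `x` keep the type `int?` on the first branch, which is what the assignment promotion feature would change.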

Control flow statement scopes

Currently there is no special scope for the conditions of an if or while statement. The condition expression belongs to the surrounding scope. (Or rather, there is no way to tell since you cannot introduce variables inside the condition expression.)

For a for (;;) statement, there is a new scope introduced for the for statement itself, separate from the block scope of the body (it’s the parent scope of the body scope). The variables declared in the initializer part of the for (;;) loop are declared in that scope.

We could introduce a condition-scope for if and while statements, so that variable declarations in the test belong only to that scope, and not the surrounding block scope.

Example:

if ((var x = this.x) != null) {
  doSomethingWith(x);  
}
// Should x be available here, unpromoted?

It would be consistent to introduce a wrapper scope for the control flow statement itself, like for for (;;). It’s not necessary, but without it the variable belongs to the outer scope, and may conflict with other variables in that scope.

On the other hand, it also prevents constructs like:

if ((var x = this.x) == null) return;
// Use x as non-null.

which is a logical extension of the same pattern that we support for promoting local variables.

I’d recommend not introducing that scope.

That means that a variable unconditionally declared in test of while (test) { ... } is available in the body.
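
The closest current equivalent is a for (;;) loop, whose initializer variable is visible in the condition and the body but not after the loop. A minimal sketch, assuming a hypothetical singly linked `Node` class:

```dart
// Hypothetical linked-list node for illustration.
class Node {
  final int value;
  Node? next;
  Node(this.value, [this.next]);
}

Node last(Node head) {
  var current = head;
  // `next` lives in the for statement's own scope.
  for (var next = current.next; next != null; next = current.next) {
    current = next; // `next` is promoted to `Node` by the loop condition.
  }
  // `next` is not in scope here.
  return current;
}

void main() {
  final list = Node(1, Node(2, Node(3)));
  print(last(list).value); // 3
}
```

With a declaration in the while test, the read of `current.next` would be written once instead of twice.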

Feature: Shorter syntax for local declaration

Alternatively, maybe preferably, introduce x := e as an in-line final declaration, equivalent to the above final x = e, and do not introduce any way to declare non-final local variables.

That means that locally declared variables will all be final. I think that's a good thing. It prevents some use cases, like:

 var list = [f(x := 0), f(++x), f(++x), f(++x)];

but I'm not sure such uses are really that essential.

If the local variable declaration expressions can only introduce final variables, it means closures over them can just capture the value, and not worry about getting the correct variable. (Also x := e; can still be used as an expression statement as a short declaration of a final local variable).

Examples (using := only).

class C {
  final int? nullable;
  C(this.nullable);
  String doSomething() {
    if ((n := nullable) != null) {  // Use to promote non-local variables.
      return n.toRadixString(16);
    }
    return "0";
  }
}

class Streamer<T> {
  final StreamController<T> _controller;
  final Stream<T> stream;
  // Introduce local variables in initializer list, avoids the "two-constructor-hack".
  Streamer([StreamController<T>? controller])
      : _controller = c := controller ?? StreamController<T>(),
        stream = c.stream;

  // Two-constructor hack for introducing shared computed value.
  Streamer.hack([StreamController<T>? controller])
      : this._hack(controller ?? StreamController<T>()); // Create and forward.
  Streamer._hack(this._controller) : stream = _controller.stream; // Use twice.
}

// Binary tree structure of nodes with two subtrees, and leaves with a value.
abstract class BTree<T> {
  BTree();
  factory BTree.leaf(T value) = BTreeLeaf<T>;
  factory BTree.node(BTree<T> left, BTree<T> right) = BTreeNode<T>;
  BTreeLeaf<T>? asLeaf() => null;
  BTreeNode<T>? asNode() => null;
}
class BTreeLeaf<T> extends BTree<T> {
  final T value;
  BTreeLeaf(this.value);
  BTreeLeaf<T>? asLeaf() => this;
}
class BTreeNode<T> extends BTree<T> {
  final BTree<T> left, right;
  BTreeNode(this.left, this.right);
  BTreeNode<T>? asNode() => this;
}

/// Builds single-width DAG of [depth] nodes and one leaf value.
BTree<T> buildDag<T>(int depth, T leafValue) => 
    // Is useful inside `=>` functions. 
    // Would otherwise require a block body and local variable, or a helper function.
    depth == 0 
        ? BTree.leaf(leafValue) 
        : BTree.node(dag := buildDag(depth - 1, leafValue), dag);

void main() {
  // Just use it as a plain final declaration inside a method body, 
  // but not outside of it (it's an expression).
  // No conflict with local `var`/`final` declarations, unlike `final x = 42`.
  x := 42;  

  // Works just like the variable itself when tested for promotion.
  if ((y := intOrNull()) != null) { ... y promoted to int ...}

  LinkedListNode<T> current = ...;
  // Can be used in loop conditions and seen into the body.
  while ((next := current.next) != null) {
    current = next;
  }
  // `current` is the last element of the linked list.

  // Useful in collection comprehensions:
  var list = [
      for (var v in someElements) if ((p := v.some.property) != null) p  // Good!
  ];
  // Instead of having to repeat it:
  var list = [
      for (var v in someElements) if (v.some.property != null) v.some.property // Bad!
  ];
  // Or make a hack to bind a variable using `for`:
  var list = [
      for (var v in someElements) for (var p in [v.some.property]) if (p != null) p // Ugly!
  ];
}

Summary

This is really three features:

eernstg commented 3 years ago

I like this! I think it would be helpful to include the rule that every composite statement introduces a scope which is enclosing the entire composite statement and nothing else. For example:

void main() {
  if (b) {...} else {...}
  // works like:
  {
    if (b) {
      ...
    } else {
      ...
    }
  }
}

such that any variables introduced by b would be available in the entire if-statement (including the else part), but nowhere in the enclosing statement list. I think this amounts to a much more comprehensible scoping for such variables.

Compared to #1210, binding expressions, the main difference to a variable declaration expression is that the former uses : in order to disambiguate parsing, and the latter uses an existing declaration syntax, but presumably forces the use of parentheses in almost all contexts. The parentheses could be quite helpful from a readability point of view, so the added verbosity might not be a problem.

The implicit variant (where the name of the new variable is derived from the initializing expression in some way) is orthogonal: Both forms can easily omit that feature or include it, and it would be a non-breaking change to add it later.

lrhn commented 3 years ago

Thanks for the corrections.

Using with isn't really necessary, you can just do ...use(x:=xx, y:=yy)....

Generally the idea here is to not change the structure of expressions, but allow you to name existing sub-expressions, and then reuse them later. If you currently have foo(e1, e1), with a repeated expression, you can instead do foo(tmp := e1, tmp), which names the value the first time it occurs, then uses the name later to avoid computing the value.

With this, we can desugar the e1.x += e2 operation as ($tmp := e1).x = $tmp.x + e2 (where $tmp is fresh).
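
The single-evaluation property behind that desugaring can be checked in current Dart by spelling out the temporary by hand. This is only a sketch: `Box`, `receiver`, and the `evaluations` counter are hypothetical names, and `tmp` plays the role of the fresh `$tmp`.

```dart
class Box {
  int x = 0;
}

int evaluations = 0;
final Box _box = Box();

// Stand-in for `e1`: counts how often the receiver expression is evaluated.
Box receiver() {
  evaluations++;
  return _box;
}

void main() {
  // Compound assignment evaluates the receiver once.
  receiver().x += 5;
  print(evaluations); // 1

  // Hand-written desugaring with an explicit temporary.
  final tmp = receiver();
  tmp.x = tmp.x + 5;
  print(evaluations); // 2
  print(_box.x); // 10
}
```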

About only allowing :=, I mean that we should not allow ...(var x = 42)... or ...(final x = 42)... as expressions anyway, they'll stay declarations only. Only the declaration form ...(x := 42)... can be used as an expression, and that ensures that the variables introduced in the middle of an expression are always final. I think that makes some things easier. You can still write x := 42; as an expression statement, and use it as a shorthand for final x = 42;, because you can do expr; for (almost) any expression.

Cat-sushi commented 3 years ago

Can we hide the original field/ variable with new variable like use(xx := xx, yy := yy)?

lrhn commented 3 years ago

@Cat-sushi The scope would not support that. Local variables are added to the surrounding scope, even before their declaration. As such, the variable is in scope for its own initializer expression, you're just not allowed to use it yet. You'd have to write use(xx := this.xx, yy := this.yy) to make the distinction.

We could change the way scopes work in Dart, and introduce a new scope for every var x = y/x:=y declaration, covering everything after the initializer expression. That would allow xx := xx to work, because the new xx variable won't be in scope until after the xx expression has been evaluated. That's a quite significant change to how scopes work now, and it's a change that can be made independently of what we do here, so I don't want to include it in this feature. I don't think it's an implementation problem to do this. We currently have to detect whether a variable reference is valid and give an error if it's not. If we just continued lexical lookup when finding a not-yet-initialized value, instead of giving an error, it would simulate the variable not being in scope until it becomes valid. It's much harder to specify (but I guess it could be hacked the same way as the implementation, by keeping the current scope + when variable references are invalid, and then skipping past currently invalid variables during lookup).

lrhn commented 3 years ago

@eernstg About wrapping all composite statements/control flow structures in extra scopes, I'm not sure it's precisely what we want.

There is me wanting to be able to do something about the variable set in a failed test, so while ((next := current.next) != null && test(next.property)) { ... } would allow me to check what next was after the loop. That can perhaps be solved by giving loops an else branch which is inside the statement (#171).

Apart from that, it should be one scope per iteration in order to allow variable declarations in the test to be a new variable on each iteration. So a while loop would introduce a new scope, the "test scope", per iteration, in which the test is evaluated. Then the body is evaluated in the new "body scope", which has the "test scope" as parent scope. We could perhaps drop the "body scope" - if the body is a block statement, it gets a new scope anyway, if not, it can perhaps live in the "test scope", which would then be the "iteration scope".

For for (var i = 0; i < something; i++) body we need to keep the var i = 0 outside of the iteration scope since it's evaluated once, but i < something is inside the iteration scope. It's somewhat equivalent to:

{ // Outer scope
  var i = 0;
  loop: { // Iteration scope
    if (!(i < something)) break loop;
    $body: { // body scope (if necessary?)
      body[continue -> break $body]
    }
    i++;
    restart loop; // pseudocode.
  }
}

The for (var x in e) ... needs to keep e outside the iteration loop, but var x inside it.

So, I think it's possible to do something with scopes here, but it's more complicated than just wrapping every composite statement in an outer scope (although that too might be worth it, to contain variables ... as long as I get my else on loops).
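
The "new variable on each iteration" behavior is what for (;;) loops already provide in current Dart, and it is observable through closures. A small sketch (the function name `capturedValues` is made up for the example):

```dart
List<int> capturedValues() {
  final closures = <int Function()>[];
  for (var i = 0; i < 3; i++) {
    // Each iteration gets a fresh `i`, so each closure sees its own value.
    closures.add(() => i);
  }
  return [for (final f in closures) f()];
}

void main() {
  print(capturedValues()); // [0, 1, 2]
}
```

A per-iteration "test scope" for while loops would give variables declared in the test the same property.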

Cat-sushi commented 3 years ago

@lrhn Thank you. I probably understand what you said. My motivation of the question was, thinking out new names is bothersome. Then, writing xx := this.xx is also bothersome. Do you have any good idea in this thread or another post?

lrhn commented 3 years ago

Fixed the "dag" to use := too. I admit that the var dag = ... does read well. I think the advantage of := is more that it forces the variable to be final, because otherwise I'd use var every time to save on writing (and reading, for that matter).

For try, you're already at the statement level, so putting a variable declaration before the try is just a normal variable declaration.

As described, there is no int x := 0 syntax. The grammar for the declaration expression is identifier `:=` expression. That's an expression (likely with the same precedence as other assignments, but not just a plain assignmentExpression because it only allows a single identifier as LHS), and there is no use of := anywhere else in the grammar. You can't write int x := 1; or var x:= 1; any more than you can write int x += 1; or var x += 1;.

And yes, restricting to only allowing := as inline expression declaration is an attempt to not have too many ways to do the same thing, and to force all such inline declarations to be final.

lrhn commented 3 years ago

Good. What about void foo([x:=0]) {...}? Allowed?

That's not an expression, so no.

As for try... the symmetry is there, you can declare variables inside expressions, and try doesn't have a leading expression. (Just like do { ... } while (test); has the test last, so declaring variables there won't help much).

eernstg commented 3 years ago

@lrhn wrote:

There is me wanting to be able to do something about the variable set in a failed test, so while ((next := current.next) != null && test(next.property)) { ... } would allow me to check what next was after the loop.

The reason why I'd recommend limiting the scope of the variable next in this example to the while statement (so you can't access it after the final }) is readability: I think it's a source of confusion if next is in scope for the next 30-or-so lines of code, but it is introduced by a construct which is visually difficult to find if you're looking at a usage of next 20 lines after the end of the loop.

If you want the variable to be available after the loop then you'd simply use an old-fashioned local variable declaration outside the while statement.

Cat-sushi commented 3 years ago

@tatumizer Using the same name should have special meaning, so lastName := this.lastName is OK for me. I'm looking for an even easier way to do so. And you can access this.lastName even after lastName := this.lastName.

Cat-sushi commented 3 years ago

It is for (var i = 0; i < 10; i++) print(i); that is the exception, which is already confusing:

for (var i = 0; i < 10; i++) print(i); print(i); // error

lrhn commented 3 years ago

The biggest argument (to me) against prolonging the lifetime of variables declared inside a construct is what it would do to existing variables in the surrounding scope.

int i = 0;
while (i := iteration.next()) {
  .. something(i);
}
use(i); // <--- Which `i`.

Here the "Which i?" question should be resoundingly answered by the int i;. Anything else is crazy-talk.

So, if the while does not introduce a new scope, then the i := iteration.next() would be an error because i is already declared in the same scope. If the while introduces a new scope, then the variable should not leak from that scope. Since for already does introduce a new scope for the loop variables, it would be consistent to do the same for the rest of the loops, and then the rest of the composite statements too, for more consistency.

All in all, I think I agree with Erik that all constructs should introduce a new scope. If I want to use the variable after the loop, I'll just have to introduce an else branch as part of the loop:

while (test(i := next())) {
  use(i);
} else {
  discard(i);
}

(So #171, please!)

Levi-Lesches commented 3 years ago

Yes. This covers all bases except break.

Well, if you're breaking then you're still in the scope of the loop, so you still have access to the loop variable

while (test(i := next())) {
  if (should_break(i)) {
    on_break(i);
    break;
  }
  on_loop(i);
} else {
  on_end(i);
}

Levi-Lesches commented 3 years ago

Right, that was your complaint -- that you would lose access to i when you use break. But what about right before you break, in on_break? Since breaking is almost always done in if statements, you could simply use i there.


if (should_break(i)) {
  on_break(i);
  break;
}

Levi-Lesches commented 3 years ago

It seems there are many proposals for such a feature -- #1191, #1201, #1210, #1514, and this one, #1420, to name a few -- that all do essentially the same thing. Can these be consolidated into one proposal that's more concrete on the syntax? @leafpetersen, I noticed you posted the same challenge in many of these -- do you have any anecdotal thoughts/preferences?

leafpetersen commented 3 years ago

@Levi-Lesches we've budgeted time in the upcoming quarter to work on getting a consolidated proposal + syntax. I don't want to prejudge more than that.

Levi-Lesches commented 3 years ago

It seems the problem is the short-circuiting -- is it possible to separate what gets evaluated at runtime from what is statically analyzed at compile time?

Like, this code is invalid:

int a = 5;
bool b = true;
String c = a < 42 || b < 30 ? "yes" : "no";

since b is known to be a bool ahead of time, even though the second half of the || is never evaluated. So in your example:

if ((x := a) > 0 || (y := b) > 0)

the compiler can (and should, IMO) define x and y, even if it never actually evaluates b.

Levi-Lesches commented 3 years ago

This would go against the very definition of short-circuiting operations.

Yeah, I don't know why I didn't see that.

But then, we are back to the (much more narrow) concept of "if-vars" IMO.

It seems like it. At a cursory glance, #1191 and #1210 also suffer from this. I'd be in favor of a merge between #1201 and #1514, where you would write:

if (shadow maybeNull != null) {
  use(maybeNull);
}

which would also allow you to use shadow outside of ifs as well:

shadow maybeNull;  // a field in a class
bool isValid = someOtherCondition && maybeNull != null;  // refers to the local version
print(maybeNull);  // refers to the local version

EDIT: Using a shadow in an if statement can suffer from the same problem with ||, &&, ??, etc. So maybe we'd have to use shadow by itself, in which case it's not really an "if-var" at all.

lrhn commented 3 years ago

You can't use them in chains like a?.method(x:=expr)

You can, but only down-chain:

a?.method(x := expr, foo(x)).otherMethod(x.length);

It's no different from declaring the variable earlier in the function, and only being able to rely on it having a value after the assignment:

List<int> x;
a?.method(x = [1, 2, 3], foo(x)).otherMethod(x.length);

This code is currently valid. The assignment makes the variable definitely assigned, but only down-stream from the assignment (only on code dominated by the assignment). If you use x before the = [1, 2, 3] or after the a?... code, it's still not definitely assigned.

So, we already do all the computation needed to figure out where the variable can be used, and you hadn't even noticed. I think that's an argument for the behavior being predictable enough that people can understand it.

You can just treat x := expr as an assignment which promotes an already existing variable from uninitialized to usable. (Or similarly with var x = expr and final x = expr as an expression, only the former of those allows assignment in other, later expressions too).

lrhn commented 3 years ago

About (x := a) > 0 || (y := b) > 0;, the only variable visible after this statement would be x, but if you use it as a test expression:

if ((x := a) > 0 || (y := b) > 0) {
  // x is available here, maybe > 0.
} else {
  // x *and* y available here, neither greater than 0.
}

then the y becomes available on the else branch. The expression is entirely equivalent to !((x := a) <= 0 && (y := b) <= 0) by De Morgan's law, so if you swap the branches you get:

if ((x := a) <= 0 && (y := b) <= 0) {
  // x *and* y available here, neither greater than 0.
} else {
  // x is available here, maybe > 0.
}

which doesn't look that surprising to me.

Levi-Lesches commented 3 years ago

All these comments make me wonder why we didn't stick with good ol' final var x = a; and leave it at that 😄

In fact, I think that allowing declaration expressions can encourage messy code, just as these examples illustrate. I'd rather have to read

final bool complex1 = a || b && !c && (d || a);
final bool complex2 = b && a || !c || d;
if (complex1 || complex2) { /* ... */ }

than have to see

if ((complex1 := a || b && !c && (d || a)) || (complex2 := b && a || !c || d)) {
  /* ... */
}

and option 1 allows and encourages descriptive variable names, whereas simply including the variables inline in option 2 makes the code more difficult to read.

Levi-Lesches commented 3 years ago

late is specifically for when the dev can tell the compiler "it's okay, I know what I'm doing". Having the compiler do that automatically feels like trouble. In general, the compiler shouldn't allow you to access a variable if it doesn't exist (again, late is a manual workaround -- exception, not the rule).

Levi-Lesches commented 3 years ago

but their practical value is unknown.

Well, promotion is nice

Levi-Lesches commented 3 years ago
  1. If you want a shadow with another name, then just use final. It's more to the point and exactly equivalent. shadow is when the field name is perfectly good and you want to keep it.
  2. I'm in favor of keeping shadow a statement rather than an expression. The following are really equivalent:

    class Counter {
      int? count;
      void incrementShadow() {
        shadow count;
        if (count == null) count = 0;
        count++;
      }

      // A shadow simply "wraps" the local context with its own local variable
      void incrementRegular() {
        int? _count = this.count;
        if (_count == null) _count = 0;
        _count++;
        this.count = _count;
      }
    }

    Both the declaration and saving the value back to this.count are statements, so I don't see why shadow shouldn't be one as well. Plus, it makes the implementation so much easier, as you won't run into all the context issues posed by this proposal (and the others like it).

stereotype441 commented 3 years ago

I've split off "Assignment Promotion" to its own issue (https://github.com/dart-lang/language/issues/1844) so I could tag it with the "flow-analysis" label.

leafpetersen commented 2 years ago

I'm generally supportive of the idea of adding a let binding form to the language, but I'm extremely skeptical of this specific proposal. Concretely, I find the motivation for the implicit scope extrusion weak, and more generally, I believe that this feature makes code extremely difficult to read and write reliably.

Concrete

Starting with the concrete, let's consider each of the examples from the original proposal in comparison to a normal expression scoped let form (for which I will use the strawman syntactic form var x = E in E end where E is a meta-variable representing expressions).

Example 1

foo(v.property.name, v.property.value);

With this proposal:

foo((var p = v.property).name, p.value);

With general let:

let var p = v.property in foo(p.name, p.value) end;

This is, to me, vastly more readable in the second form.

Example 2

print(1 + (var o = 2) + o * 2); 

With general let

print(let var o = 2 in 1 + o + o*2);

Again, the second form is more readable to me.

Example 3

C() : _controller = (var c = StreamController()), stream = c.stream;

This isn't handled by a general let. Allowing local variable declarations in initializer lists does solve it, however, and in a much more readable way:

C() : var c = StreamController(),  _controller = c, stream = c.stream;

Example 4

BTree<T> buildDag<T>(int depth, T leafValue) => 
   depth == 0 ? BTree.leaf(leafValue) : BTree.node(var dag = buildDag(depth - 1, leafValue), dag);

Becomes:

BTree<T> buildDag<T>(int depth, T leafValue) => 
   depth == 0 ? BTree.leaf(leafValue) : var dag = buildDag(depth - 1, leafValue) in BTree.node(dag, dag) end;

Again, vastly more readable to me.

Example 5

T firstWhere(Iterable<T> elements) {
  var it = elements.iterator;
  while (it.moveNext() ? !test(var value = it.current) : throw StateError("no element"));
  return value;
}

There is no way to write this code with general let. I think I consider that a feature, not a flaw. :)

The remainder of the examples I believe focus on using this as a mechanism to bind and promote variables inline in if statements in order to make working with nullable fields easier. A general let mechanism doesn't really help with that. For that specific problem there are other alternatives.

Generalizing

Given that this does in fact solve the general let problem (albeit I would claim in an inferior manner), and also helps with the problem of field promotion, perhaps then it is worth considering for the sum of the use cases? My deeper problem with this proposal is that I believe it makes code deeply unreadable. Lexical scoping is one of the most fundamental building blocks of modern programming languages, and this proposal breaks all of the intuitions that users have built up based upon many decades of experience with lexical scoping. How does a user know where scope begins and ends? How does a user understand to what outer scope bindings are "hoisted"? When do nested variables shadow each other? The answers to these questions depend not at all on syntactic properties of the program, but rather on semantic properties (essentially, as I understand it, dominance in the control flow graph). This, to me, is a recipe for immense user confusion. Here's a few examples of code which either scopes oddly or introduces collisions that I honestly don't know whether would be allowed or not by this proposal (and for which either answer is unsatisfying):

class C {
  int? x = null;
  void test() {
    {
      print({var x = 3 : x, x : x}); // What does this print?  How do I explain to a user why it prints what it prints?
      print(x); // What does this print?
    }
    {
      print({ if (var x = 3) x : x, x : x }); // What does this print?  How do I explain to a user why it prints what it prints?
      print(x); // What does this print?
    }
    {
      // Is this an error?  
      // If so, this makes the feature inferior to general let, which allows encapsulating local variable names.
      // If not.... WAT?
      print({var x = 3 : x, var x = 4: x}); 
      print(x); // What does this print?
    }
   {
      print(<Object>["Here's a really long list of things", 
                                 "It has stuff in it", 
                                   ConstructorCall(text: "It has constructor calls too",
                                                               build : () {
                                                                   print("Maybe even lambdas");
                                                               },
                                                               stuff: "Also other stuff",
                                                              ),
                                   "A really long thing" + "Other things" + foo<int>(NestedCall(Something(var x = "Look a squirrel"))),
                                   {"Config" : 3,
                                    "Fooble" : var y = 4,
                                     "I can do this all day" : 6
                                   }
                                 ]);
      // Did you see the assignment above?  Would you see it in a code review?  
      // Could you explain to a user why this doesn't print `null`?
      print(x); // WAT? 
    }
    {
      print(var y = 3 ?? var x = 4);
      print(y); // prints 3 
      print(x); // prints null.  Wait, what?
    }
  }
}

Summary

To summarize, it seems to me that this proposal:

I'm open to being convinced otherwise, but as it stands, I'm quite opposed to this approach.

lrhn commented 2 years ago

I'll have to agree to disagree on the readability. I find let var x = someExpression in something(x, other(x)) less readable than something(var x = someExpression, other(x)). Not just because it's longer (I find let/in annoyingly verbose), but also because all those spaces and keywords make it harder to parse visually.

I remember some language showing me recursive data something like:

 ["a", "b", @1=["d", "e", @1]]

That's the same idea: Name the value the first time it occurs and then allow reusing it, rather than name the value first, separately, and then use the name every time.

I do agree that scoping being implicit gives it some potentially sharp edges.

As for scoping, what I would prefer is that every variable declaration introduces a scope starting at its declaration and ending at the end of the surrounding block/structure (I say "structure" because I want the while of a do/while to be included if possible, so each control flow structure may need to define what its scope structure is). That would change the current behavior where a variable exists prior to its declaration, in the entire block, and allow something like var x = x; instead of var x = this.x; (It looks odd now, but I think it would very quickly become idiomatic for introducing a local variable for an outside declaration).

A var x expression would be considered as in-scope for everything after it in the current block. It's then a matter of whether it's initialized or not. It's almost like hoisting the variables to the block, like we do now, except that it doesn't shadow anything prior to the syntactic variable declaration. That will then be consistent with "statement" declarations: Any time you see var foo it introduces a new variable foo which is in scope until the end of the block. (I'd really prefer to only be able to introduce final variables, which is why I recommend the x := e syntax instead, so you can get final variables without writing final.)

It also becomes an error to introduce the same variable name more than once in the same block.

With that, the scope examples above become:

class C {
  int? x = null;
  void test() {
    {
      print({var x = 3 : x, x : x}); // prints {3: 3}. The `var x = 3` introduces a variable that lasts until `}`.
      print(x); // Prints 3
    }
    {
      print({if (var x = 3) x : x, x : x}); // Compile-time error, condition is not a boolean?
      print(x); 
    }
    {
      print({var x = 3 : x, var x = 4: x}); // Compile-time error, `x` already declared in this scope.
      print(x);
    }
   {
      print(<Object>["Here's a really long list of things", 
                                 "It has stuff in it", 
                                   ConstructorCall(text: "It has constructor calls too",
                                                               build : () {
                                                                   print("Maybe even lambdas");
                                                               },
                                                               stuff: "Also other stuff",
                                                              ),
                                   "A really long thing" + "Other things" + foo<int>(NestedCall(Something(var x = "Look a squirrel"))),
                                   {"Config" : 3,
                                    "Fooble" : var y = 4,
                                     "I can do this all day" : 6
                                   }
                                 ]);
      // Did you see the assignment above?  Would you see it in a code review?  
      // Could you explain to a user why this doesn't print `null`?
      print(x); // Yes, there is a definitely executed `var x` prior to this code in the same block.
                   // It's always possible to write obscure code.
    }
    {
      print(var y = 3 ?? var x = 4);
      print(y); // prints 3 
      print(x); // Would be a compile-time error, `x` refers to the prior `var x = 4`, but is not definitely assigned.
    }
  }
}

It's always possible to write obscure code. I'm not particularly worried about people doing that to themselves, as long as they can avoid it if they want to. I don't want to force people to do something unreadable, but I also don't think it's necessarily disqualifying for a language feature that you can do something obscure with it.

eernstg commented 2 years ago

We have discussed the addition of an extra scope around every composite statement (if, switch, loops, etc.), which would limit the scope of a new variable introduced in the condition of an if to that same if (including an else part, if present), and similarly for switch and for the loops.

I think that's crucial! (.. and I included it in #1210.) If we don't have it then said variable could be in scope any number of lines after the end of the if statement, and that's surely going to cause the kind of problems that @leafpetersen mentioned:

this feature makes code extremely difficult to read and write reliably. ...

...
              "A really long thing" + "Other things" + foo<int>(NestedCall(Something(var x = "Look a squirrel"))),
...

But I'm not so worried about the unusual scoping as long as the primary use case (as far as I can see) will be the introduction of a new variable in a condition of a composite statement. As @lrhn wrote,

It's always possible to write obscure code.

but I do think that it is dangerous to have a scoping behavior that allows these in-expression variable introductions to survive far beyond the, say, if statement that introduces them. This will give rise to code obscurity which is not intended and probably goes unnoticed until it creates some kind of surprising behavior (or, worse, a run-time bug that is not detected for a long time).

Of course, we could have a lint that flags any new variable which is introduced in this manner if it occurs at a column which is higher than 20. And we could ask IDEs to make the variable name blink. ;-)

leafpetersen commented 2 years ago

@lrhn

I'll have to agree to disagree on the readability.

Yes, we really, really will have to disagree here. For me to be at all open to this proposal, I would need substantial empirical evidence to support the readability of this, because I find it completely unreadable. Moreover, I would point out that JS function scope variables have a similar property, and as best I can tell, ES6 added block scope for a reason. Finally, I accept that you find let expressions unreadable but... there are many, many programming languages which have some form of standard scoped let expressions and programmers seem to find them quite usable. So I would say that the preponderance of evidence right now is in one direction... :)
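For readers less familiar with the JavaScript behaviour referred to here, a minimal sketch of the difference between function-scoped var and block-scoped let. The function names leakyVar and scopedLet are made up for illustration and do not come from the thread.

```javascript
// `var` is function scoped: a declaration inside a nested block
// stays visible after that block has ended.
function leakyVar(flag) {
  if (flag) {
    var message = "declared inside the if";
  }
  // `message` is still in scope here (and is undefined if the branch was skipped).
  return message;
}

// `let` is block scoped: outside the block the name is simply not declared.
function scopedLet(flag) {
  if (flag) {
    let message = "declared inside the if";
  }
  // `typeof` on an undeclared identifier yields "undefined" instead of throwing.
  return typeof message;
}

console.log(leakyVar(true));  // prints "declared inside the if"
console.log(leakyVar(false)); // prints undefined
console.log(scopedLet(true)); // prints "undefined"
```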

It's always possible to write obscure code. I'm not particularly worried about people doing that to themselves, as long as they can avoid it if they want to. I don't want to force people to do something unreadable, but I also don't think it's necessarily disqualifying for a language feature that you can do something obscure with it.

As far as I'm concerned, it's impossible not to write obscure code with this feature. That is, all of the examples are obscure to me. I'm just pointing out that there are no upper bounds on the obscurity that's possible.

I remember some language showing me recursive data something like:

 ["a", "b", @1=["d", "e", @1]]

That's the same idea: Name the value the first time it occurs and then allow reusing it, rather than name the value first, separately, and then use the name every time.

No, that isn't the same idea. That's a simple recursive let with standard scoping. ["a", "b", let rec x = ["d", "e", x]] or what have you - you can do this in, e.g. Haskell, and there are no funky scoping things going on. This feature is about things like: [["a", "b", @1=["d", "e"]], @1], where scope no longer follows the syntactic structure.

{
      print({var x = 3 : x, var x = 4: x}); // Compile-time error, `x` already declared in this scope.
      print(x);
    }

This is a usability footgun for this feature, and is one of the many reasons that expression let is superior. This means that variable names escape their scope and collide, so you can't re-use simple names in local scopes. Instead, in order to use the feature, you need to invent new names because the scope ends up being larger than you want:

  // Works
  Pair<int, int> sumOfSquares(Pair<int, int> fst, Pair<int, int> snd) {
    return Pair(let sum = fst.x + fst.y in sum * sum, 
                let sum = snd.x + snd.y in sum * sum);
  }

  // Doesn't work
  Pair<int, int> sumOfSquares(Pair<int, int> fst, Pair<int, int> snd) {
    return Pair((var sum = fst.x + fst.y) * sum, 
                (var sum = snd.x + snd.y) * sum);
  }

@eernstg

I think that's crucial! (.. and I included it in #1210.) If we don't have it then said variable could be in scope any number of lines after the end of the if statement, and that's surely going to cause the kind of problems that @leafpetersen mentioned:

This is, to be clear, one of my most significant concerns with those proposals, and you may notice that there are substantial concerns about this raised in the issue by other people as well. I accept that it may be necessary, and I think in this very limited form it might be ok, but it is still a point against these proposals.

lrhn commented 2 years ago

I'm not proposing that variable declarations escape their containing block - or "structure", if we want declarations inside if conditions to be visible in the branches. We could (and I did say that), but we probably should not. I agree that's confusing.

What JavaScript did was much worse: it made the variables function scoped, not block scoped. Example:

function foo() {
  x = 2;
  y = 3;
  if (x == y) {
    var y;
  }
  var x;
}

The var x and var y declarations declare the variables in the entire function, prior to their declaration. The ES6 let construct is still block scoped, and the scope still goes back to the start of the block:

function foo() {
  let x = 42;
  { 
    console.log(x); //ReferenceError: Cannot access 'x' before initialization
    let x = 10;
  }
}
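Both behaviours can be observed in a small runnable script. This is only a sketch: the function names hoistedVar and temporalDeadZone are made up for illustration, and the try/catch exists purely so that the error can be reported instead of aborting the script.

```javascript
// `var` is hoisted to the start of the function: the assignment on the
// first line targets the local `y` declared inside the `if` further down.
function hoistedVar() {
  y = 3;
  if (y === 3) {
    var y;
  }
  return y;
}

// `let` is block scoped, but its scope still starts at the top of the block,
// so a read before the declaration throws instead of seeing the outer `x`.
function temporalDeadZone() {
  let x = 42;
  try {
    {
      const seen = x; // refers to the inner `x`, which is not initialized yet
      let x = 10;
      return seen + x;
    }
  } catch (e) {
    return e instanceof ReferenceError ? "ReferenceError" : "other error";
  }
}

console.log(hoistedVar());       // prints 3
console.log(temporalDeadZone()); // prints "ReferenceError"
```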

The var declarations proposed here can escape their containing expression, which a let can't - and that's a reason let won't help to solve the field promotion problem. Every var x = 42; in a Dart block can be seen later in the same block. It's not any new magic that allows (var x = 42) in an expression to be visible after the declaration, inside the same block.

That's what I'm proposing: essentially to allow you to write var x = e as an expression, but make it equivalent to x = e with a declaration of x at the beginning of the current scope, and then make it a compile-time error to access x where it's not definitely assigned. That's basically just hoisting the variable declaration for you, but otherwise generally equivalent to the current Dart variable declarations, and it uses the same scoping as every other local variable declaration (scoped to current block).
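As a rough analogy only, the hoisting rewrite can be imitated in JavaScript, since the proposed syntax does not exist in Dart today. The names compute and desugared are hypothetical stand-ins. JavaScript has no definite-assignment analysis, so the compile-time error that the proposal would give for reading x too early has no counterpart in this sketch.

```javascript
// Hypothetical stand-in for some computation.
function compute() {
  return 3;
}

function desugared() {
  // Step 1: the declaration is hoisted to the start of the enclosing block.
  let x;
  // Step 2: the declaration expression becomes a plain assignment.
  // Proposed form (not valid in any current language): [1, 2, var x = compute(), x, 4]
  const list = [1, 2, (x = compute()), x, 4];
  // `x` remains visible for the rest of the block, like any local variable.
  return { list, x };
}

console.log(desugared().list); // prints [ 1, 2, 3, 3, 4 ]
```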

It does mean you cannot reuse the same name later in the same scope (unless we actually allow shadowing inside the scope, but that can get weird very quickly). And I can see how that's annoying, like the sum example, but it's not new.

The confusion may be based on a prior assumption about what var sum = ... means, not how it's actually defined. In:

Pair<int, int> sumOfSquares(Pair<int, int> fst, Pair<int, int> snd) {
    return Pair((var sum = fst.x + fst.y)  * sum, 
               (var sum = snd.x + snd.y) * sum);
}

the two sum variables conflict. No disagreement, and that is annoying. Naming is hard, m'kay. They also conflict if you expand it to:

     var sum = fst.x + fst.y;
     var prod1 = sum * sum;
     var sum = snd.x + snd.y;
     var prod2 = sum * sum;
     return Pair(prod1, prod2);

The second one is not surprising, because obviously we are declaring the same name twice in the same scope.

I'm not sure why the prior example was not just as obvious, unless someone was already assuming let semantics where the declaration does not cover the rest of the block. That's just not how var declarations work elsewhere in Dart. If you tell people, from the start, that var x = ... introduces a variable, and the variable is available in the rest of the current scope (like all other local variable declarations), maybe they will also find it obvious that the two var sum = declarations conflict. And maybe they'll rewrite it to:

Pair<int, int> sumOfSquares(Pair<int, int> fst, Pair<int, int> snd) {
    return Pair((var sum = fst.x + fst.y)  * sum, 
               (sum = snd.x + snd.y) * sum);  // no `var`
}

just like they likely would the expanded version.

Escaping the current expression, unlike a let construct, is also the one thing that allows it to be directly usable in situations where there is no immediate parent expression:

var list = [1, 2, var x = compute(), x, 4];
Constructor() : controller = (var c = StreamController()), stream = c.stream;

In both situations, we can't just insert a let construct around the occurrences of the variable.

We can instead introduce a let as a collection element (and then still need a spread: let x = compute() in ...[x, x]), and we can allow variable declarations as initializer list entries. Each is a new, specialized, functionality. That's why I prefer just allowing an expression to declare a variable that spans beyond that expression (but still limited by the current block/structure).

I'll update the original post to make scoping more precise (and drop the "maybe we can also" parts, or at least move them to a separate section).


I'm willing to work towards variables not being in scope before their declaration, so you can write var x = x;, but that'd be for every variable declaration, not just local ones. It's a separate feature. Currently, variables are in scope inside the entire scope they're declared in.

jodinathan commented 2 years ago

@Cat-sushi

My motivation of the question was, thinking out new names is bothersome. Then, writing xx := this.xx is also bothersome. Do you have any good idea in this thread or another post?

There is the Binding Expressions proposal, with which you could write:

if (obj.@prop != null)
  print(prop); // prop is local and promoted to non null