Finalize details of initialization based promotion.

dart-lang / language

Design of the Dart language

Other

2.65k stars 202 forks source link

Finalize details of initialization based promotion. #946

Closed leafpetersen closed 4 years ago

leafpetersen commented 4 years ago

The flow analysis specification, based on discussion here includes support for promotion based on initialization assignments. Specifically:

if a value of type T is assigned to a variable x
x is definitely unassigned and not definitely assigned
x was declared without an explicit type, and with no initialization expression
then x may be promoted to T subject to the usual restrictions around capture etc.

In discussion, the language team still feels that this is a valuable feature. In discussion, several related questions were raised.

1) Is it an error to use such a variable when it is or may not be initialized?

Is it an error to use it when it is not definitely assigned?
Is it an error to use it when it is definitely unassigned? 2) If 1) is not an error, then what type does the variable have at the pre-initialization use point?
dynamic
Object?
Null
Something else 3) May such a variable be demoted from its promoted type, and if so, to what type is it demoted?
dynamic
Object? 4) Is it an error to initialize such a variable at two unrelated types? 5) If we choose to make 1) an error, should such an error apply to all variables declared with no initializer, regardless of nullability and type?

Examples:

void test1And2(bool b) {
   var x;
   if (b) {
     // `x` is definitely unassigned.  Is this an error?  If not, what type do we treat x as having.
     var l = x; 
     x = 3; // Promotes x to int
   }
    // `x` is not definitely unassigned, but it is not definitely assigned.  Is this an error?
   // If not, what type does 'x' have.
   var a = x;
}
void test3(bool b) {
    var x;
    if (b) {
      x = 3; // Promotes x to int
   }
   // What is the type of x here?  Demote to dynamic?  Demote to Object??
  var y;
  y = 3; // y is promoted to int;
  y = "hello";  // Is this legal?  Does it demote y to dynamic?  To Object?
}

void test4(bool b)  {
    var x;
    if (b) {
       x = 3;
       x.isEven; // Ok, x is promoted to int.
    } else { 
       x = "hello";
       x.length;  // Ok, x is promoted to String
   }
   // Is there an error at the join point?  Or do we demote?
   var l = x; // If we demote, to what type do we demote?  dynamic?  Object??
    x.arglbargle;  // Legal if we treat this as demotion to dynamic, do we want this?
}

void test5(bool b) {
    String? x;
    if (b) {
        print(x); // Is this an error?  x is definitely unassigned.
        x = "hello";
    }
    print(x); // Is this an error? x is not definitely assigned.
}

cc @lrhn @eernstg @munificent @scheglov @stereotype441 @johnniwinther

leafpetersen commented 4 years ago

For what it's worth, I think my preferred way to think about this is that the var x; (or final x;) form should be used as a way of writing code that in other languages you might write by putting a large conditional expression or let expression to the RHS of the variable declaration. But in Dart, the encouraged style is imperative/statement based. So this feature provides a good way to give many of the same benefits of the expression based approach, while also getting the benefits of the statement based approach (notably less nesting). As a concrete example, I'm suggesting we should think about var x = (b) ? e1 : e2 and var x; if (b) x = e1; else x = e2; as having the same general user intent, and therefore similar behaviors.

If I start from that, I get the following answers to the above:

1) It should:

Definitely be an error to use such a variable if it is definitely unassigned.
- With the expression metaphor in mind, such a use corresponds to violations of the metaphor:
  - var x = let y = x in 3 // Use when definitely unassigned
Also probably be an error if it is not definitely assigned
- This is also a violation of the metaphor:
  - var x = if (b) 3; x; // Use when not definitely assigned 2) Is not relevant if we accept my answer to 1) 3) If we start from the expression metaphor, the answer should probably be that there is no "original type". That is, the initializing expressions defines the initializer for the variable, and that is what fills in the missing type.
Given var x = (b) ? 3 : 4; x = "hello"; we don't allow x to be "demoted" to Object at the assignment, so if we treat var x; if (b) x = 3; else x = 4; as another way of writing the same code, we should not subsequently allow demotion of x in the latter case either. 4) For the question of whether we demote at join points if the assignments don't match:
- The expression analogy of var x = (b) ? 3 : "hello" and var x; if (b) x = 3; else x = "hello"; being different ways of writing the same thing would imply that this should be allowed, and that x should have type UP(int, String) after the conditional.
- Using upper bound in general has not worked out well for us, as it is poorly designed.
- This leads me to the tentative conclusion that we should not allow this: that it should just be an error.
- This leads me to the strong conclusion that we should not choose dynamic as the type we demote to, if we do in fact demote. 5) The same "expression" metaphor can be applied to type variables. I don't feel super strongly about this one though, since typed nullable variables fit fairly well into other patterns.

In summary then, my initial proposal is:

It is an error to use a variable declared var x; of final x; before it is definitely assigned.
Once a variable so declared has been promoted to a type T this type is treated as its "declared type" for the purposes of further demotion, and hence the variable may never be demoted "above" its initialization type.
- As a follow on consequence of the above, there is not type to demote to at join points if the promoted type does not match, and hence it is an error to initialize such a variable to two differently typed values on different paths.

Thoughts?

lrhn commented 4 years ago

Very good thoughts.

I fear it's a little more than we can get away with, though.

So, a declaration of the form var x; or final x; will not infer a type since it has no initializer to infer it from. As a new thing it has no type, rather than defaulting to dynamic. It will be considered definitely unassigned.

On assignment, a variable with no type considers all types as being of interest and promotes to the assigned type. It now has that type. That's its minimal known type along that branch, there is nothing above that type that it can be demoted to. (So no to 3., the variable cannot be demoted).

So, question 1: I want to make it an error to read the variable while it's still not definitely assigned, but I don't think it's possible. It will likely be a problem for loosely written script files which are used to the variable just being dynamic? Like:

var x;
if (something) x = "option";
return foo(x: x);

I don't think we can make that an error now. I wish that we could, but I fear it's too breaking. Requiring someone to write var x = null; is not better, because that would infer Null as the type of x. They would have to write String? x (and maybe = null too). I think that's too cumbersome in practice. We can probably still get away with making it an error to use the variable if it's definitely unassigned (because then it's just silly code).

(For a final variable, I would make it an error to read the variable uninitialized. The ability to assign to a final variable is completely new, so we don't need to worry about legacy code).

So, in practice: Yes to error if definitely unassigned, no to error if potentially assigned.

That brings us to 2: If it's not an error to read x in the example just above, what is its type? It can contain a string or be unassigned. If it's unassigned and readable, then the value is null (as it always were, which is what we can't break). So the type has to be String? or dynamic in order to keep the code working. Let's avoid dynamic.

That is, any unassigned path joined with an assigned path provides "make the type nullable" as its contribution. Since null is going to be the value, it's a necessary modification of the type. Which means that joining the two branches needs to do some kind of unification of String and "make nullable". That could be LUB(String, Null), but it could also be a special-cased combination of the actual type and then adding a ?.

So, what is the type after joining, say, branches with types num, int, double and "make nullable"?

Compile-time error if the types are not all the same. Probably too restrictive.
the maximum of the assigned types (which must exist), or Object? if it doesn't (poor man's LUB). Can probably make it Object if none of the types are nullable and there is no unassigned branch.
The LUB of the types + Null for unassigned branches (which has all the usual issues).

I think that "same type" is too strict, and we always get into problems when defining which types are the "same type". Rather not go there.

The "maximum type" doesn't work well in practice (it's not associative because it's a partial function: MAX(MAX(num, int), double) is num, MAX(num, MAX(int, double)) is undefined). So, trying to reinvent LUB needs something better.

The LUB has issues, but it's also what users are used to. It's what you get from var x = test ? a : b; (with all the complications that has caused over the years, mostly because of the combination with implicit downcasts).

Take a simple example:

var x;
if (useDoubles) { 
  x = 0.0;
} else {
  x = 0;
}

Users may assume this infers num. It's not even unreasonable. There is no max type, and they don't have the same type, but the code looks right. So, no to 4.

So, I think LUB/UP is the correct choice, even with all its inherent issues. I hope they are fewer now that we don't have implicit downcast.

So, in summary:

Is it an error to use such a variable when it is or may not be initialized?

Is it an error to use it when it is not definitely assigned?

Is it an error to use it when it is definitely unassigned?

It's an error to read the variable if it is definitely unassigned or it's final and potentially unassigned.

If 1) is not an error, then what type does the variable have at the pre-initialization use point?

It has type Null along the non-initialized branch, which is combined into a nullable type before being readable. It has the value null until assigned.

For final variables, 1) is an error, so no type is needed until definitely assigned along disjoint branches.

May such a variable be demoted from its promoted type, and if so, to what type is it demoted?

No demotion above its first assigned type. That is its most general known type, there is no type of interest above that to demote to. When merged with other branches, the combined type may become a supertype of the individual assigned types, but that's not a demotion. This uses the LUB of the branches, with Null contributed by branches where the variable is definitely unassigned. (If it's definitely unassigned along all edges, it still has no type).

This ensures that the uninitialized branch doesn't get silently ignored. The type becomes nullable, which is immediately obvious to someone who tries to use the variable. If it's OK that it's nullable, then the code just works.

Is it an error to initialize such a variable at two unrelated types?

No. That would probably be too cumbersome in practice, and it breaks the var x = test ? a : b; parallel.

If we choose to make 1) an error, should such an error apply to all variables declared with no initializer, regardless of nullability and type?

Probably no. I wouldn't mind, but we have style guides telling people not to write = null as initializer, so selling String? s = null; is going to be hard. (I'd be happy to do it, but I don't think we have the capacity to take that fight at this time).

leafpetersen commented 4 years ago

Take a simple example:
var x;
if (useDoubles) { 
  x = 0.0;
} else {
  x = 0;
}
Users may assume this infers num. It's not even unreasonable. There is no max type, and they don't have the same type, but the code looks right. So, no to 4.

Note that we don't use LUB in promotion/demotion anywhere else, so this is both new work, and also somewhat inconsistent:

 var x;
 dynamic y;
 if (useDoubles) { 
   x = 0.0;
  if (y is! double) {
   y = 0.0;
  } // y promoted to double
 } else {
   x = 0;
   if (y is! int) {
     y = 0;
   } // y promoted to int
 }
// x is "demoted" to num
// y is "demoted" to dynamic

There are also a lot of questions raised about when to apply LUB.

class A {}; class B extends A {}; class C extends B {};

 var x;
 if (something) { 
   x = B(); // promote to B
 } else {
   x = A(); // promote to A
   if (x is! C) {
     x = C(); // promote to C
   }
}
// is x demoted to LUB(B, A) or LUB(B, C)?

I suspect that you can also make examples with switches for which the order that you apply the LUB matters, by forcing a different path through the multiple inheritance graph.

To summarize what I'm hearing so far:

Definite assignment errors

We both think that it should be an error to use a variable declared final x before it is definitely assigned
We both think that it should be an error to use a variable declared var x when it is definitely unassigned.
You feel that making it an error to use a variable declared var x before it is definitely assigned is too breaking and/or a bad idea
You feel that making it an error to use a variable declared T x for a nullable T when it is definitely unassigned is too breaking and/or a bad idea

Demoted type

You would like to use the LUB of something at join points
- As a strawman, LUB(S0, LUB(S1, ...)) where Si is the type to which x was promoted to by initialization on branch i, Null if still definitely unassigned.
I am strongly opposed to using dynamic as the fall back
I am weakly in favor of either making it an error to initialize to different types, or demoting to Object?.

Does that cover it?

lrhn commented 4 years ago

I think that covers it well, yes.

It's "too breaking", not "bad idea" in both cases. I wouldn't mind switching to any un-initialized variable being, well, definitly uninitialized. That'd require migration of existing code (only code that adds ?, so code that is already being migrated), and a change of mindset. I think the latter is too late in the NNBD process.

leafpetersen commented 4 years ago

We discussed this in the language meeting today, some notes for further discussion follow.

The discussion points were the following:

There is a concern that making var x; if (b) x = 3; use(x); an error is too breaking
- If not an error, what is the type of x in use(x)?
  - Also consider: var x; if (b) x = use(x); use(x);
- What about final x; if (b) x = 3; use(x);?
  - Also consider final x; if (b) x = use(x); use(x);
There is a concern that making int? x; use(x); if (b) x = 3; use(x); an error is too breaking or a bad idea.
- There was mostly consensus that making this an error forbids useful code, maybe a step too far.
- What about final int? x; if (b) x = 3; use(x); ?
  - This seems less attractive, but it feels somewhat inconsistent to make this an error.
There is a concern that users will expect to be able to initialize at different types and have it work: var x; if (b) x = 0.0; else x = 0; to make x have type num

There is a general desire to have as much uniformity of behavior as possible - changing a variable from explicitly typed to implicitly typed, var to final, etc should preferably not deeply change the behavior.

In so far as the behavior is different, there is a desire to have the difference be clear in the static analysis, rather than invisible (i.e. falling over to dynamic).

There is a general readability concern.

Code should be readable
Every variable should in some way have a clear visible type, either via initializers or via an explicit type.

stereotype441 commented 4 years ago

We should also think about how these proposed rules affect the migration tool. One of the goals of the migration tool is to produce code that doesn't have compile-time errors, so we need to make sure the tool makes whatever changes are necessary to avoid triggering these new errors.

My current thinking is that the migration tool should probably change any local var x declaration (without an initializer) to dynamic x, since that avoids most of the errors we're discussing here, and preserves the types that were present prior to migration.

Regarding the idea about making it an error to use a variable declared T x for a nullable T when it is definitely unassigned: if we decided to do this, the migration tool would need to detect such situations using flow analysis and do something to avoid the error. I'm thinking it could probably just add an = null initializer and that would preserve the pre-NNBD semantics.

munificent commented 4 years ago

I spent a bunch of time working through all of the different combinations of declaration style (var, final, typed nullable, typed non-nullable), flow state (definitely unassigned, definitely assigned, potentially either), and operation (read, and write) to figure out which ones I think should be allowed or disallowed. I felt like I got off in the weeds, so instead, maybe it's better to focus on the variable's type and see what behavior flows naturally from that.

Here is a proposal:

Let the initializing assignments of a local variable be all of the assignments to it that occur in positions where the variable is definitely unassigned. So, in:

test() {
  var f;
  if (c) {
    f = 1;
  } else {
    f = 2;
    f = 3;
  }
  f = 4;
}

The initializing assignments are f = 1 and f = 2, but not f = 3 and f = 4. In:

test() {
  var f = 1;
  f = 2;
}

The initializing assignment is just the one on the declaration, f = 1.

Given that, the static type of an unannotated local variable is the LUB of the types of all of its initializing assignments. Taking the LUB of a bunch of expressions scattered through the function sounds spooky, but I hope that the fact that we only include definitely unassigned ones limits the potential for cycles. So in the above examples, f has type int. Here:

test() {
  var f;
  if (c) {
    f = 1;
  } else {
    f = 2.3;
  }
}

We infer num.

The rule for which operations are allowed (in addition to the existing rules about assigning to final variables), is: it is a compile-time error to access a non-nullable local variable when it is not definitely assigned. It's not an error for nullable locals because we default initialize those to null, the same way we do for fields.

Is that sufficient? Walking through the original examples:

void test1And2(bool b) {
  var x; // Infer int from initializing assignment (A).
  if (b) {
    var l = x; // Error: Accessing non-nullable when not definitely assigned.
    x = 3; // (A). No promotion here, x is already int.
  }
  var a = x; // Error: Accessing non-nullable when not definitely assigned.
}

void test3(bool b) {
  var x; // Infer int from initializing assignment (A).
  if (b) {
    x = 3; // (A). No promotion here, x is already int.
  }
  // No demotion, still int.
  var y; // Infer int from initializing assignment (B).
  y = 3; // (B). No promotion, y is already int.
  y = "hello"; // Error: Can't assign string to int.
}

void test4(bool b) {
  var x; // Infer Object from LUB of int (A) and String (B).
  if (b) {
    x = 3; // (A). Promote Object to int.
    x.isEven; // OK, x is promoted to int.
  } else {
    x = "hello"; // (B). Promote Object to String.
    x.length; // OK, x is promoted to String.
  }
  // Demote x back to Object.
  var l = x; // Infer Object for l from x.
  x.arglbargle; // Error: No arglbargle on Object.
}

void test5(bool b) {
  String? x; // Use declared type.
  if (b) {
    print(x); // Not error because x is nullable. Initialized with null.
    x = "hello"; // Promote to String.
  }
  // Demote back to String?
  print(x); // No error.
}

Thoughts?

leafpetersen commented 4 years ago

@munificent What would you propose that the types of x and l be in the following program:

void test1And2(bool b) {
   var x;
   if (b) {
     var l = x; 
     x = l; 
   } else {
     x = null;  
  }
}

lrhn commented 4 years ago

I guess Bob's approach requires that reading a definitely unassigned variable is an error, so you can't do var l = x;. That ensures that an initializing assignment (variable is definitely unassigned before) cannot refer to the variable itself. That also brings symmetry between var and final. So, to summarize:

Bob's approach;

For a local variable declaration with no declared type (var or final-with-no-type) and no initializing expression (precludes const):
An initializing assignment is an assignment to the variable at a position where it's definitely unassigned.
The type of the variable is inferred as the LUB of the initializing assignments.
It's a compile-time error to read the variable where it's definitely unassigned (which prevents reading prior to initializing assignment).

That means allows:

String? z = ...;
var x;
if (something) {
  // x definitely unassigned.
  x = z;  // Only initializing assignment to x.
  // x definitely assigned
}
// x potentially assigned/unassigned.
print(x);  // x has type String?, so it's implicitly intialized to null, and this access is OK.

How about:

Iterable<String?> zs = ...;
var x;
while (z in zs) {
  // x potentially assigned/unassigned.
  x = z;  // Not an initializing assignment to x.
  // x definitely assigned
}
// x potentially assigned/unassigned.
print(x);

What is the type of x. This is a case with no initializing assignments. I can do one where x is definitely assigned, but with no initializing assignments too: We also need to figure out what the type is if there is no initializing assignment.

var x;
do {
  // x is potentially assigned.
  if (someTest) { 
    x = 42;
  } else {
    x = 37;
  }
} while (someOtherTest);
print(x); // x is definitely assigned.

Here x is definitely assigned at the print, but there is no initializing assignment. What is the type of x?

It's convoluted code.

It does suggest a variant on Bob's approach where we use potentially unassigned assignments:

Not Bob's approach;

For a var or final-with-no-type local variable declaration:
An initializing assignment is an assignment to the variable at a position where it's potentially unassigned.
The type of the variable is inferred as the LUB of the initializing assignments.
It's a compile-time error to read the variable where it's potentially unassigned (which prevents reading prior to initializing assignment).

This would disallow:

String? z = ...;
var x;
if (something) {
  // x definitely unassigned.
  x = z;  // Only initializing assignment to x.
  // x definitely assigned
}
// x potentially assigned/unassigned.
print(x);  // x is inferred as String?, but it's potentially unassigned so it can't be read.

because it's reading x while potentially unassigned.

It would allow

var x;
do {
  // x is potentially assigned.
  if (someTest) { 
    x = 42;
  } else {
    x = 37;
  }
} while (someOtherTest);
print(x); // x is definitely assigned.

because both assignments are at potentially unassigned positions, and so we infer LUB(int,int) for x, then we read it at a point where x is definitely assigned.

(For a final variable, there is no difference between the two approaches because you can't assign to a potentially assigned final variable anyway).

lrhn commented 4 years ago

Going back to my initial reason for this feature, it was to avoid having to write types on variables even if they are initialized a little later.

The original inspiration was:

StreamSubscripton<int> subscription;
subscription = streamOfInt.listen(... something something subscription.cancel() ...);

That code is no longer valid, not even with the declared type. The use of subscription in subscription.cancel() is potentially unassigned (it's capture while the variable is not definitely assigned, so it may be called at any later point). The variable is non-nullable, so reading it while potentially unassigned is not allowed. So, that code is not a good example for this feature any more.

The cases that do make sense is where you declare a variable prior to initializing it because you will definitely assign something to it, but you will do so inside a nested statement (likely an if), and need the variable to survive that scope.

The canonical version of that is:

String key;
String value;
if (someTest) {  // Cannot be written as conditional expression because one test guards two values.
  key = key1; 
  value = value1;
} else {
  key = key2;
  value = value2;
}

I use two variables to preclude using a conditional expression as the initializer. Another reason could be that you need to do statement based computation in the branches.

The values still have the same type, which is also not necessarily general.

Widget widget;
if (horizontal) {
  state.direction = Direction.horizontal;
  widget = HorizontalListWidget(data);
} else {
  state.direction = Direction.vertical;
  widget = VerticalListWidget(data);
}

In these cases, I'd like to avoid writing the type.

It's generally situations where you can almost rewrite it to var x = test ? e1 : e2;, but you need either more than one value, or you need to do something more in the branch than what an expression can handle.

(So, if we had tuples and statement/block expressions, we'd probably be fine, then you could compute all the variable initializations on each branch and join them at the end).

I want rules to match this. That means to use LUB as the join because that's what a conditional expression does. I like Bob's idea of inferring the type from the assignments rather than "promoting from nothing". It's consistent in that we always infer a type for a var x, either from the initializer or from the assignments.

So, which assignments count and what happens on branches where there is no assignment?

var x;
x = e;  
// Should work exactly as: var x = e;`

var x;
if (test) {
  x = e1;
} else {
  x = e2;
}
// Should work exactly as `var x = test ? e1 : e2;`.

var x;
if (test) {
  x = e;
}
if (x == null) ...
// Doesn't  have to work. You can add an `else { x = null; }` if you want to.

I think I'd like var x = null; to not infer the type of x from the initializer, but do assignment based type inference and include Null in the LUB.

So, proposal:

Assignment based type inference applies to non-const local variables with no declared type (so var or final-with-no-type), and either no initializer expression or an initializer expression with static type Null.

An initializing expression is an assignment to the variable at a point where there has potentially been no assignment to the variable since its declaration. (Aka. potentially unassigned if you ignore the =null initializer).

If the variable has no =null initializer, then it's a compile-time error to read it while it's potentially uninitialized. If it has a =null initializer, it starts out promoted to the type Null. The variable can be read.

The type of the variable is inferred as the LUB if the static types of the initializing expressions, plus Null if the variable has a =null initializer.

An initializing assignment will always promote to its assigned type. (We didn't have to say this, but it's probably more useful than staying at the LUB).

So, Bob's examples are unchanged. Leaf's example:

void test1And2(bool b) {
   var x;  // No initializer.
   if (b) {
     var l = x; // Reading while potentially unassigned -> error.
     x = l; 
   } else {
     x = null;  
  }
}

If you write it with an =null initializer, you can force x to be nullable:

void test1And2(bool b) {
   var x = null; // infer Null as LUB(Null (here), Null (A), Null (B)).
   if (b) {
     var l = x; // x promoted to Null, l type Null
     x = l;  // (A) Initializing x to Null.
   } else {
     x = null;  // (B) Initializing x to Null
  }
  // x has type Null.
}

We could also allow you to write var? x; to force x to be nullable, but not say which type it's nullable of. Then use assignment inference to figure out the type. That's prettier than special-casing = null;. So, it'd be var, var?, final, final?, final type or type (where the type can itself be nullable).

It can be used for direct initialzation too:

var? x = "ab";  // x has type String?.

Really helps if you currently need to do:

StreamSubscription<Map<String, List<String>>>? subscription = stream.listen(...);
// would become:
var? subscription = stream.listen(...);

I think that's actually a useful feature by itself, but it can interact with assignment based initialization too.

leafpetersen commented 4 years ago

For a var or final-with-no-type local variable declaration:

An initializing assignment is an assignment to the variable at a position where it's potentially unassigned.

The type of the variable is inferred as the LUB of the initializing assignments.

It's a compile-time error to read the variable where it's potentially unassigned (which prevents reading prior to initializing assignment).

Realistically, all of these (Bob and not-Bob) are non-starters given our current approaches. They all require a fix-point computation over the AST, since in order to figure out the type of a variable x you have to visit all of the nodes to find the initializers for x. But the initializers for x may refer to y and vice versa. So now you have to go back and infer the type of y, which is the same problem... repeat. Example:

var x;
var y;
var z;
if (b) {
   x = 3; // Initializing assignment at int
   y = x;  // Initializing assignment at the type eventually inferred for x
} else {
   if (otherb) {
      z = 3;   // Initializing assignment at int
      y = z;  // Initializing assignment at the type eventually inferred for z 
  } else {
      y = 3; // Initializing assignment at int
      z = y; // Initializing assignment at the type eventually inferred for y 
 }
 x = z; // Initializing assignment at the type eventually inferred for z 
}

leafpetersen commented 4 years ago

I like Bob's idea of inferring the type from the assignments rather than "promoting from nothing". It's consistent in that we always infer a type for a var x, either from the initializer or from the assignments.

The problem with this, as I mention in the previous comment, is that the type of the initializer is not well-defined independently of the type of x. I don't realistically see a way to avoid the promotion based approach without completely changing our approach to inference, which isn't going to happen on this time frame.

Assignment based type inference applies to non-const local variables with no declared type (so var or final-with-no-type), and either no initializer expression or an initializer expression with static type Null.

So just to clarify, you intend this code to be legal?

final x = null;
if (b) {
   x  =3;
}

munificent commented 4 years ago

We could also allow you to write var? x; to force x to be nullable, but not say which type it's nullable of.

I have had that exact syntax idea in mind of for a while too. I'm not sure if I like it yet, but the fact that it occurred to both of us a positive sign.

Realistically, all of these (Bob and not-Bob) are non-starters given our current approaches. They all require a fix-point computation over the AST, since in order to figure out the type of a variable x you have to visit all of the nodes to find the initializers for x. But the initializers for x may refer to y and vice versa.

You're right. I was hoping that my proposal would dodge the need for a fixed-point computation by only using the definitely unassigned positions as initializers. I wanted to statically prevent cycles from occurring at all. But as you note, definite assignment tracking isn't sufficient.

I think fields generally provide the rules and guarantees users want:

A non-nullable field is definitely initialized before it is read.
A final field cannot be reassigned.
The type of an unannotated field is inferred from its initializer.
A nullable mutable field is implicitly initialized to null.

Fields enforce those rules using constructor initialization lists. Fields can be annoying to work with because the rules are draconian. (That's basically why we have factory constructors.) So the question is can we give users those same guarantees with locals? The simple answer is "yes":

Any local variable that is final, non-nullable, or unannotated must have an initializer.

Done! I think that's basically what the original NNBD behavior required.

The question here then is are there any simple extensions we can come up with that provide the same guarantees we get from fields but aren't that strict? The two rules I proposed don't work. Making them work using some kind of fixpoint computation certainly fails "simple".

One option is to take my proposal and add a third rule:

The static type of an unannotated local variable is the LUB of the types of all of its initializing assignments.
It is a compile-time error to access a non-nullable local variable when it is not definitely assigned.
(new) It is a compile-time error if an initializing assignment depends on the local variable.

This would make Leaf's example an error:

void test1And2(bool b) {
   var x;
   if (b) {
     var l = x; 
     x = l; // <- Error: initializing assignment "= l" depends on x.
   } else {
     x = null;  
  }
}

What does "depends on" mean? I'm not precisely sure, but we already have it in the language. From the spec:

It is a compile-time error if the value of a constant expression depends on itself.

I've never seen a definition for "depends on" in that sentence, but it seems to be meaningful enough for us to use it for many years.

There is also this rule:

It is a compile-time error if a local variable is referenced at a source code location that is before the end of its initializing expression, if any, and otherwise before the declaring occurrence of the identifier which names the variable.

This is a nice textually precise way of making it an error for a local variable to refer to itself in its own initializer. Conceptually, that's the same goal we're trying to achieve while allowing a local variable to be initialized in multiple places.

Personally, my feeling is that if we can't come up with simple extensions to let users do var x; and final x; and initialize it later, then we're better off falling back to the draconian but at least easy to understand rule:

Any local variable that is final, non-nullable, or unannotated must have an initializer.

leafpetersen commented 4 years ago

Personally, my feeling is that if we can't come up with simple extensions to let users do var x; and final x; and initialize it later, then we're better off falling back to the draconian but at least easy to understand rule:

Any local variable that is final, non-nullable, or unannotated must have an initializer.

Do you feel that my initial proposal isn't simple?

One of the things that I liked about it was that it was, in fact, very simple.

It is an error to use a variable declared var x; of final x; before it is definitely assigned.

Once a variable so declared has been promoted to a type T this type is treated as its "declared type" for the purposes of further demotion, and hence the variable may never be demoted "above" its initialization type.

As a follow on consequence of the above, there is not type to demote to at join points if the promoted type does not match, and hence it is an error to initialize such a variable to two differently typed values on different paths.

Some principles that are leading me to the above:

I think it's really bad if we allow final variables with multiple assignments on any path
- I'm ok with the the idea that final x; tells me to look for an assignment on every path, but not with the possibility that there are paths in which x can be observed as having different values at different points.
I think it's unpleasant to allow reads of final variables when there has been no assignment.
- It's error prone: I see one (and only one) assignment to a final variable, I should be able to assume that the variable has that value. If there's a hidden assignment of null, that feels surprising.

As a consequence of the above, I think it should be an error to use a final variable before it is definitely assigned, and an error to assign to a final variable if it is potentially assigned.

The one point of difference seems to be that there seems to be a general desire to allow initialization at different types, and to compute the LUB from there. The above is easily adjusted to allow for this:

Option 1

It is an error to use a variable declared var x; of final x; before it is definitely assigned.
It is an error to assign to a variable declared final x; if it is not definitely unassigned.
An initializing assignment of a variable promotes the variable to the (current) type of the initializer
- An initializing assignment is any assignment when the variable is definitely unassigned.
At a join point, a variable so promoted is demoted (if necessary) to the LUB of all of the types of the initialing writes.

This satisfies your 1, 2, and 3 properties of fields.

A non-nullable field is definitely initialized before it is read.

A final field cannot be reassigned.

The type of an unannotated field is inferred from its initializer.

A nullable mutable field is implicitly initialized to null.

If we feel that your property 4 is important, we modify as follows:

Option 2

It is an error to use a variable declared final x; before it is definitely assigned.
It is an error to use a variable declared var x; when it is definitely unassigned.
It is an error to assign to a variable declared final x; if it is not definitely unassigned.
An initializing assignment of a variable promotes the variable to the (current) type of the initializer
- An initializing assignment is any assignment when the variable is definitely unassigned.
At a join point, a variable so promoted is demoted (if necessary) to the LUB of all of the types of the initialing writes, plus Null if it is not definitely assigned on any path.

Or, slightly more aggressive on the inference:

Option 3

Same as Option 2, except that:

An initializing assignment is any assignment when the variable is not definitely assigned.

Examples:

void test1And2(bool b) {
   var x;
   final y;
   if (b) {
     var l = x;   // `x` is definitely unassigned.  Error in all 3 options.
     x = 3; // Promotes x to int
     y = 3; // Promotes y to int
   }
}

void test3(bool b) {
    var x;
    final y;
    if (b) {
      x = 3; // Promotes x to int
      y = 3;
   } // x and y have type int in option 1, int? in option 2 and 3
   var a = x; // Error in option 1, ok with type `int?` in options 2 and 3
   var b = y; // Error in all three options.
  if (otherb) {
     x = 3; // Ok in all options
     y = 3; // Error in all options
  }
  x  = 3.0;  // Error in option 1 and option 2, promotes to num? in option 3. 
  y = 3.0; // Error in all options
  x = "hello"; // Error in all options, x is already definitely assigned, so no LUB demotion
}

void test4(bool b)  {
    var x;
    final y;
    if (b) {
       x = 3;
       y = 3;
       x.isEven; // Ok, x is promoted to int.
       y.isEven; // Ok, y is promoted to int.
    } else { 
       x = "hello";
       y = "hello";
       x.length;  // Ok, x is promoted to String
       y.length;  // Ok, y is promoted to String
   }
   var a = x; // Ok in all options, x has type Object
   var a = y; // Ok in all options, y has type Object
   x = 3.0; // Ok in all options, x has type Object
   y = 3.0 // Error in all options, y is final
   x = null; // Error in all options, x has type Object
}

munificent commented 4 years ago

Any local variable that is final, non-nullable, or unannotated must have an initializer.

Do you feel that my initial proposal isn't simple?

Sorry, I got a little lost in the weeds and forgot the initial proposal. (It's been one of those weeks.) Yes, I do like your proposal and I do think it is simple.

Some principles that are leading me to the above:

I think it's really bad if we allow final variables with multiple assignments on any path

I'm ok with the the idea that final x; tells me to look for an assignment on every path, but not with the possibility that there are paths in which x can be observed as having different values at different points.

I think it's unpleasant to allow reads of final variables when there has been no assignment.

It's error prone: I see one (and only one) assignment to a final variable, I should be able to assume that the variable has that value. If there's a hidden assignment of null, that feels surprising.

I agree with those principles too. They line up well with the rules I listed for fields.

The one point of difference seems to be that there seems to be a general desire to allow initialization at different types, and to compute the LUB from there.

I don't know if I would say that I desire using LUB, just that it was an option I wanted to consider. On further thought, I think I like your original proposal of it being an error for the types to be different. Your proposal is a little unclear on how annotated variables are handled, and I couldn't find anything in the flow analysis spec about them. Is this an error?

test1() {
  String s; // Is this allowed?
  if (c) {
    s = "a";
  } else {
    s = "b";
  }
  print(s); // OK?
}

test2() 
  String s; // Never initialized, never used. OK?
}

test3() 
  String? s; // Annotated nullable, used without initialization.
  print(s); // OK?
}

I think these questions are important because if we make it an error to initialize to different types, then the solution should be to annotate the local variable. Doing that should still give us behavior we like. In particular, even if you annotate with a non-nullable type, I think you should be able to still use the deferred initialization.

If you annotate with a nullable type, I think it's somewhat nice to default initialize to null.

Let me try to fill in some additional corners of your initial proposal. These rules should cover yours and more:

A non-final variable annotated with a nullable type gets an implicit initializer of = null if it has none. This implies that it is immediately treated as definitely assigned.

test() {
  String? a;
  print(a); // OK.

  final String? b;
  print(b); // Error, finals must be explcitly initialized.

  String c;
  print(c); // Error, not nullable.

  var d;
  print(d); // Error, type was inferred, not annotated.
  d = a;
}

It is an error to use a variable before it is definitely assigned.

test() {
  var a;
  use(a); // Error.
  if (cond) a = "s";
  use(a); // Error.
  a = "s";
  use(a); // OK.

  final b;
  use(b); // Error.
  if (cond) b = "s";
  use(b); // Error.

  String c;
  use(c); // Error.
  c = "s";

  String d;
  if (cond) d = "s";
  use(d); // Error, only potentially assigned.

  String? e;
  if (cond) e = "s";
  use(e); // OK. Definitely assigned because of implicit `= null`.
}

It is an error to assign to a final variable that is definitely or potentially assigned.

test() {
  final a;
  a = "init";
  a = "assign"; // Error, definitely assigned.

  final b;
  if (cond) {
    b = "init";
  } else {
    b = "init";
  }
  b = "assign"; // Error, definitely assigned.

  final c;
  if (cond) {
    c = "init";
  }
  c = "assign"; // Error, potentially assigned.

  final String d;
  d = "init";
  d = "assign"; // Error, definitely assigned. Applies to annotated too.

  final String? e;
  e = "s"; // OK, no implicit `= null` initializer on declaration.
}

If a variable has no type annotation and no initializer, its type is inferred from the types of the assignments that occur in positions where the variable is definitely unassigned. It is an error if all of those initializing expressions do not have the same type. (Since the variable cannot be used before reaching at least one of those expressions, no fixpoint analysis should be needed. Just treat its declared type as the type of the initializer when you first reach that. Then at then at end, check that all of the initializers agreed on their type.)
```
test() {
  var a;
  if (cond) {
    a = "str";
  } else {
    a = "ing";
  }
  use(a); // OK, has type String.

  final b;
  if (cond) {
    b = "str";
  } else {
    b = "ing";
  }
  use(b); // OK, has type String.

  final c;
  if (cond) {
    c = "str";
  } else {
    c = 123;
  }
  // Error: Types do not agree.
  // Error even though never used after or before join.

  final Object d;
  if (cond) {
    d = "str";
  } else {
    d = 123;
  }
  use(d); // OK: Annotated. Definitely assigned by here.
}
```

leafpetersen commented 4 years ago

I don't know if I would say that I desire using LUB, just that it was an option I wanted to consider. On further thought, I think I like your original proposal of it being an error for the types to be different.

I'll admit to being torn on this.

Your proposal is a little unclear on how annotated variables are handled, and I couldn't find anything in the flow analysis spec about them.

Yes, this issue was filed about promotion of unannotated variables, but I think it's clear we can't think about them in isolation. We need a consistent package.

My current thinking is:

If you annotate a variable with a non-nullable type
- It must be definitely assigned before being read, whether final or not.
- If final, it must be definitely unassigned when written.
If you annotate a final variable with a nullable type
- It must be definitely assigned before being read, whether final or not (no implicit initializer)
- It must be definitely unassigned when written

I don't know what the right answer is yet for non-final nullable variables. There is one approach which makes them consistent with final variables:

If you annotate a non-final variable with a nullable type
- It must be definitely assigned before being read, whether final or not (no implicit initializer)

There's another approach which seems more in line with how nullable variables are used:

If you annotate a non-final variable with a nullable type, it gets an implicit initializing write of null at the declaration site.

The latter is your suggestion. It's less consistent, but the desire for consistency here may just be a foolish hobgoblin.

test1() {
  String s; // Is this allowed?
  if (c) {
    s = "a";
  } else {
    s = "b";
  }
  print(s); // OK?
}

test2() 
  String s; // Never initialized, never used. OK?
}

test3() 
  String? s; // Annotated nullable, used without initialization.
  print(s); // OK?
}

I think test1 and test2 should be ok. We could outlaw test2, but that feels unnecessary to me. The analyzer will hint, but if you really want it, knock yourself out.

I'm on the fence about test3.

A non-final variable annotated with a nullable type gets an implicit initializer of = null if it has none. This implies that it is immediately treated as definitely assigned.
test() {
String? a;
print(a); // OK.

final String? b;
print(b); // Error, finals must be explcitly initialized.

String c;
print(c); // Error, not nullable.

var d;
print(d); // Error, type was inferred, not annotated.
d = a;
}

This all seems reasonable to me. The asymmetry between final and mutable is like a trick mirror - it looks good or bad depending on the angle I look from... :)

It is an error to use a variable before it is definitely assigned.

test() {
var a;
use(a); // Error.
if (cond) a = "s";
use(a); // Error.
a = "s";
use(a); // OK.

final b;
use(b); // Error.
if (cond) b = "s";
use(b); // Error.

String c;
use(c); // Error.
c = "s";

String d;
if (cond) d = "s";
use(d); // Error, only potentially assigned.

String? e;
if (cond) e = "s";
use(e); // OK. Definitely assigned because of implicit `= null`.
}

All seems reasonable to me.

test() {
  final a;
  a = "init";
  a = "assign"; // Error, definitely assigned.

  final b;
  if (cond) {
    b = "init";
  } else {
    b = "init";
  }
  b = "assign"; // Error, definitely assigned.

  final c;
  if (cond) {
    c = "init";
  }
  c = "assign"; // Error, potentially assigned.

  final String d;
  d = "init";
  d = "assign"; // Error, definitely assigned. Applies to annotated too.

  final String? e;
  e = "s"; // OK, no implicit `= null` initializer on declaration.
}

This all seems good to me.

If a variable has no type annotation and no initializer, its type is inferred from the types of the assignments that occur in positions where the variable is definitely unassigned. It is an error if all of those initializing expressions do not have the same type. (Since the variable cannot be used before reaching at least one of those expressions, no fixpoint analysis should be needed. Just treat its declared type as the type of the initializer when you first reach that. Then at then at end, check that all of the initializers agreed on their type.)
test() {
var a;
if (cond) {
  a = "str";
} else {
  a = "ing";
}
use(a); // OK, has type String.

final b;
if (cond) {
  b = "str";
} else {
  b = "ing";
}
use(b); // OK, has type String.

final c;
if (cond) {
  c = "str";
} else {
  c = 123;
}
// Error: Types do not agree.
// Error even though never used after or before join.

final Object d;
if (cond) {
  d = "str";
} else {
  d = 123;
}
use(d); // OK: Annotated. Definitely assigned by here.
}

This seems reasonable. The examples that give me a small pause are ones like the following:

void firstTry() {
  var x;
  if (cond) {
    x = 3;
  } 
  use(x); // Error
} 

void secondTry() {
  var x;
  if (cond) {
    x = 3;
  } else {
    x = null;
  } 
  use(x); // Still error
} 

void thirdTry() {
  int? x;
  if (cond) {
    x = 3;
  } 
  use(x); // Ok
}

I can see a user being a bit frustrated going through those steps. It feels "reasonable" to not assign to x on all paths and get null. It also feels reasonable to try to fix the first error by assigning null on the missing paths.

munificent commented 4 years ago

There's another approach which seems more in line with how nullable variables are used:

If you annotate a non-final variable with a nullable type, it gets an implicit initializing write of null at the declaration site.

The latter is your suggestion. It's less consistent, but the desire for consistency here may just be a foolish hobgoblin.

It's consistent with fields, though, which is the main reason I suggested it. I think people do generally like the implicit null initialization of fields. I think they'll like it even more under NNBD when that only happens for fields that are deliberately nullable.

I think test1 and test2 should be ok. We could outlaw test2, but that feels unnecessary to me. The analyzer will hint, but if you really want it, knock yourself out.

Agreed. The really odd one is:

test() {
  final x;
  // Never used.

  final int x;
  // Also never used.
}

You'll get unused variable warnings, so it's probably OK for this to be allowed, but it feels weird to me to have a final variable that never gets initialized.

Looking at:

void secondTry() {
  var x;
  if (cond) {
    x = 3;
  } else {
    x = null;
  } 
  use(x); // Still error
}

That is an interesting one. This is a case where doing LUB of the initializers would do what the user wants. You'd infer int? for x. It's tempting to suggest a special rule = null initializers that gives you LUB-like behavior just for that case. But then we'd be tempted to do another special rule for int literals and doubles and...

Maybe it's safest to just not special case this and try to have good error messages to guide people towards the thirdTry() form.

lrhn commented 4 years ago

We never want var x; to mean dynamic x; because implicit dynamic is bad. The migration tool should convert all existing var x; to dynamic x; for backwards compatibility, so we don't need to worry about existing code. All var x; in the future will be new code.

So let's say that the new var x; is definitely unassigned.

We already prevent reading potentially unassigned variables unless they are late. That would mean that reading x requires it to be definitely assigned (well, unless it's late).

If all assignments promoted, we wouldn't be considering which assignments would count as initializing assignments. That wouldn't be the issue then, only what the initial type of the variable would be, and how to join two branches. So let's just worry about that for a moment.

Cases that should work include:

var x;
if (test) {
  x = 0;
} else {
  x = 1;
}
x.toRadixString(10);

and

var x;
if (test) {
 x = 0;
} else {
 x = "no";
}
print(x);

That is, initializing to the same type definitely preserves the type on join. Definitely initializing at least promotes to Object? (you can read the variable if it's definitely assigned).

Then we can get into clever cases like (A :> B :> C):

var x;
if (test) {
 x = a;
 if (x is! C) x = C.fromA(x);
} else {
 x = b;
 if (x is! C) x = C.fromB(x);
}

Here the type of x after the if should be C. It's not clear whether there is something above C in its promotion chain, and if so, what.

So, let's just say that a var x; variable has an empty type promotion chain and it's definitely unassigned. Most variables have their declared type at the top of the chain. These do not.

Any assignment to a variable with an empty type chain promotes to the assigned type (and the assigned type is considered a type of interest, if that matters). Any read of a non-late variable with an empty type chain is invalid. A late variable is treated as if it has type Object? (if it's been assigned, then it's with an object, otherwise it throws).

How do we join two such chains? The plain intersection would make my second example above give an empty chain, which means that the variable can't be used even if it has definitely been assigned.

So, let's define the join of two type chains, c1 = t1::r1 and c2 = t2::r2, as follows:

If either chain is empty, the result is the empty chain.
Otherwise the result is UP(t1, t2)::intersect(r1, r2).

For a variable with a declared or inferred (from initializer) type, the top of the chain (t1, t2) will always be that type, so it's just plain intersection of chains.

A case like the diamond A :> B :> D and A :> C :> D:

So, in short:

A local variable declaration of var x or final x with no type or initializer has an empty type chain. It can be late. It is definitely unassigned.
Initialization based promotion applies to a variable with an empty type chain which is potentially unassigned.
The type of reading a variable with an empty type chain is Object?. The only case where such a variable can be read is when it's late and potentially assigned.
Joining type chains c1 and c2 is:
- if c1 or c2 is empty, the result is an empty chain.
- Otherwise c1 == t1::r1 and c2 == t2::r2 and the result is UP(t1, t2)::intersect(r1, r2).

An instance variable with no type or initializer inherits its type from its superinterfaces, if any, and if there are no declarations with that name in the superinterfaces, it just has type Object?. It cannot be affected by promotion or definite assignment analysis. All non-local, non-instance variables with no type or initializer have type Object?. (Including parameters).

All existing rules for assignment to finals still apply: No assigning unless the variable is definitely unassigned. No reading unless it's definitely assigned. (Unless it's late, then it's no reading if it's definitely unassigned and no writing when it's definitely assigned).

Normal promotion can apply to an empty chain, the variable is treated as having type Object?, but since all normal promotion is based on reading the variable, the only interesting case is a late potentially initialized variable:

late var x;
do {
  if (something) x = y;
} while (itsNotInitialized);
// x has empty type chain. Treated as Object?, also for promotion.
if (x is Foo) {
  // x is Foo!
}

The answer to the five questions would then be:

Is it an error to use such a variable when it is or may not be initialized?

It's a normal definitely unassigned variable. You're not allowed to read it until it's definitely assigned, or if it's potentially assigned and late.

If 1) is not an error, then what type does the variable have at the pre-initialization use point?

Object? if the variable is late and potentially unassigned. (Sort of arbitrary, but it's the only way I found to not simply disallow late var x;.)
May such a variable be demoted from its promoted type, and if so, to what type is it demoted?

Depends on what is meant by "demoted".

For assignment, no. So var x; x = 2; x = "a"; is a compile-time error because x="a"; cannot demote x below int.

For join points, yes. The promoted variables may be merged into a less specific type at join points, using the LUB of the initialization types. That handles cases of definite assignment to different types.
Is it an error to initialize such a variable at two unrelated types?

No. They will join to their LUB at join points.
If we choose to make 1) an error, should such an error apply to all variables declared with no initializer, regardless of nullability and type?

No. I'd say yes, but we'd have to re-migrate everything with a declaration like String? x; to String? x = null;. If that's acceptable use of our limited time, then I'm fine with it. (But then, I always preferred writing = null; on my local variables).

It accepts secondTry and thirdTry, but not firstTry. I think I've given up on making that one work.

leafpetersen commented 4 years ago

It accepts secondTry and thirdTry, but not firstTry. I think I've given up on making that one work.

It doesn't immediately seem too hard to support firstTry as well, just by saying that at a join point, if you're joining a definitely unassigned path with a definitely assigned path, you merge in Null to the initialization type.

The appealing thing about supporting this is that it means that:

if you only assign things of type T to x
and you never read x when it is definitely unassigned then you can freely move between T x; and var x;. If one works, the other does.

lrhn commented 4 years ago

If you mean switching between T? x; and var x;, then yes, it works, but not prettily enough for me.

It breaks my definitely assigned story. It would be wrong to read var x; when it's definitely unassigned, but not wrong when it's potentially unassigned. It's (potentially) the same unassigned variable that you read in both cases, so I can't explain why one is OK and the other is not (except that it's a special case for convenience).

For a late variable, it's easy because reading while potentially unassigned is checked for and guarded against.

I think I'd rather add var? x; in the future as a variable which has its type inferred, and is then also made nullable. That will work with var? x = "string"; too. (We can also add var! x; as an unsafely nullable variable which is read as x! anywhere it's used.)

leafpetersen commented 4 years ago

If you mean switching between T? x; and var x;, then yes, it works, but not prettily enough for me.

I actually think that I mean T x; - I think it works out when T is non-nullable too.

It's (potentially) the same unassigned variable that you read in both cases, so I can't explain why one is OK and the other is not (except that it's a special case for convenience).

It makes sense to me. Reading a definitely unassigned variable is stupid code - just replace it with the null literal. Reading a potentially unassigned variable is reasonable - you have two paths, one of which provided a value and another which didn't.

An instance variable with no type or initializer inherits its type from its superinterfaces, if any, and if there are no declarations with that name in the superinterfaces, it just has type Object?. It cannot be affected by promotion or definite assignment analysis. All non-local, non-instance variables with no type or initializer have type Object?. (Including parameters).

This is tempting, but I'm worried about it. We have no mechanism in place for exposing the same API at different types, which means that a non-local variable or field var x in an opted in library will need to be seen as having type Object in non-opted in code which breaks it. Technically we didn't break the downstream library, the migrator did when they decided not to explicitly mark their variable as dynamic but in practice, we'll likely see accidental breakage here.

lrhn commented 4 years ago

I actually think that I mean T x; - I think it works out when T is non-nullable too.

Not the way I understand the idea:

var x;
if (test) x = "a";
print(x);

Here you only read x when it's not definitely unassigned, and you only assign String to it. If the branches are joined to String? and the variable can then be read, then the code works, If you change var x; to String x; then the code is invalid because x is read while potentially unassigned and non-nullable. (Or did you mean that if both type annotations are valid, then you can safely switch between them and they'll mean the same thing?)

It can work, and it might be convenient, but it still feels a little ad-hoc to me. It will help people writing code the way they are used to.

As for the meaning of var x in non-local declarations, I'd really prefer to get rid of implicit dynamics. This is a chance. We are going to migrate the code anyway, and the migration will change var to dynamic. There will be new code, but new code will not break existing clients. There will be bad manual migration. There will always be bad manual migration, I don't think this is the worst offender.

leafpetersen commented 4 years ago

As for the meaning of var x in non-local declarations, I'd really prefer to get rid of implicit dynamics. This is a chance. We are going to migrate the code anyway, and the migration will change var to dynamic.

@munificent suggested an alternative which is to just say that a non-local declaration var x or late var x is an error if a type was not inferred via override or initializer inference.

We could in addition also consider just disallowing late var x or late final x as a local.

leafpetersen commented 4 years ago

Ok, I'm going to put together a concrete proposal based on this for us to discuss, and hopefully move forward on. Initial stake in the ground looks like this. The set of errors is as follows:

Read Errors:

Declaration form	Def Assigned	Not Def. Assigned	Def. Unassigned
var x;	Ok	Error	Error
final x;	Ok	Error	Error
int x;	Ok	Error	Error
int? x;	Ok	Ok	Ok
final int x;	Ok	Error	Error
final int? x;	Ok	Error	Error
late var x;	Ok	Ok	Error
late final x;	Ok	Ok	Error
late T x;	Ok	Ok	Error
late final T x;	Ok	Ok	Error

Write Errors:

Declaration form	Def Assigned	Not Def. UnAssigned	Def. Unassigned
var x;	Ok	Ok	Ok
final x;	Error	Error	Ok
int x;	Ok	Ok	Ok
int? x;	Ok	Ok	Ok
final int x;	Error	Error	Ok
final int? x;	Error	Error	Ok
late var x;	Error	Ok	Ok
late final x;	Error	Ok	Ok
late T x;	Error	Ok	Ok
late final T x;	Error	Ok	Ok

Further:

An initializing assignment is an assignment to a variable which is definitely unassigned.
At join points, we intersect the promotion chain as usual, except that we take the LUB of the "initialization inferred types".
Non-local variables with no inferred type are treated as having type Object?
late local variables are treated as having type Object? if they are read before they are definitely assigned.

lrhn commented 4 years ago

At join points, we intersect the promotion chain as usual, except that we take the LUB of the "initialization inferred types".

(I guess the result of that LUB would again count as an initialization inferred type).

Have to be careful here. For (assume C <: A):

foo(A a, C c) {
  var x;
  if (something) {
    x = a;
    if (a is! C) {
      x = C.fromA(a);
    }
  } else {
    x = c;
  }
  // Joining chains A::C and C.
}

I do want to get C out of this, but that means intersecting an initialization inferred type (IFT) with a non-IFT type.

So, how are the two or more chains with IFTs at the top joined.

Intersection of the entire chains, then add the LUB of the IFTs on top if not already there, or
Intersection of the non-IFT chain, then add LUB of the IFTs on top
Intersection of the chains, then use LUB of the IFTs if the result is empty.

The second option would not get C in the above example.

Checking, it turns out that some of these are not associative.

If we try joining the chains Object'::num::, int':: and double':: (where the ' means an IFT) as both A. (Object'::num:: ⊕ int'::) ⊕ double':: B. Object'::num:: ⊕ (int':: ⊕ double'::)

we get:

1A. (Object'::num:: ⊕ int'::) ⊕ double':: ↦ Object' ⊕ double':: ↦ Object' 1B. Object'::num:: ⊕ (int':: ⊕ double'::) ↦ Object'::num:: ⊕ (num'::) ↦ Object'::num

2A. (Object'::num:: ⊕ int'::) ⊕ double':: ↦ Object' ⊕ double':: ↦ Object' 2B. Object'::num:: ⊕ (int':: ⊕ double'::) ↦ Object'::num:: ⊕ (num'::) ↦ Object'

3A. (Object'::num:: ⊕ int'::) ⊕ double':: ↦ Object' ⊕ double':: ↦ Object' 3B. Object'::num:: ⊕ (int':: ⊕ double'::) ↦ Object'::num:: ⊕ (num'::) ↦ num

So, none of the options are associative. I actually thought one of then would be. ☹️

I think the result for 3 is probably what I want for B, but I want that for A as well then.

What can we do instead?

Not use LUB, obviously. I'd like to use LUB for some situations, but maybe it's just not tenable. We'd do plain intersection of chains, and the result might be empty. What would it mean if the type chain is empty, but the variable is definitely assigned. I guess that the most consistent answer would be Object?.

(When we intersect, would int and int? intersect emptily, or do we make it int? ? What about int and int*?)

leafpetersen commented 4 years ago

So, none of the options are associative. I actually thought one of then would be. ☹️

Option 2 is associative (iff LUB is associative, which I'm not sure it is).

leafpetersen commented 4 years ago

Initial proposal here.

munificent commented 4 years ago

Non-local variables with no inferred type are treated as having type Object?

late local variables are treated as having type Object? if they are read before they are definitely assigned.

I would just make these errors. If we have a principle of "non-nullable by default" where you have to opt in to anything accepting null, that would imply that the language shouldn't silently give you Object?. Is allowing these and inferring Object? (which is the least-useful type since it has the fewer methods) going to do what the user wants often enough to balance out the cases where they didn't realize there was no inferred type and would have preferred an error?

leafpetersen commented 4 years ago

I would just make these errors. I

My proposal does not address the former, I would prefer to address any changes to non-local variables in a follow up.

For late local variables, my proposal boils down to the following:

Unlike non-late variables, late variables don't require initialization on all paths
It is an error to read a late local variables with no declared type if they have not been given a type via initialization on at least one path.

This basically means you can never read a late local variable unless there is an initializing assignment on some path to the read, and the inferred type comes from the possible assignments.

For non-late untyped locals, you can't read them unless there is an initializing assignment on every path, and the inferred type again comes from the initializing assignments.

lrhn commented 4 years ago

I think that's the correct way to handle late variables: Treat them as if they have been assigned at the point where they are read, even though we cannot statically determine that it's true. As such, the type that it has is one of the types that were assigned.

leafpetersen commented 4 years ago

An alternative proposal eliminating the use of upper bounds is here.

leafpetersen commented 4 years ago

Ok, here's where I think we are. I have two proposals up (see previous comments). The primary difference between the two is whether initializations of the same variable at different types result in an error (and hence require a type annotation), or result in an upper bound computation.

Widget build(BuildContext context) {
  var child;
  if (something) {
     child = Text(...);
  } else {
    child = Padding(...);
  }
  return Something(..., child : child);
}

In the upper bound proposal, code like the above is valid. The type of child will be computed as the upper bound of Text and Padding. In the simpler proposal, the code above is an error, since Text and Padding are not the same type.

The main objections to the upper bound proposal boil down (I think) to the following:

Looking at an initializing assignment doesn't tell you the eventual type of the variable, since it may subsequently be demoted at a join point.
There are a lot more points where variables may take on different types, and there is a concern that the result will cause confusion.

The main objection to the simpler proposal is that code like the above seems like it ought to work.

@munificent any thoughts on this?

leafpetersen commented 4 years ago

A variant of the simple proposal which allows nulls to be assigned explicitly and in some cases implicitly is here: https://github.com/dart-lang/language/pull/1009

leafpetersen commented 4 years ago

@stereotype441 @munificent and I discussed these proposal further today. Some notes from the discussion.

As a general principle, we are all on board with the goal that every variable can be thought of as having a single declared type, which is being "discovered" via the inference process. This specifically means that we prefer to specify this such it is an error to assign conflicting types even if the paths on which they are assigned don't converge:

var x;  // Error, no type can be inferred
if (b) {
  x = 3;
  return;
} else {
  x = "hello";
  return;
}  // No valid type on the join, even though neither path reaches here

We were all generally on board with making the presence or absence of inference errors independent of whether the variable was used in a bad state or not. In other words, the previous example should be an error, even though the variable in question is never actually used after the join.

The biggest remaining question is how to deal with mutable variables (var x;) which are intended to be nullable. The three options on the table (which can be combined in some forms) are:

1) Treat such a variable which is unassigned on some path as implicitly null on that path 2) Require a variable to be initialized to null on each path on which it is intended to be null 3) All a variable to be declared with a null initializer and still be inferred to a more precise type.

Examples:

   var x;
   if (b) {
     x = 3;
   } // x has type `int?` in option 1, error in 2 and 3

   var x;
   if (b) {
     x = 3;
  } else {
    x = null;
  } // x has type int? in all three proposals.

  var x = null;
   if (b) {
     x = 3; // Error in option 1 and 2, ok in 3
   } // x has type int? in option 3

  var x;
   if (b) {
     x = null;
     x = 3; // Error in option 1 and 2, ok in 3
   } // x has type int? in option 3

My latest draft of the nullable definite assignment proposal here currently takes option 1, and can easily be made to take option 2 instead.

An informal proposal from @munificent is intended to get to option 3, but requires further work to see if it bears out.

A key issue to work out for option 3 is how to safely handle back edges in loops, given that we do not iterate to a fixed point during flow analysis. Example:

var x = null;
while (b) {
  Null y = x; // Is this valid?  If so, we must guarantee that x is Null on the back edge
  x = 3; // This assignment must either be disallowed, or the back edge assumption must safely approximate it
}

lrhn commented 4 years ago

I think I like option 3 as well. It's explicit, and it makes var x = null; be useful, which it currently isn't.

About

var x = null;
while (b) {
  Null y = x; // Is this valid?  If so, we must guarantee that x is Null on the back edge
  x = 3; // This assignment must either be disallowed, or the back edge assumption must safely approximate it
}

I'd make that invalid. The read of x in the assignment to y happens before it's given a type. The type before the assignment is not Null, it's "untyped with nullable constraint". Reading while untyped is not allowed.

eernstg commented 4 years ago

@leafpetersen wrote:

We were all generally on board with making the presence or absence of inference errors independent of whether the variable was used in a bad state or not.

Indeed, strongly agreeing on that. When we have two non-joining control flow paths where the variable has (very) different types, this still implies that it is a source of serious confusion for the reader of the code that this variable must be understood to have those distinct types at different locations in the code. To me, the ability to read the code is crucial.

The focus on readability comes up with the stated models 1-3 as well:

I'm worried about model 3: It breaks the otherwise consistent rule that an initializer on a local var variable is used to infer its declared type. True, var x = null; is otherwise useless, but I'd prefer to say that developers shouldn't write that, rather than introducing an exception that developers must then have in mind. The point is that we avoid undermining the readability of the code, and the potential technical difficulties with back-edges is just another hint that the readability is worse for a good reason.

I'd be really happy about model 2, where the inferred type needs to be documented: both Null and the type that it is unioned with can be determined from the code (plus an analyzer).

Model 1 would also work, because we as a community have a lot of experience with the rule that a var with no initializer is initialized to null.

But model 1 is still a somewhat quirky rule. In order to distinguish "initialized to null, gets nullable version of the declared type implied by init. assignments" from "gets its declared type from init. assignments", the developer would need to have an overview of all control flow paths and the init. assignments to this variable on those paths: if one or more of them are missing then we go with the "nullable" type.

So that's what the developer needs to understand further down in the code when the (unexpectedly) nullable type causes errors.

@lrhn wrote:

Reading while untyped is not allowed.

Yet another exception for an initializer of the form null (or with the type Null, whatever the rule may be): For any other var with an initializer, the variable is not untyped.

munificent commented 4 years ago

Right, so my preference is that option 3 is best if we can make it work in a relatively straightforward way. I just spent literally twenty seconds poking around the Flutter codebase and found:

  Future<HttpClientRequest> open(
      String method, String host, int port, String path) {
    // ...
    String query = null;
    if (queryStart < fragmentStart) {
      query = path.substring(queryStart + 1, fragmentStart);
      path = path.substring(0, queryStart);
    }
    // ...
  }

Option 3 would let you use var for query and do the right thing. Option 2 would force you to change this to add an else branch. In trivial cases the difference is marginal, but I suspect there are larger nested and chained if statements in the real world where you'd end up having to add several else cases in order to please the type checker (or type annotate the variable, which can be a pain if it has a long type name). Option 3 gives you a nice idiomatic answer.

To handle back edges, how about we simply make it an error if a variable does not have a fully inferred type by the point that it reaches the top of a loop? We don't want to implement an iterative solution and we sure as heck don't want users to have a run a fixpoint algorithm in their heads to figure out the type of a local variable. :)

I look at all of these options as basically dealing with the fact that if (and to a lesser extent switch) are statements instead of expressions in Dart. It just to make initializing variables based on conditional branches easier. Once loops come into play, you're better off being explicit, so I'm fine if back edges are essentially out of scope for this feature.

If option 3 doesn't pan out, option 2 is a reasonable fallback. It may make some code more tedious with either duplicate else branches or type annotations, but is otherwise pretty close. It's a little more verbose, but it's equally as safe as option 3.

I actively dislike option 1 because it undermines our clear story that you always have to explicitly opt in to nullability. I take the "by default" part of NNBD seriously, and I think it's what most of our users want. Option 1 really bothers me because deleting a line of code that initializes a variable in some branch could silently turn that variable into a nullable one without you realizing. In many cases, that will cause an error downstream alerting you to the change, but that's not always true and I don't like relying on that.

@eernstg is right that not doing option 1 is a significant change in idiomatic Dart since we've been telling users for years to rely on the implicitly initialization to null. I wrote that rule. :) But I think Dart with null safety is a different language in some respects and this is a place where I think changing that idiom and our guidance is the right thing to do. Also, over the years, I've had a number of users tell me they don't like the rule and preferred seeing explicit initialization to null even when the language made it unnecessary.

Before Dart 2.0, we told people not to use dynamic as a type annotation when leaving it off would give you the same effect because it was redundant. After Dart 2.0 when we moved to a (mostly) opt-in model for dynamic we changed the guidance to encourage people to be more explicit and almost all users I talked to were happy with the change. This is analogous.

stereotype441 commented 4 years ago

To handle back edges, how about we simply make it an error if a variable does not have a fully inferred type by the point that it reaches the top of a loop? We don't want to implement an iterative solution and we sure as heck don't want users to have a run a fixpoint algorithm in their heads to figure out the type of a local variable. :)

SGTM. The one tweak I might suggest is: it's an error if a variable does not have a fully inferred type by the point that it reaches the top of a loop that assigns to it. We already pre-compute which variables any given loop assigns to for other flow-analysis-related reasons so I don't think this would be too much of a stretch, and it would allow us to accept more programs that are trivial to understand.

I actively dislike option 1 because it undermines our clear story that you always have to explicitly opt in to nullability. I take the "by default" part of NNBD seriously, and I think it's what most of our users want. Option 1 really bothers me because deleting a line of code that initializes a variable in some branch could silently turn that variable into a nullable one without you realizing. In many cases, that will cause an error downstream alerting you to the change, but that's not always true and I don't like relying on that.

munificent commented 4 years ago

it's an error if a variable does not have a fully inferred type by the point that it reaches the top of a loop that assigns to it.

+1!

leafpetersen commented 4 years ago

@lrhn

I think I like option 3 as well. It's explicit, and it makes var x = null; be useful, which it currently isn't.

About
var x = null;
while (b) {
  Null y = x; // Is this valid?  If so, we must guarantee that x is Null on the back edge
  x = 3; // This assignment must either be disallowed, or the back edge assumption must safely approximate it
}
I'd make that invalid. The read of x in the assignment to y happens before it's given a type. The type before the assignment is not Null, it's "untyped with nullable constraint". Reading while untyped is not allowed.

This is inconsistent on the face of it. The whole point of Option 3 (at least as proposed by @munificent ) is that x does have a type before you've written to it, which is why this works:

var x = null;
if (b) {
  use(x); // x has type Null here
} else {
  x = 3;
  x.isEven; // x has type int here
} 
x?.isEven; // x has type `int?` here

I'm not sure if you're envisioning doing something different with loops than from conditionals (and if so what?), or if you are proposing an Option 3a different from the proposal from @munificent in which reading the variable before it has been potentially assigned something typed is an error?

lrhn commented 4 years ago

I'll have to admit that I did not see that var x = Null; would give x the type Null. I read it as keeping the variable "untyped" with a nullability hint.

If x has type Null and is useful at that type, are we getting into problems with promotability again?

var x = null;
var f = () => x;  // Static type Null Function().
x = 2;  // Demotion from Null to int? (a supertype).
Null y = f(); // Looks valid to me.

So, if this is allowed, it's unsafe. The thing that could prevent it is that the closure prevents promotion of the initial assignment.

We should not be allowed to promote to Null if the variable gets read at that type in a closure, and the variable is demoted after the closure is created, which is exactly the situation here. Which means that var x = Null; ends up untyped and () => x has a use of an untyped variable, which we reject.

(If that is true, we might be in the lucky situation that any code which prevents the promotion to Null is also immediately invalid, so all valid code will have a type of Null after the declaration).

So, could the rules be stated as:

A var x declares an untyped variable. It initially has no type. (A typed variable initially has a declared type)
An untyped variable can be given an initialization inferred type, at which point no longer has no type.
Any assignment to a definitely unassigend untyped variable which allows promotion is an initializing assignment. This assignment gives the untyped v ariable an initialization inferred type of T, the static type of the assigned expression.
So is an assignment to an untyped variable whhic allows promotion and which has initialization inferred type Null, and it changes the type to T?, where T is the static type of the assigned expression.
An untyped variable with an initializer, var x = e;, is treated as var x; x = e;.

(It's an error to read a variable with no type. It is an error to assign to a variable with no type unless it's an initializing assingment. IIT Types at merges must be T, T? or Null for one type T, the IIT after is the LUB of those.)

Then:

var x = null;
if (b) x = Foo();
// x has IIT and type Foo?.

leafpetersen commented 4 years ago

Update on this. We've decided to back off of the initialization based promotion due to complexity concerns. Remaining changes are only around definite assignment, specified here

leafpetersen commented 4 years ago

Final design landed here.

leafpetersen commented 4 years ago

Implementation issues

[ ] Analyzer implementation issue here.
[ ] CFE implementation issue here.

leafpetersen commented 4 years ago

Closing this out, the implementation issues are tracked.