Feature: Sound use-site variance

This is the tracking issue for introducing a sound mechanism for use-site variance or invariance in Dart.

Background: The original request motivating this feature is #213; the initial proposal for use-site invariance is #229. The related feature known as declaration-site variance was initially proposed in #213, with tracking issue #524.

The text below describes properties of use-site variance which are good candidates for being adopted. Many things can still change, and a full feature specification will be written and used to manage the discussions about the final design.

Variance in Dart Today

As of Dart 2.4 or earlier, every type variable declared for a generic class is considered covariant. The core meaning of this is that a parameterized type C<T2> is a subtype of C<T1> whenever T2 is a subtype of T1. Other subtype rules can then be used to show subtype relationships like List<int> <: Iterable<dynamic> and Map<String, String> <: Map<Object, Object> <: dynamic.

This type rule is not sound; that is, in order to maintain heap soundness it is necessary to check certain types dynamically. This means that a program with no compile-time errors can fail with a type error at run time.

For instance, with the declaration List<num> xs and some expression e with static type num, it is necessary to check during evaluation of xs.add(e) that the value of e actually has the type which is required by xs: The dynamic type of xs may be List<int> or even List<Never>, and it would then be a dynamic type error if the value of e is a double, even though the expression had no type errors at compile-time.
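
The dynamic check described here can be observed in current Dart, without any new syntax. The following is only a sketch; the helper name addThrows is made up for this illustration.

```dart
// Returns true if adding `value` to `xs` fails the dynamic covariance check.
bool addThrows(List<num> xs, num value) {
  try {
    xs.add(value); // No compile-time error: `value` has static type `num`.
    return false;
  } on TypeError {
    return true; // The actual element type of `xs` rejected `value`.
  }
}

void main() {
  print(addThrows(<num>[], 3.1)); // false
  print(addThrows(<int>[], 3.1)); // true
}
```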

Unsound covariance enables many software designs that would be rejected by a traditional sound approach to variance (e.g., as in Java or C#). This allows developers to make a trade-off: they get more flexible types (e.g., a variable of type List<num> is allowed to refer to a List<int>) in return for accepting the potential dynamic type errors (a List<int> will work safely under the type List<num> in a lot of ways, just not all).

We want to enable a sound typing discipline for variance as well (rejecting more programs, but providing a compile-time guarantee against the run-time type errors described above). This feature is concerned with the provision of support for that. It consists of two elements: Use-site invariance and use-site variance.

Use-site Invariance

Use-site invariance can be used by developers who wish to maintain a sound and strict discipline on instances of classes that have unsoundly covariant type parameters.

Syntactically, use-site invariance consists in allowing actual type arguments in type annotations to be annotated with the modifier exactly, which eliminates covariance for that type parameter. For example:

main() {
  bool b = ...;
  List<num> xs = b ? <int>[] : <num>[]; // OK, `xs` is unsoundly covariant.
  List<exactly num> ys = xs; // Requires downcast (with NNBD: must be explicit).
  ys = <num>[]; // OK, statically safe.

  xs.add(3.1); // No compile-time error, checked dynamically, throws.
  ys.add(3.1); // No error, no dynamic check: Statically safe.
}

Here are some core properties of use-site invariance:

A type argument marked exactly yields a subtype. For instance, List<exactly num> is a subtype of List<num>.

Every instance of a generic class is created with a type which is fully exact. That is, if o is an instance of a generic class C taking two type arguments then the dynamic type of o is C<exactly T1, exactly T2> for some types T1 and T2.
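
The type-argument part of this property already holds in current Dart: the dynamic type of an instance records the type arguments it was created with, whatever the static type of the reference. The sketch below shows only that; the exactly modifier itself is not available. Box and typeOf are hypothetical names introduced for the illustration.

```dart
class Box<T> {
  final T value;
  Box(this.value);
}

// Hypothetical helper: returns the `Type` object that reifies `T`.
Type typeOf<T>() => T;

void main() {
  Box<num> b = Box<int>(1); // Static type `Box<num>`, dynamic type `Box<int>`.
  print(b is Box<int>); // true
  print(b.runtimeType == typeOf<Box<int>>()); // true
  print(b.runtimeType == typeOf<Box<num>>()); // false
}
```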

At run time, exactness of a type parameter is reified. That is, if t1 is an instance of Type that reifies List<exactly T> and t2 a Type that reifies List<T> then t1 == t2 must evaluate to false. Technically, this distinction is required in order to maintain sound information about type parameter exactness:

main() {
  List<exactly List<exactly num>> xs = [];
  List<List<num>> ys = xs; // OK, an upcast.
  ys.add(<int>[]); // No compile-time error, but dynamic check.
  xs[0].add(3.1); // Statically safe.
}

When ys.add is invoked it is necessary to perform a dynamic check (as always, because ys has an unsoundly covariant static type and add has a non-covariant signature). If the dynamic type of ys does not include the information that its type argument is List<exactly num> (that is, if it's just considered to be List<num>) then the dynamic check would allow the actual argument (whose type is List<int>). But then we would get a type error at run-time for adding 3.1 to xs[0]. We cannot allow that to happen, because xs[0].add(3.1) is a statically type safe expression.
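
The failure that reified exactness is meant to rule out can be reproduced in current Dart, where no exactness information exists at run time. This is a sketch under that assumption; nestedAddThrows is a made-up helper name.

```dart
// Returns true if the final `add` throws a dynamic type error.
bool nestedAddThrows() {
  final xs = <List<num>>[]; // Dynamic type `List<List<num>>`, nothing exact.
  List<List<num>> ys = xs;
  ys.add(<int>[]); // Passes the dynamic check: `List<int> <: List<num>`.
  try {
    xs[0].add(3.1); // `xs[0]` is actually a `List<int>`.
    return false;
  } on TypeError {
    return true;
  }
}

void main() {
  print(nestedAddThrows()); // true
}
```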

In summary, use-site invariance can be used to pin down type arguments which are otherwise only known by an upper bound, due to unsound covariance. When the type argument is known exactly for a given object, method invocations can be checked and recognized as statically safe, which improves the statically known level of correctness, and allows for improved performance.

Use-site Variance

Use-site variance is a more complex and expressive feature than use-site invariance, but the fundamental ideas are the same. The main difference is that use-site variance allows for quantifying over a set of type arguments for a type parameter which is invariant (that is, it is marked as inout in the class declaration), thus creating a supertype.

class C<inout X> {
  X x;
  C(this.x);
}

C<out num> c = C<int>(42); // OK, because `int <: num`.
var y = c.x; // OK.
void foo() { 
  c.x = 43; // Error, the setter `x=` of `c` cannot be used with covariant receiver.
}

Use-site variance offers a superset of the features offered by use-site invariance, so this mechanism can also be used by developers who wish to maintain a sound and strict discipline on instances of classes that have unsoundly covariant type parameters.

Syntactically, use-site variance consists in allowing actual type arguments in type annotations to be annotated with one of the modifiers out, which adds covariance for that type parameter; inout, which removes all kinds of variance; or in, which adds contravariance. For example:

main() {
  bool b = ...;
  List<num> xs = b ? <int>[] : <num>[]; // OK, `xs` is unsoundly covariant.
  List<inout num> ys = xs; // Requires downcast (with NNBD, the cast must be explicit).
  ys = <num>[]; // OK, statically safe.

  xs.add(3.1); // No compile-time error, checked dynamically, throws.
  ys.add(3.1); // No error, no dynamic check: Statically safe.

  // Using class C from the previous example.
  C<in int> c = C<num>(3.0); // OK.
  var y = c.x; // Error, getter `x` cannot be used with contravariant receiver.
}

Here are some core properties of use-site variance:

A type argument marked inout yields a subtype. For instance, List<inout num> is a subtype of List<num>.

A type argument marked out or in can only be used where it does not contradict the declaration-site variance of the corresponding type parameter. When allowed, it creates a supertype. For instance, C<out num> is a supertype of C<num> where C is declared as shown earlier.

At run time, the use-site variance of a type argument is reified. That is, if t1 is an instance of Type that reifies C<out T> and t2 a Type that reifies C<T> then t1 == t2 must evaluate to false. Technically, this distinction is required in order to maintain soundness; the example given for use-site invariance in the previous section can be reused (using the keyword inout rather than exactly), because use-site variance is a superset of use-site invariance.

In summary, use-site variance can be used to pin down type arguments which are otherwise only known by an upper or lower bound, and they can allow type arguments to vary even though they are declared as inout. When the type argument is known exactly for a given object, method invocations can be checked and recognized as statically safe, which improves the statically known level of correctness, and allows for improved performance.

I love this topic, and I've gotten really passionate about it over the past month. I recently wrote a doc (PDF) proposing, among other things, a solution to this problem. Initially I considered syntax like exactly and inout. Ultimately, though, I arrived at a proposal that handles all these cases robustly and borrows syntax from Java:

// This is redefined to be your C<exactly num>.
C<num> only_num;

// Same as your C<out num>.
C<? extends num> subtypes_of_num;

// Same as your C<in num>.
C<? super num> supertypes_of_num;

// You can specify both bounds at once.
C<? super int extends num> between_int_and_num;

// The default upper bound is void.  Today, the default lower bound would be Null.
// Once we have non-nullable types, the default lower bound is a new "empty" type
// with no values.
C<?> anything;

// This is redefined to be equivalent to C<?>.
C any_other_thing;

In the doc, I include a complete and general set of subtyping rules as well as the logic for determining the static typing of methods and member variables contained in generic classes. Some examples of the subtyping include:

C<A> <: C<? extends A>
C<A> <: C<? super A>
if A <: B then C<? extends A> <: C<? extends B>
if A <: B then C<? super B> <: C<? super A>

The cool thing about this approach is that instead of invalidating some methods and member variables, my proposal simply causes their types to be inferred in a way that appropriately restricts their use. For example:

class C<T> {
  T _value;
  C(this._value);
  T getValue() => _value;
  void setValue(T value) => _value = value;
}

num replaceWithZero(C<? super int extends num> wrapper) {
  var oldValue = wrapper.getValue();  // wrapper.getValue : () => num
  wrapper.setValue(0);                // wrapper.setValue : (int) => void
  return oldValue;
}
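
For comparison, roughly the same member types can be written by hand in today's Dart by passing the two methods as functions. This is only an approximation of the idea, not the proposed feature: the two function-typed parameters are an assumption standing in for the single wildcard-typed parameter.

```dart
class C<T> {
  T _value;
  C(this._value);
  T getValue() => _value;
  void setValue(T value) => _value = value;
}

// `getValue` is used at type `num`, `setValue` at type `int`.
num replaceWithZero(num Function() getValue, void Function(int) setValue) {
  final oldValue = getValue();
  setValue(0);
  return oldValue;
}

void main() {
  final c = C<num>(3.5);
  print(replaceWithZero(c.getValue, c.setValue)); // 3.5
  print(c.getValue()); // 0
}
```

The same call also type checks for a C<int>, because int Function() is a subtype of num Function().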

I'd love your feedback on my proposal under Stronger static typing for implicit casts and generics.

Hello, @jdonovan, I agree that this is an interesting topic! ;-)

I should note that I was involved in adding wildcards to Java. At first we proposed using + and -, just like all the research papers, but we ended up using ? extends and ? super because that was considered more readable.

However, it is also relatively verbose, and it relies on the intuition associated with monotonic functions: C<? extends T> is used in order to indicate that "the actual type argument could be any subtype of T", but we know that the actual object obtained from an expression of static type C<? extends T> could be an instance of any subtype, so the setup indicates that the type argument "co-varies" with the enclosing generic type, that is, we have a monotonic function. Similarly for super bounds.

We have considered using out and in here for covariance and contravariance, respectively (and combining the constraints and the keywords to get inout for invariance), because it creates an association to the affordances offered to developers who are writing classes, and to developers who are writing method invocations on such classes.

In particular, an out type argument can be used as a return type because returning a value is a movement of data "out of the object", and an in type argument can be used as a parameter type because this represents movement of data "into the object". Finally, an inout type variable can be used everywhere, moving data into as well as out of the object.

Both styles (? extends and out, ? super and in) are widely known, because they are used in Java and C#, but I suspect that in and out are more comprehensible as well as less verbose.

This discussion can go on forever, of course, but at least I've given a hint that we weren't just using out/in/inout by a complete accident.

One more thing I should mention is that Dart, having dynamically enforced covariance as the default, must use some syntactic marker to request declaration-site invariance (and inout seemed relatively natural since the restrictions for out as well as those for in are applicable).

However, this also means that when we use inout as a use-site variance modifier we are able to specify that the type argument at run-time doesn't differ from the statically known one, even with a class which is covariant or contravariant. For example:

class C<out X> {
  // Lots of nice, safe, covariant stuff.
  final X x;
  C(this.x);
  ...
  // And then maybe a small detour into unsafe land.
  List<X> foo() { ... }
}

void main() {
  C<num> c = C<int>(24);
  c.foo().add(3.14); // Throws!
  C<inout num> c2 = C(24); // A `C<num>`; `C<int>(24)` would be a static error.
  c2.foo().add(3.15); // Safe!
}

So the point is that the ability to make a type invariant, even in the case where it is not invariant by declaration, is useful: As long as any navigation (like a.b.c().d[3].e and so on) can bring us from an object with statically checked variance to another one with dynamically checked variance, the ability to restrict the former can make accesses to the latter statically safe.
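
The unsafe half of this example runs in current Dart if the out modifier is dropped. The sketch below assumes a trivial body for foo (the original elides it) and uses a made-up helper, fooAddThrows, to report whether the access fails.

```dart
class C<X> {
  final X x;
  C(this.x);
  // Assumed body: a fresh list whose element type is the dynamic value of `X`.
  List<X> foo() => <X>[x];
}

// Returns true if navigating from `c` to its list and adding a double throws.
bool fooAddThrows(C<num> c) {
  try {
    c.foo().add(3.14);
    return false;
  } on TypeError {
    return true;
  }
}

void main() {
  print(fooAddThrows(C<int>(24))); // true
  print(fooAddThrows(C<num>(24))); // false
}
```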

The use of C<num> as the invariant form would not be very convenient in Dart. First, we are considering adding declaration-site variance before use-site variance (or at least not later), and in this case we would be able to express invariance for the type parameter in the class C itself, and then C<num> would be invariant.

But with class D<out X> {}, D<num> would be allowed to refer to an instance of D<int>.

So with the combination of declaration-site variance and use-site variance, the syntax where there is no variance modifier is not available to mean "use-site invariance".

This combination is motivated by the fact that declaration site variance is a simple mechanism that enables a large number of client code constructs to omit any information about variance, because it's already in place based on the declaration, and it allows the classes that have declaration-site variance modifiers to be designed specifically to fit this choice. For instance, a class whose fields are all final would typically fit nicely with covariance: It is similar to an immutable record (as in a functional language), which means that covariance is sound.

About invalidating members vs. inferring suitably strong member signatures for them: We are aware of this trade-off.

Java made the choice to invalidate members where a covariant type parameter occurs in a position in the signature which isn't covariant, and so on. The basic idea was that the whole wildcard mechanism was compared to an alternative based on declaration-site variance where developers would write several interfaces for each class: One of them with all the "read" signatures (being covariant in its type parameters), another one with the "write" signatures (being contravariant), and then the class would implement both, and possibly add some mixed signatures that don't fit anywhere.

So the idea was that it's more convenient to just compute those interfaces on the fly.

In contrast, Kotlin doesn't filter out any members, but some members may then end up having a parameter type which is the bottom type (so you can't actually call it), or the return type could be a top type (like Any, if I remember correctly).

I've created an algebra that allows us to compute member signatures for the combination of use-site and declaration-site variance (here), and this actually allows us to maintain a similar amount of information, and still maintain that we are filtering the methods according to certain rules.

We haven't reached a point where we know enough to determine which way to go, but filtering and offering members that can't be called based on their type are certainly both viable approaches.

I should mention one issue that makes me prefer the filtering approach: If you tear off a method then you wouldn't notice any problems if the receiver uses type inference, and the method "can't be used": For instance, it could receive an argument of type Never, but nothing stops you from storing such a function in a local variable. It's only later on when you try to call it that you'll discover that this can't be done.
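
The effect can be seen in current Dart with a plain function type whose parameter type is Never. This is a small sketch; log and storeAsUncallable are hypothetical names.

```dart
void log(Object? value) => print('got $value');

// Stores `log` under a type whose parameter type is `Never`.
void Function(Never) storeAsUncallable() => log;

void main() {
  final f = storeAsUncallable(); // Nothing complains about storing it.
  // f(1); // Compile-time error: `int` is not assignable to `Never`.
  print(f is void Function(int)); // true: the function itself accepts an `int`.
}
```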

If c has static and dynamic type C<num> then its representation will almost inevitably include a storage location where the result returned by an invocation of the getter c.x is stored. In that sense you are pulling data 'out' of c.

Of course, if the dynamic type is different from the static type then the getter could have some other implementation; so it could return the value of a global variable, or it could do lots of other things that don't immediately fit the intuition about "pulling data out of c". But it is still reasonable to say that the call site c.x gives the impression that the returned result is something that we "pulled out of c".

The constructor is different because it is statically resolved, so we never have a situation where the constructor is invoked via some notion of late binding. Note that the subtype rules for functions allow for parameter types to be more general than statically known, so not even a hypothetical tear-off mechanism for constructors would contradict this.

A more tricky case is abstract class C<out X> { void foo(void f(X x)); } where the covariant position is in a parameter type. But even in this case there is an 'out' element to the semantics: We provide a callback function f, and the receiver will provide some data to that callback, which means that the data is going out of c and into our callback.
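
A runnable approximation in current Dart, without the out modifier, might look as follows. Producer and collect are made-up names, and the concrete body of foo is an assumption.

```dart
class Producer<X> {
  final X value;
  Producer(this.value);
  // `X` occurs as a parameter type of a parameter: a covariant position.
  void foo(void Function(X x) f) => f(value);
}

num collect(Producer<num> p) {
  num result = 0;
  p.foo((num n) {
    result = n; // Data flows out of `p` and into the callback.
  });
  return result;
}

void main() {
  print(collect(Producer<int>(7))); // 7
}
```

The callback accepts any num, so it is a valid argument even when the receiver is actually a Producer<int>.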

It's not hard to infer which declaration-site variances are possible for any given type parameter of a class (that's the only location where we will support d-site variance): If the type variable occurs in an invariant position or both a covariant and a contravariant position then we can only choose inout; if it occurs only in a covariant position then we can choose inout and out, and we choose out because it doesn't impose restrictions on clients that aren't forced; if it occurs only in a contravariant position then we can choose inout and in, and we choose in; if it is completely unused then we can choose anything, so maybe we choose out (we could choose a fourth variant meaning "not used", sometimes spelled as 'bivariant', but we don't plan to have that).
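
The rules in the previous paragraph can be written down as a small function. This is only a sketch: the enum Occurrence and the function inferVariance are hypothetical, and the occurrences are assumed to have been collected elsewhere.

```dart
// Positions in which a type variable may occur in the member signatures.
enum Occurrence { covariantPosition, contravariantPosition, invariantPosition }

// Picks the modifier that imposes the fewest restrictions on clients.
String inferVariance(Set<Occurrence> occurrences) {
  final co = occurrences.contains(Occurrence.covariantPosition);
  final contra = occurrences.contains(Occurrence.contravariantPosition);
  final inv = occurrences.contains(Occurrence.invariantPosition);
  if (inv || (co && contra)) return 'inout';
  if (contra) return 'in';
  // Covariant-only occurrences, or no occurrences at all (where `out` is
  // just one of the possible choices).
  return 'out';
}

void main() {
  print(inferVariance({Occurrence.covariantPosition})); // out
  print(inferVariance({Occurrence.contravariantPosition})); // in
  print(inferVariance(
      {Occurrence.covariantPosition, Occurrence.contravariantPosition})); // inout
  print(inferVariance(<Occurrence>{})); // out
}
```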

However, it is a breaking change to use statically sound d-site variance in existing classes where the existing client code uses the dynamically checked variance that Dart has always had (with no modifier).

It wouldn't be hard to create tool support for adding covariant on every single type parameter. We have actually considered supporting the modifier covariant meaning 'dynamically checked covariance', which would be in line with the current use of covariant on parameters: That's also about having a dynamic check, and in return being able to use a typing that's otherwise a compile-time error. We would make that an option, and we could make it a requirement at some point in the future, such that all declarations would yield a statically sound program, except the ones that have an explicit covariant. We could then redefine the meaning of having no d-site variance modifier, e.g., it might mean out or it might mean "inferred".
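
For reference, this is how the existing covariant modifier on parameters behaves in current Dart. The class names and the helper chaseThrows are illustrative only.

```dart
class Animal {
  void chase(Animal other) {}
}

class Mouse extends Animal {}

class Cat extends Animal {
  // Tightening the parameter type would be a compile-time error
  // without `covariant`; with it, the argument is checked dynamically.
  @override
  void chase(covariant Mouse other) {}
}

bool chaseThrows(Animal hunter, Animal prey) {
  try {
    hunter.chase(prey);
    return false;
  } on TypeError {
    return true;
  }
}

void main() {
  print(chaseThrows(Cat(), Mouse())); // false
  print(chaseThrows(Cat(), Cat())); // true
}
```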

But I think it's useful to have the modifiers, because they will make accidental changes stand out: Your code doesn't compile anymore if you add void foo(X x) to a class where X is a type variable marked out. If the variance is inferred then you'd just have a class whose X is inout and you have to rely on client code that breaks in order to notice that anything went wrong.

The point is that explicit indication of d-site variance is a commitment by the developer to maintain a certain discipline in the declarations in the given class. (It is also documentation for developers who are using the declaration, but for that part we could make an IDE show the inferred values, so from that point of view it doesn't matter so much whether we use manually written modifiers, or we use inference to choose the "best" ones that will work).

I don't quite get the next part. With something like X foo(X x) => x;, you can't say that the parameter has in variance and the result has out variance: The type is X in both cases (presumably that's a type variable declared by the enclosing class), and it follows that the only possible variance for X that won't cause a compile-time error is inout (or "no modifier" which means dynamically checked covariance, which is also allowed in all positions).

In any case, I think the best choice (after a long transition period ;-) would be to require an explicit covariant modifier on dynamically checked covariance (because we tend to require extra ceremony when a developer wishes to use a dynamically checked mechanism, as a kind of warning), and then we might redefine the missing modifier to mean whatever turns out to be most common, or maybe it could be inferred. Maybe. ;-)

This was the day where I finally remembered to look up 'epiphany'. Thanks! ;-)

you just opt in, on module-by-module basis. Right?

With nnbd a software entity (say, a program aka an entry point, or a package, or a library) needs to opt in. The typical case is that it is a package, and it raises its SDK constraint in pubspec.yaml to a version where tools are required to support nnbd. This will magically opt in every library in the package, unless a library uses // @dart=2.6 or so to avoid opting in.

When opted in, int means non-nullable int, and nullable int is spelled int?. So this means that unedited code will suddenly be very null-strict (because all types are non-nullable). Then you edit the code until it works (by adding ? to the few type annotations that really need to be nullable, by adding late to allow a variable to remain uninitialized, by using ! in order to check dynamically that some expression isn't null, and so on). So there's a lot of work in doing this, but the end result should be relatively nicely nullsafe code "by default".
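
As a rough illustration of those three edits in an opted-in library (the class Profile and the two functions are made up for the example):

```dart
class Profile {
  String? nickname; // Nullable: may legitimately be absent.
  late String name; // Non-nullable, but initialized after construction.
}

String displayName(Profile p) => p.nickname ?? p.name;

// `!` checks dynamically that the value isn't null, and throws if it is.
int nicknameLength(Profile p) => p.nickname!.length;

void main() {
  final p = Profile()..name = 'Ada';
  print(displayName(p)); // Ada
  p.nickname = 'ada99';
  print(nicknameLength(p)); // 5
}
```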

Variance shouldn't be nearly as involved: For declaration-site variance we allow developers to add the modifiers to type parameters in class declarations. Since nothing at all happens before someone does that to some class, it's more like "you work on variance when you are ready, one class at a time", rather than "this feature will hit you everywhere in existing code when you enable nnbd".

Of course, if you change List to be invariant then almost all the code in the world will break, but we get to decide which changes we want to have, and we can try to estimate the breakage for each step. And we aren't going to make List invariant. ;-)
