Should we restrict cycles of F-bounded types through union types?

dart-lang / language

Design of the Dart language

Other

2.66k stars 205 forks source link

Should we restrict cycles of F-bounded types through union types? #433

Open leafpetersen opened 5 years ago

leafpetersen commented 5 years ago

We don't allow type variable declarations of the form T extends T, or T extends S, S extends T. The analyzer currently accepts type variable declarations of the form T extends FutureOr<T>, but the CFE crashes on such declarations. With NNBD, we will also have the ability to write T extends T? or T extends S, S extends T? unless we specify otherwise.

Should we consider making these an error?

My inclination is yes. In general, this feels to be pushing up against (or crossing) an important boundary in the equational theory that may bite us if we eventually add union types and attempt to re-interpret ? and FutureOr as union types. Note that for example T extends T | Never is essentially another way of writing T extends T, which suggests the kinds of problems that may arise.

I also haven't found any uses for this kind of equation, and since the CFE doesn't accept the FutureOr example, there is almost certainly no existing code that uses this feature.

One way of specifying this is to say that given a set of type variable declarations of the form X0 extends B0, ..., Xn extends Bn, construct a graph with one node for every Xi, and edges from Xi to Xj iff Xi occurs in a top level union in Bj, and say that it is an error if that graph has any cycles. We say that Xi occurs in a top level union in a type T iff:

T is `Xi
T is S? and Xi occurs in a top level union in S
T is FutureOr<S> and Xi occurs in a top level union in S

@eernstg @lrhn @munificent thoughts?

cc @stereotype441

lrhn commented 5 years ago

Since T extends T is satisfied by any type, it's a useless bound. The other ones are similarly useless as a bound because T always occurs directly in the union.

So, there should be no harm in disallowing those cases.

Is there harm in allowing them? That is, can we recognize that these are meaningless bounds that are trivially satisfied by any type, and just ignore them? If we have to detect them anyway to make them errors, it seems equally possible to ignore them.

The safe bet is to disallow. I'll defer to your experience here, so do whatever you find best. It shouldn't affect anybody negatively.

(The moment we get union types, we can write recursive equations with type variables, like this one that will occur alomost immediately: Json extends num|Null|String|bool|List<Json>|Map<String,Json>. I'm not sure whether this is an argument for or against union types :smile:.)

eernstg commented 5 years ago

@leafpetersen wrote:

Should we consider making these an error? .. My inclination is yes.

Any F-bounded type variable bound is a recursive equation on types, so maybe this is no worse than what we already have, and we should allow it for consistency?

Such a type variable bound just works in cases where explicit type arguments are given and must simply be checked (it's easy to check whether T extends F<T> is known to hold for any given T). But they create more difficult situations whenever we want to "find a solution". E.g., during instantiation to bound we can't find the least upper bound of all solutions, so we go a quite different way.

It is actually a tiny inconsistency that we've made T extends T an error; it could be considered an F-bound that happens to have all types as the solution set, which is represented just fine by the upper bound Object?. However, we could justify the error by saying that it is too error prone to allow this: it's most likely a developer mistake to write T extends T in the first place.

So maybe we should simply use the criterion that it is an error to specify a bound B for a type variable X if we can prove X <: B (assuming the bounds in the entire list of type variable declarations) where B is not a top type. This would catch T extends T, T extends T?, T extends FutureOr<T?>, etc, and the justification would be that all these bounds are useless.

(Bounds of the form X extends T where T is a top type are still justified, e.g., X extends dynamic, because they can affect the treatment of expressions of that type in the body, so we need to have that exception.)

The graph-and-cycle based check would certainly flag some of the same cases, and the two approaches might be equivalent.

In any case, my inclination is also yes.

(And I think almost anything would count as a reason for not introducing recursive union types ;-)

leafpetersen commented 4 years ago

This is not planned.

leafpetersen commented 3 years ago

@scheglov points out that allowing these causes obvious termination problems with the subtyping algorithm. Specifically, given (for example) T extends FutureOr<T> and Future<T>, an attempt to prove FutureOr<T> <: Future<T> diverges as follows:

FutureOr<T> <: Future<T> if both // FutureOr left rule
- Future<T> <: Future<T> // True by identity
- T <: Future<T> // Variable bounds rule applies
  - FutureOr<T> <: Future<T> // This is a loop

srawlins commented 1 year ago

I'd think we'd be happy to implement these rules as analyzer warnings. Then Google internal can be clean, and the way would be paved to work on language language to restrict bounds.

eernstg commented 1 week ago

[Edit Oct 16: Using the terminology mentioned in comments below]

Here's a proposal for a compile-time error which could be used to get rid of these bounds. It is probably equivalent to the graph based approach in the original posting, but might be seen as a simpler way to say it:

The union base type of a type of the form T? is the union base type of T. The union base type of a type of the form FutureOr<T> is the union base type of T. The union base type of any other type T is T.

Consider a formal type parameter list <X1 extends B1, .. Xs extends Bs>. Consider X1 extends V1, .. Xs extends Vs where Vj is the union base type of Bj. A compile-time error occurs if the X1 extends V1 ... Xs extends Vs contains a cycle of the form Y1 extends Y2, Y2 extends Y3, ... Yk extends Y1, where Y1 .. Yk is a permutation of a subset of X1 .. Xs.

In other words, "replace the unions by their base type and then check for cycles".

I think it's about time to proceed and make these semi-cyclic bound structures a compile-time error, thus avoiding some infinite-loop failures in tools.

import 'dart:async';

void foo<X extends FutureOr<X>>(X? x) {}
void bar<X extends FutureOr<X>?>(X x) => foo(x);

An extra reason why we'd want to do this now is that 'inference-using-bounds' seems to introduce some new situations where the infinite looping occurs.

@dart-lang/language-team, WDYT?

lrhn commented 1 week ago

Seems reasonable. I don't see any use for a self-referential type variable bound. It's trivially satisfied by any type, so it makes no difference. We can accept it, but that wouldn't help anyone write better code.

About phrasing, I always gets worried when we "erase a union" and choose one of the sides as the canonical one. The other side is equally part of the union, and if we ever get full union type (deity forbid) then we need to be general.

In this case, we're not erasing unions, we're looking for an plain type variable, which means we don't need to look at Null or Future<T>. That's why the other branch is the one to look at.

So maybe:

The direct type variable bound of a type variable with bound B is the direct type variable of B. The direct type variable of a type S is:

If S is T? then it's the direct type variable of T.

If S is FutureOr\<T> then it's the direct type variable of T.

If S is a type variable X, then X.

If S is X & T, then X. (Not needed here, but for completentess)

Otherwise X has no direct type variable.

The type variable bounds of a type variable with bound B is the transitive type variables of B The transitive type variables of a type S is the inductively defined set:

If X is the direct type variable of S, then the type variable bounds of S is the set {X} unioned with the type variable bounds of X.

Otherwise the set is empty.

It's a compile-time error if a type variable is contained in its own type variable bounds.

Could we just define it as:

IncludesDirectly(T, S), meaning that T is a (possibly trivial) union and S is one the elements of the union, which is inductively defined as:

T is S, or

T is R? and IncludesDirectly(R, S) or S is Null

T is FutureOr\<R> and IncludesDirectly(R, S) or S is Future<R>.

(Not shorter, that's for sure. Maybe just use "erase union", but call it something else.)

leafpetersen commented 1 week ago

About phrasing, I always gets worried when we "erase a union" and choose one of the sides as the canonical one. The other side is equally part of the union, and if we ever get full union type (deity forbid) then we need to be general.

I prefer the formulation (and terminology) from @eernstg I think. If and when we need to generalize it, we can worry about that then (I think the graph formulation generalizes fine, for example). I'm ambivalent as to whether to specify it using the graph formulation from my original comment or using the union reduction approach. Fine with either.

lrhn commented 6 days ago

If we call it "union base type" instead of "union reduction", and/or avoid saying "erase union", I'm fine with it too.

It might be enough to just say "reduce the union type" instead of "erase. ..".

This really is just nit-picking phrasing. The approach works.

eernstg commented 6 days ago

Sounds good! I adjusted the terminology.