dart-lang / language

Design of the Dart language

Do we need `Any`? What's the breakage if `Object?` becomes a non-top type? #3265

Closed eernstg closed 1 year ago

eernstg commented 1 year ago

We currently have some anomalies around the top types: The types void and dynamic are variants of a nameless top type with special static treatment, and we use Object? as a construct that denotes the "plain" top type.

However, we can't really say that Object? "declares" operator ==, toString, hashCode, runtimeType, and noSuchMethod, which means that it is a matter of weasel words and magic that Null has such members as well, and we trust an expression of type Object? to be able to do toString() etc.
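As a small illustration of that trust, here is a sketch in today's Dart showing that an expression of static type Object? accepts these member accesses even when the value is null. The function name `describe` is made up for this example.

```dart
// Every expression of static type Object? accepts these member accesses,
// even when the value happens to be null.
String describe(Object? value) {
  final text = value.toString(); // works on null as well
  final type = value.runtimeType; // Null when the value is null
  final same = value == value; // operator == is available too
  return '$text : $type (equal to itself: $same)';
}

void main() {
  print(describe(null)); // null : Null (equal to itself: true)
  print(describe(42)); // 42 : int (equal to itself: true)
}
```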

We should introduce an explicitly named top type, say Any, in order to normalize this part of the language.

The question is then: Is Object? just a specialized notation for Any? The difference suddenly matters when we have potentially nullable extension types because they should be subtypes of the top type, but if we make them subtypes of Object? and do not say that Object? is a special notation that means Any then we do say that Object? is a supertype of everything which is a subtype of Object and everything which is a subtype of Null, and then it is highly inconsistent to say "by the way, it's also a direct supertype of some extension types". That's not the way types of the form T? otherwise work.

I'd recommend that we do introduce a plain top type, perhaps named Any, and use that to clean up the specification. The default upper bound of a type variable would then be Any rather than Object?, and we probably need to make similar changes in several other locations.

Next, how much finicky work do we need to do, and keep doing, in order to maintain that Object? and Any are the same type, no ifs and buts? Is it manageable? Is it useful?

Alternatively, what are the pros and cons of saying that the extension types are subtypes of Any, but not Object?? (In other words, we're then saying there is nothing special about Object?, it's treated exactly like T? where T is Object.)

We would then have the property that potentially nullable extension types cannot be used in a number of locations where developers have explicitly specified the type Object?, with the intention that it means the top type. Is that a pro or a con?

[Edit] The discussion below gave rise to a proposal that we adopt the following superinterface hierarchy:

graph BT;
  Null --> Any
  Object --> Any

In addition to that, the following subtype relationships are proposed (where everything not involving Any is a rule that we already have today):

graph BT;
  Any[Any, dynamic, void] --> Object?
  Object? --> Any[Any, dynamic, void]
  T[any type T] --> Any

@dart-lang/language-team, WDYT?

eernstg commented 1 year ago

The feature specification of extension types currently treats Object? as a proper class which can be a superinterface. This will need to be adjusted if we introduce Any and make Object? a type which is distinct from Any.

lrhn commented 1 year ago

It's not just extension types which introduce subtypes of top types that are neither subtypes of Object nor of Null; we have that already with type variables like <X extends Object?>. The static type X inside the declaration's scope is a subtype of Object? and other top types, and of itself, and nothing else.
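A minimal sketch of the type-variable case in today's Dart (the function name `show` is hypothetical): inside the declaration, X can be assigned to Object?, but to neither Object nor Null.

```dart
String show<X extends Object?>(X value) {
  Object? top = value; // accepted: X <: Object?
  // Object o = value; // compile-time error: X is not a subtype of Object
  // Null n = value;   // compile-time error: X is not a subtype of Null
  return '$top';
}

void main() {
  print(show<int?>(null)); // null
  print(show('a')); // a
}
```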

As for Any and Object?, I think we do need to keep every type assignable to Object?. But maybe the only finicky thing we need to do is to declare Object? a top type, by fiat. The type Object? is still the union type Object|Null, the least upper bound of Object and Null, which means a subtype of their existing proper supertype Any. We just grab Object? and shove it into the rather crowded set of top-types. We do that by declaring that for any type T, T <: Object? as an axiom, and Top(Object?) is true, which is basically what we already define today. Then we also say the same things for Any, which is just yet-another-top-type. (Which we'll have to fit into MoreTop, in some way, above or below Object?).

The advantage here is then that Any happens to also be the sole nominative top-type, and only top-type to declare members, which every type must then implement.

We'll make sure to declare that all objects are instances of a subtype of Object? (and all non-null objects are instances of a subtype of Object), so for exhaustiveness checking, any top type can be treated as Object?. I guess we already do that, since it works.

There is no distinction between Any and Object? wrt. subtyping or member signatures, neither statically nor dynamically. Unlike dynamic and void, neither has special static behavior. You're not allowed to explicitly declare that you implement Any, you have to implement Object to get that. The only way to distinguish Any and Object? is to do toString or == on their Type objects.

The alternative of saying that the type Object? is the type Any might also be an option. It might require us to finally formalize which entities exist in our type system, and then we just say that the type entity of Object? and of Any is the same type. I think it'll end up giving a lot of problems. Is Any a union type? Is Object?? Which type is Object??? Is it just doing Norm eagerly in a few cases?

eernstg commented 1 year ago

I'm not so worried about type variables with no declared bound: It is specified which type is used as the implicitly declared bound (currently it's Object?), and we can just specify whichever type we want for that purpose.

In particular, if we introduce Any and treat Object? just like T? where T is Object (that means: eliminate the anomaly) then we'd probably change this to extends Any. In any case, there is no need for special exceptions regarding type variables.

The advantage here is then that Any happens to also be the sole nominative top-type, and only top-type to declare members, which every type must then implement.

That's what I mean by getting rid of the anomaly. So that's one option.

But maybe the only finicky thing we need to do is to declare Object? a top type, by fiat.

That's what I mean when I say that we turn Object? into a special notation: It has nothing to do with T? where T is any other type than Object, it is just a piece of syntax that denotes the top type, with no special static analysis (unlike void and dynamic), and with a separate set of rules. That's the other option.

None of these options is breaking, because it is still true that all pre-extension-type types are subtypes of Object?, but it makes a difference for new code:
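A minimal sketch of the kind of new code affected, assuming Dart 3.3 or later (where extension types are available). The names `MaybeId` and `render` are hypothetical. The code shows today's behavior; the comments describe what would differ under the option where Object? is treated as plain T? with T being Object and potentially nullable extension types are subtypes of Any only.

```dart
// Requires Dart 3.3 or later. MaybeId and render are hypothetical names.
extension type MaybeId(int? value) {}

// Written with Object? in the sense of "the top type".
String render(Object? anything) => 'got $anything';

void main() {
  final id = MaybeId(null);
  // Accepted today, because every type is a subtype of Object?.
  // If MaybeId were a subtype of Any but not of Object?, this call would be
  // rejected and render would need the parameter type Any instead.
  print(render(id)); // got null
  // Object o = id; // already an error today: MaybeId is not a subtype of Object
}
```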

lrhn commented 1 year ago

But maybe the only finicky thing we need to do is to declare Object? a top type, by fiat.

That's what I mean when I say that we turn Object? into a special notation: It has nothing to do with T? where T is any other type than Object, it is just a piece of syntax that denotes the top type,

That's not what I'm suggesting. I'd still treat Object? as T? where T is Object, a common, least, supertype of Object and Null.

Then I'd also add the axiom to the subtype relation that T <: Object? for any T. That doesn't change what Object? is, but it changes how it relates to top types, by making it one (extending the subtype equivalence class of top types to include the equivalence class of Object?, which would otherwise only be equivalent to itself and repeated applications of T? and FutureOr<T>).

We'd then have four basic top types: void, dynamic, Any and Object?, instead of three (and then all the applications of T? and FutureOr<T> to those, which create new supertypes, which are also subtypes by decree). That's why we'd have to account for both Any and Object? in the MoreTop function.

So it'd be a distinct type from Any, a subtype of it (because a union type is a subtype of any supertype of both its part types, and Any is a supertype of both Object and Null, and because T <: Any is true for any T) and a supertype of it (because T <: Object? is true for any T, including Any).

It doesn't matter whether we change the default bound on type variables, because they have the same subtypes, the same member signatures, and the same member access and usage restrictions (unlike void and dynamic).

That's also a good argument for not needing two distinct types, which is why we've gotten away with not having Any so far. The reason for introducing Any is to introduce a nominative type carrying the member signatures that all types share (including dynamic to some extent).

And it would indeed not be possible to specify that a type parameter cannot be an extension type. Which is also not something you should ever do, because it's none of your damn business which view a client of your API uses on the objects you operate on. You don't need to know, and you won't ever know. If your generic class/function works with Object?, it works with all objects, generically. You shouldn't care which view a user decides to use on those objects, and you'll only see the runtime type argument, which won't be an extension type anyway.
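The point about the runtime type argument can be sketched as follows, assuming Dart 3.3 or later. The names `UserId` and `elementTypeOf` are hypothetical. At run time an extension type is erased to its representation type, so generic code only ever observes the representation type.

```dart
// Requires Dart 3.3 or later. UserId and elementTypeOf are hypothetical names.
extension type UserId(int value) {}

// Returns the type argument that the generic code actually sees at run time.
Type elementTypeOf<T>(List<T> items) => T;

void main() {
  final ids = [UserId(1), UserId(2)]; // static type: List<UserId>
  print(elementTypeOf(ids)); // int
  print(ids.runtimeType); // List<int>
}
```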

Alternatively, we could just specify a completely abstract and unanchored set of "Dart object default members" which have the names and signatures of the current Object/Null members, and say that all Dart types have, and must have, those members, or members which are valid overrides of those. Without introducing the Any type/interface to carry them. It's special-casing, but it's worked so far. We'll be able to avoid saying "the members of Object".

eernstg commented 1 year ago

Forgive me, but I'll have to disagree on several points.

Then I'd also add the axiom to the subtype relation that T <: Object? for any T.

Hence, Object? is not treated as T? where T is Object, it is a special exception.

We'd then have four basic top types: void, dynamic, Any and Object?, instead of three

I don't see the purpose. How should Any and Object? be different, and when would that be useful?

I think it makes sense to say that Any is the top type, dynamic and void are the top type with special static analysis, and Object? is a special notational exception that denotes Any.

It also makes sense to say that Any is the top type, dynamic and void are the top type with special static analysis, and Object? is T? where T is Object (and this would naturally imply that some extension types are not a subtype of Object?).

It doesn't matter whether we change the default bound on type variables

It certainly does matter if we consider Object? to be T? where T is Object, and extension types by default are subtypes of Any (and then we have non-nullable extension types and extension types with a non-trivial non-extension type superinterface which do have a subtype relationship to Object).

When you're discussing the pros and cons of two proposals you should qualify statements which are only valid assuming that we're ignoring one of the proposals. Otherwise it doesn't amount to much of a comparison.

Alternatively, we could just specify a completely abstract and unanchored set of "Dart object default members"

Again, we can do that, but having a real interface type (Any) which is the top type would be much simpler and more consistent.

lrhn commented 1 year ago

When you're discussing the pros and cons of two proposals you should qualify statements which are only valid assuming that we're ignoring one of the proposals. Otherwise it doesn't amount to much of a comparison.

I'm very likely ignoring both and suggesting a third alternative. So let's try again.

Let's assume we introduce a nominative Any type, which is defined to be a top type, which has a set of instance member signatures that we want all objects to have valid overrides of.

  • Is Object? just a specialized notation for Any?

No. I do not believe that can be made to work, or make sense. Object? is a type which inherently exists in the Dart type system, as the result of applying the _? type constructor to the Object type, and which is not a nominative type.

I do not believe we can design a consistent system where Object? and Any is both a union type and also a nominative type. I do not believe we can design a consistent system where Object? does not exist as a union type.

(Or if we do design it, the added complications will far outweigh any gains from having Any.)

So about:

Next, how much finicky work do we need to do, and keep doing, in order to maintain that Object? and Any are the same type, no ifs and buts? Is it manageable? Is it useful?

It's not manageable, at all. That would drive a stake through the heart of the assumptions of our type system, that a type only has one representation (well, up to alpha equivalence of type parameters, order of named fields/parameters, etc., all those structural differences inside a single kind of type, but we definitely know whether it's structural or nominative, whether it's T? for some T).

A rule like:

UP(T1?, T2) = S? where S is UP(T1, T2)

would have to decide, upon seeing the type Any/Object? as its first operand, whether the rule applies at all. Whether to treat the type as the nominative type Any or the structural union type Object?. In the latter case, it goes forward with UP(Object, T2), in the former the rule does not apply, and it falls through, possibly to the old nominative LUB.
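For a concrete feel of how this rule behaves today, here is a small sketch with ordinary types (the function name `listTypeOf` is made up): the element type of the list literal is the upper bound of int? and double, which the rule computes as UP(int, double)?, that is, num?.

```dart
// The inferred element type of [a, b] is UP(int?, double) = UP(int, double)? = num?.
Type listTypeOf(int? a, double b) {
  final mixed = [a, b];
  return mixed.runtimeType;
}

void main() {
  print(listTypeOf(null, 2.5)); // List<num?>
}
```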

Maybe we can go through all our rules and introduce Any into them, as a nominative top type, and try to prevent Object? from ever being the result of any type term, but it's not going to be worth the effort.

Alternatively, what are the pros and cons of saying that the extension types are subtypes of Any, but not Object?? (In other words, we're then saying there is nothing special about Object?, it's treated exactly like T? where T is Object.)

That is how we treat the type today, it really is the union type Object?, the union of Object and Null. The type is introduced as an entity in the type system by applying the type constructor _? to Object. We just happen to have a function, TOP, which is true for T? where OBJECT(T), which includes more types than just Object, but it means that Object? is therefore a top type (in the supertype-of-all-types meaning) because the subtype relation on types is defined in terms of TOP, and states that all types are subtypes of TOP-types. That's as much a property of Object? as it is a property of every other type.

What you're asking here is what would happen if Object? was not a top type, but Any is.

The immediate effects would be:

Should you be allowed to declare classes which extend Any instead of Object? (Probably not, it breaks "Object? represents all values".)

I don't think it's worth the potentially massive migration.

So I reject both ideas, and suggest that if we want a nominative Any type, to have a place to hang our default methods, we should make it distinct from Object? and keep Object? as a top type. That is the minimal feature, it should work (we have so many top types already, it's unlikely something breaks by having another), all existing code keeps working, and changing the default bound to Any is a no-op, since it is functionally indistinguishable from Object?.

The only real change is that the depth and UP functions now start at Any instead of Object when finding the depth of a nominative type.

About the comments to what I said:

Hence, Object? is not treated as T? where T is Object, it is a special exception.

It's treated as Object? and it has a special exception for subtyping only. Those are both possible. The types Null? and Never? both exist, the subtype relation is specified so that Null? is also a subtype of Null, not just the supertype that ? implies. That's no more a special case than saying that Object? counts as a top type for subtyping. It's just how subtyping, a relation on types, types which exist independently of that relation, is defined.

It also makes sense to say that Any is the top type,

You keep saying "the top type", but Dart doesn't have a single unique top type. It has an infinite number of top types: Currently dynamic, void, Object? and any number of applications of T? or FutureOr<T> to a top type. Those are all distinct types, and they are all top types. They exist, which is why we have Norm to get rid of them again.
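A sketch of "distinct, but all top types" in today's Dart. The helpers `typeOf` and `isSubtype` are made up for this example; `isSubtype` relies on the covariance of List to test S <: T at run time.

```dart
// Helper names are hypothetical.
Type typeOf<T>() => T;

// List<S> is a subtype of List<T> exactly when S <: T (covariance).
bool isSubtype<S, T>() => <S>[] is List<T>;

void main() {
  // Mutual subtypes: all of these are top types.
  print(isSubtype<dynamic, Object?>() && isSubtype<Object?, dynamic>()); // true
  print(isSubtype<void, Object?>() && isSubtype<Object?, void>()); // true

  // Still distinct types, as observed through their Type objects.
  print(typeOf<dynamic>() == typeOf<Object?>()); // false
  print(typeOf<void>() == typeOf<Object?>()); // false
}
```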

dynamic and void are the top type with special static analysis, and Object? is a special notational exception that denotes Any.

I'm not sure what "notational exception" means.

It suggests that the type Object? does not exist, there is no type entity introduced for it in the type system, by applying the ? type constructor to the Object type, like there is for T? for any other type T. Applying ? to the type Object, anywhere we use that concept, instead gives the type entity for the nominative type Any.

We can probably define that, but our type system is described using algebraic datatype-like terms, where ? is treated as a type constructor, so that applying ? to T always gives the term T?, which we can then later destructure using cases like "If S is T?, then ...". Doing something else in one case means that we need to do that thing *everywhere* we apply ? to a type. We need to treat ? as a type *function* more than a datatype constructor. (Or introduce some notion of pre-canonicalization that happens *before* introducing type entities into the type system, and after each application of ?.)

That's probably doable by fixing all occurrences of Object? in source code, making type variable substitution react to Object being placed right inside a ?, and having all the places where we create a new type by applying ? to an existing type first check for the target type being Object.

So take FutureOr<Object>? x = ...; if (x is! Future<Object>) print([x].runtimeType);. This promotes x to Object? today. Same for [1 as int?, "a" as String?] which creates an Object? today. Every algorithm we have which creates new types, will need to make sure to not create Object?, and create Any instead when it would have.
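The list-literal case can be checked directly in today's Dart. In this sketch (the function name `mixedListType` is made up), no source code mentions Object?, yet the upper bound computation produces it as the element type.

```dart
// The inferred element type is UP(int?, String?) = UP(int, String)? = Object?.
Type mixedListType() {
  final mixed = [1 as int?, 'a' as String?];
  return mixed.runtimeType;
}

void main() {
  print(mixedListType()); // List<Object?>
}
```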

Take FutureOr<Object>? again. By today's rules, it's a top type because OBJECT(FutureOr<Object>) is true. Should we remove that rule? We've removed Object? itself from existing, but there is an infinity of equivalent types which are T? where T is equivalent to Object. Are any of those also converted to Any? Most likely we retain the rule (FutureOr<Object>? is a top type) and do not make any changes to the types themselves. Then, if we do is! Future<Object>, we convert the resulting Object? to Any.

We can still apply ? to Any, so Any? will also exist, and be a top type. It's just Object? which doesn't.

I don't think we'd have problems with our other type functions, because Object? is likely handled by being a top type, or a supertype of the other type, before being destructured into Object + ?. Everything should keep working, it's just that Object? is not a type, but also no operation needs to create that type, because it creates Any instead.

It's actually possible, more than I initially thought, that this can work, but we'd have to be very, very careful about not letting a single Object? slip through, anywhere. I don't think it's worth the complexity.

If the goal is to have a source for the API of Object and Null, I'd rather just create that from scratch, not coming from any particular class or type, than to introduce a type for it, if that means messing with Object?.

eernstg commented 1 year ago

That's a very nice analysis, @lrhn! I think we're converging. At least, I changed my mind about letting Any and Object? be distinct types.

Let's assume we introduce a nominative Any type

Sounds good! So Any would be a superinterface of Null and a superinterface of Object. We don't actually have to reveal whether it's extends Any or implements Any. In any case, this ensures that there is a normal interface type where the five members (toString etc.) are declared, and we don't have to have any special exceptions about those members. So there are no special rules about override correctness, about inheritance of implementations or via interfaces, everything just works according to the standard rules that we're using everywhere else already.

This also means that the "least" upper bound algorithm has a uniform structure to work on: No funny exceptions.

Object? is a type which inherently exists in the Dart type system, as the result of applying the _? type constructor to the Object type, and which is not a nominative type.

Yes, at this point I've been convinced that we can maintain this perspective without creating contradictions.

We would then add one more subtype rule: According to the old rules we have for all T that T <: Object?, T <: dynamic, and T <: void. The new rules would say that for all T, T <: Any, T <: dynamic, and T <: void, and then also that Any <: Object?.

This yields the following superinterface graph:

graph BT;
  Null --> Any
  Object --> Any

And in addition to the subtype relationships introduced by superinterface relationships we have the following (where everything not involving Any is a rule that we already have today):

graph BT;
  Any[Any, dynamic, void] --> Object?
  Object? --> Any[Any, dynamic, void]
  T[any type T] --> Any

So every type T is a subtype of every type in the set Any, dynamic, void (it follows that they are subtypes of each other). In particular, Object? is a subtype of the top type cluster (that's just one of the possible values of T). In addition to that, we specify that Any is a subtype of Object?. This effectively makes Object? a member of the top type cluster.
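The rules above can be played with in a toy model. This is only an illustration over a handful of type names represented as strings, not Dart's real subtype algorithm; `topCluster`, `superinterfaces`, `isSubtype`, and the sample class `int` are assumptions made for this sketch.

```dart
// Toy model of the proposed rules over a handful of type names.
// Illustration only; this is not Dart's real subtype algorithm.
const topCluster = {'Any', 'dynamic', 'void'};

// Superinterface edges: Null --> Any, Object --> Any, plus int as a sample class.
const superinterfaces = <String, List<String>>{
  'Null': ['Any'],
  'Object': ['Any'],
  'int': ['Object'],
};

bool isSubtype(String s, String t) {
  if (s == t) return true; // reflexivity
  if (topCluster.contains(t)) return true; // any type T <: Any, dynamic, void
  // New rule Any <: Object?, combined with transitivity through Any.
  if (t == 'Object?') return isSubtype(s, 'Any');
  // Otherwise follow superinterface edges.
  return (superinterfaces[s] ?? const <String>[]).any((u) => isSubtype(u, t));
}

void main() {
  print(isSubtype('Object?', 'Any')); // true
  print(isSubtype('Any', 'Object?')); // true
  print(isSubtype('Any', 'Object')); // false
  print(isSubtype('Null', 'Object')); // false
  print(isSubtype('int', 'Any')); // true
}
```

In this model Object? ends up equivalent to the members of the top type cluster, while Object and Null stay strictly below it.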

We should probably also specify that NonNull(Any) == Object, because (1) that is sound, and (2) it is very useful, and (3) this allows us to use Any in all those situations where we previously used Object? (and that might be seen as a good habit in the future).

Wdestroier commented 1 year ago

If toString, equals, hashCode and all methods from the Object class are moved to the Any class, will Object become an opaque type? Considering that extension types are a micro-optimization, would a type that allows fields only (a struct) ever exist? In this case, might be better to have an opaque Any type.

  graph TD;
      Any-->dynamic;
      Any-->void;
      dynamic-->Object/Some;
      dynamic-->Null/None;
      void-->Object/Some;
      void-->Null/None;
      Object/Some -->T;
      Object/Some -->T?;
      Null/None-->T?;

eernstg commented 1 year ago

will Object become an opaque type?

What would 'opaque' mean in this context? It usually means something like 'not transparent', but I can't immediately find a way to use that interpretation here.

Object still has the usual 5 members, but they are now introduced into the interface of Object by the superinterface relationship to Any. This shouldn't be observable in Dart programs (analyzer/CFE based tools can see it, and 'dart:mirrors' would be able to reveal the difference inside a program, but otherwise I don't think it can be detected by user code).

One notion of opaque types is "types that we can denote, but we cannot recognize what they stand for". For example, an SML structure (a bit like a Dart library) can define types as aliases of other types, and they are opaque in the sense that they are not equal to their definition outside the structure:

(* Language: SML *)

structure IntNat = struct
  type nat = int; (* This type is opaque *)
  val zero = 0;
  fun succ x = x + 1;
end

val one: IntNat.nat = IntNat.succ IntNat.zero;
val intOne: int = one; (* Compile-time error: IntNat.nat is not assignable to int *)

But I don't think we'll have any consequences which are similar to this notion of opaqueness.

I'm not sure about the subtype (or superinterface?) diagram. We certainly don't want Any to be a proper supertype of void and dynamic, but they must in turn be proper supertypes of Null and Object. So it's confusing that we have the same arrows everywhere. Also T? is never a subtype of Object (with the current rules, or with any proposal that I know).

Wdestroier commented 1 year ago

What would 'opaque' mean in this context?

I mean a marker interface, because it won't have any members. However, Any will be a supertype of Null while Object won't. The usefulness of having Any and Object coexist is clearer to me now.

We certainly don't want Any to be a proper supertype of void and dynamic.

I thought void and dynamic could be interfaces (implemented by Object and Null) and Any would be the supertype of everything, but that's not possible I guess.

eernstg commented 1 year ago

I thought void and dynamic could be interfaces (implemented by Object and Null)

I prefer to consider void and dynamic to be annotated versions of the top type, because they are treated as such: Whenever we know that something has type dynamic, we'll give it the special treatment (allow arbitrary member accesses, check at run time), and similarly for void, but as soon as we don't know (say, the type is a type variable X whose value at this point in the execution is dynamic or void), we don't do anything. Like this:

@checkMemberInvocationsDynamically
typedef dynamic = Any;

@dontUseTheValueOfAnExpressionWithThisType
typedef void = Any; // OK, that's a syntax error, but let's pretend.
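
The type-variable case mentioned above can be sketched in today's Dart (the function names `lengthOf` and `lengthVia` are hypothetical): the special treatment applies only where the static type is known to be dynamic, not where it is a type variable that happens to be bound to dynamic at run time.

```dart
// Static type dynamic: arbitrary member access is allowed and checked at run time.
int lengthOf(dynamic value) => value.length as int;

// Static type X: no special treatment, even if X is dynamic at run time.
int? lengthVia<X>(X value) {
  // return value.length; // compile-time error: X has no member named length
  return value is String ? value.length : null;
}

void main() {
  print(lengthOf('abc')); // 3
  print(lengthVia<dynamic>('abc')); // 3
  print(lengthVia<dynamic>([1, 2])); // null
}
```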

So it might be possible to make them superinterfaces rather than annotated versions of the top type, but it introduces additional expressive power that we would then have to prevent the usage of. We would be able to have instances proving that void and Object? are distinct types, or dynamic and Object?, and we would need to manually enforce that this does not happen. If we fail to do that then it will be unsound to assume that dynamic <: Object? and void <: Object?, but we do want to have those subtype relationships.

eernstg commented 1 year ago

Based on the language team discussions yesterday I'll conclude that (1) Any would indeed make some things simpler and more consistent, but (2) the special treatment of the five members of Object is so deeply ingrained in the implementations that it is a safer bet to handle those five members as a special case. So we'll continue to use weasel words about this topic. ;-)

modulovalue commented 1 year ago

For future reference, I wanted to leave a pointer to: https://github.com/dart-lang/language/issues/2756

That issue claims that having a type that does not have any of the Object members (e.g. toString/==/hashCode...) would be extremely valuable.

I still believe that to be the case. I hope that, should the hierarchy ever be "cleaned up", a type without the implicit Equality (==) / Stringable (toString) / HashCode (hashCode) / RuntimeInfo (noSuchMethod & runtimeType) behaviors will be considered for inclusion in the hierarchy.