feature request: support linear types

aran commented 4 months ago

Consider supporting linear types. Wild proposal for a reasonably major direction. Didn't see it discussed here or in the Google Group. Even something simple like an '@noCopy' annotation would be useful.

A few basic points:

Flutter's meta already has @useResult and @immutable.
Big help to FFI users, especially Rust interop
Possible performance benefits, e.g. if some data copies can be avoided. (Or the implementation is simpler if the VM already does that analysis and there's code to reuse)
- More flexibility in SendPort.send :)

Lots of languages have bolted on linear typing or linear-like features. In Dart's case, it might make sense to do at the analyzer level at first and expose piecemeal features driven by the same underlying analysis such as something like @noCopy annotation, etc. Then a real design for the type system for Dart 5 or whatever could subsume those features.

lrhn commented 4 months ago

TL:DR: Interesting thought experiment. I don't see this happening as a Dart language feature. It will be far too intrusive for the type system. So your suggestion of something tracked only in the analyzer might be more viable, even if not as sound.

If linear types are types whose variables are linear variables, meaning that they can/must be read only once, then it's probably something Dart can do. To some extent.

The underlying wish is to avoid aliasing, that a given "linear object" only ever has one reference. (Or at most one reference, if we ever want to GC it.)

Let's start with linear variables. A linear variable is a variable whose value must be used (read) precisely once. (We can make it affine instead, and say it must be read at most once, then you can leave the value to rot if you want to.) That maps somewhat well to Dart's definite-assignment analysis, if we consider a linear variable being read as clearing the variable, making it unassigned. The rules for linear variables are similar to those for non-late final local variables:

Can only be read if definitely assigned.
Can only be assigned if definitely unassigned.
Assigning makes definitely assigned.
But reading makes definitely unassigned.

That suggests a modifier that replaces final, perhaps linear int x;. (We can then consider having a late linear variable, where the check is done at runtime if it's only potentially assigned/unassigned in the static analysis.)

There are some design issues around the "must be read once" rule. It means you cannot return when a linear variable is potentially assigned. But for full consistency, you also cannot throw then, and we definitely cannot prevent that. But you can read and ignore the value, so void kill(linear Object? value) { value; /* Read once */} is a way to ignore the value. If that's not allowed, then it means that we're not restricting reads, we're restricting uses, and just reading and discarding is not a use. So what is a use? Assigning to another (linear) variable, including passing as argument. Calling a method? (Maybe, maybe not. Probably, but a method gets a reference to the object as this, and may not be restricted to how it uses that reference.)

That's probably why you'd want linear types, to make restrictions on all occurrences, not just variables. Linear types which are types where every occurrence is linear (must be used once, aka. never aliased), and all variables of that type are linear.

Let's say we declare linear class Foo {}, which means that every occurrence of Foo must be used precisely (or at most) once. If assigned to a variable, the variable must be linear (or is inferred/promoted to linear from that point and until it's definitely read or overwritten).

That doesn't fit that well into the Dart type system.

Is a subclass of a linear type also linear? Probably yes, but maybe not technically necessary, just weird if it isn't.
Can you up-cast a linear type to Object? (Which definitely is not linear.) That could be a problem.
- If "no", generics won't work. (More below)
- If "yes", then we lose linearity, because the Object typed value can easily have aliased. Unless we type the result as linear Object, which makes linear a type modifier (like ?), and then you can upcast Foo (aka linear Foo, there is no non-linear version of a linear class) to linear Object, and can only assign it to linear variables.

So, most likely, we'll have to have linear T being a type for each type T, with normalization:

Norm(linear T) = let S be Norm(T), then
- linear R if S is linear R,
- S if isLinear(S) (S is a declared subtype of a declared linear type), and
- linear S otherwise,

The type linear T is a super-type of T, and if S \<: T then linear S \<: linear T.

That allows int i = 42; linear int j = i; which doesn't break linearity for the value after it was made linear, but it does allow existing aliases. (But if it's a proper supertype, is linear Object? now the new top type?)
If we chose sub-type instead, then you can do linear int i = 42; int j = i; which looks like a violation of linearity, but then we can also promote the j variable to be linear for the duration where it holds that value. But we can't prevent void foo(int x) { nonLinear(x, x); } linear int i = 42; foo(i);, so that's a no-go.
Or we could make the types unrelated, possible with some assignability, but then we are left with Object? and linear Object? being unrelated "top" types.

I think the proper supertype is the most consistent model, and it means that all existing generics, which have at most Object? as bound, cannot be used with linear types. They should not just be automatically upgraded to linear Object? as bound, because they might not be using their type-parameter-typed values in a linear way.

Then we'll have to write new linear collections. If that even makes sense.

A linear array is a sequence of linear variables, which are cleared when they are read. The API must make it possible to detect whether a position holds a value. That may be impossible if we take the linear property literally. Let's assume a LinearList<T> has list locations of type (linear T)?. Then it may be possible to check whether that value is null, without "using" the linear value. (A == null doesn't call operator==).

More likely we'll need to consider calling methods on a linear type to not count as a use. Or some other allowance for locally keeping multiple references to the same object, as long as at most one of them survives at a point where the reference is passed to someone else.

You can read a linear variable containing a value more than once, as long as all you use it for is calling methods on it. Those methods must then not use this for anything but calling methods. They cannot store this in any way. Or they can store it locally, but not across any invocation that also has a reference to the value.

Now we're getting there:

A linear type is either a declared linear type, a linear modified type, or a composite linear type.
A declared linear type is a class/mixin/enum/extension type with a linear modifier.
- All declared subtypes of a declared linear type must also be declared linear types.
- A class like linear class Foo extends Object implements Comparable<Foo> {} is a subtype of linear Object, but not of Object. It still is a sub-class of Object, which means we need to separate the concepts of subclass and subtype. It implements Comparable<Foo> (where Comparable has bound <T extends linear Object?>) but is only a subtype of linear Comparable<Foo>.
- A non-linear superclass cannot have a redirecting factory constructor which creates a linear subclass. (That wouldn't
  satisfy the return type.)
- Never is not linear.
A linear modified type is a type of the form linear T
- linear T is a proper supertype of T if T is not a linear type,
- and (Norm) equivalent to T if T is a linear type (whether modified or declared).
- Cannot promote away from being linear: linear Object o = something; if (o is /* no linear*/ Object) ... will not promote away being linear. (Wouldn't be safe, because a linear class Foo would satisfy Foo() is Object, because is checks use the instance type, which is "linear-agnostic".)
A composite linear type is a record type where any field type is linear, and T? or FutureOr<T> where T is linear.
- The record is itself linear, even if some fields are not. Reading a linear field uses the record, so should probably be done as a destructuring. Reading non-linear fields doesn't count as a use.
- For the union types, a check of value != null or value is Future<T> will promote the value to linear T (aka T) or linear Future<T>. (Can't treat the value as non-linear, because then someone can create an alias of T? before checking if it's null.)
A potentially linear type is a type which is bounded by a linear type. Includes all actual linear types, and some more.
- For example a type variable <T extends linear Object?>.
- Extension types on potentially linear types must be declared linear types.
A type which has a potentially linear instance variable must be declared linear.
- Might seem too strict, but is probably necessary for soundness.
A local variable with a potentially linear type becomes unassigned when its value is used. A variable's value is used if the value is:
- assigned to another variable,
- passed as argument to a function or constructor,
- made part of another object as a literal (list/map/set/record).
- returned,
- thrown,
- Reading a linear field from a structurally linear record.
- Being destructured by a pattern match or destructuring assignment.
  - Switching on a linear variable is fine, except that the each pattern and associated when clause must at most use the value once.
  - So the pattern must either destructure or bind the value once, it cannot do both, or either of them twice, or it can do nothing with the value (case _:) and use the value once in the when clause.
  - If a non-trivial-use switch case matches, one that uses the value in the pattern or when clause, then the value is considered used. It cannot be referenced in the body, or after the switch. If at least one such case exists, the value cannot be used after the switch.
  - If a when clause uses a linear variable bound by the pattern, that variable is not usable inside the case body. That may be a feature. (We can even make the variable not available, rather than not assigned, so the bindings that come out of a pattern match are the variables bound by the pattern, minus the linear variables that have been used by the when.)
Calling methods on the value does not count as using it.
- Calling methods on a declared linear type works as normal.
- No methods can be called on a linear type which is not (Norm-alized to) a declared linear type. Not even Object methods.
  - Making linear dynamic completely useless, Maybe linear void is really the new top type?
  - And making LinearHashSet<linear Object?> impossible to create, because it can't even call .hashCode and == on the elements.
  - But this is necessary, because we don't actually know what those methods do. They may leak the this pointer.
  - (We can promise that the method implementations of Object are linear-compatible, so a linear class Foo extending Object doesn't have to override all the methods. But we can't say anything for user-written methods of non-declared-linear types.)
Doing as, is or == null/!= null checks do not count as a use or method invocation. It can promote below the linear.
A potentially linear variable must not be captured by a closure.
- Alternatively, if so, the closure is itself a linear value that can only be used (called) once, and the variable's value is captured, not the variable itself. That is: Creating the closure counts as using the variable.
- A local potentially linearly type variable can be read multiple times inside the declaring function, as long as the only uses of the value is to call methods on the value.
- Assigning to a linear variable gives it the new value. (If we have "fully linear" variables, then it can only be assigned to when it has no value, to ensure a value isn't lost. But then we're back at needing a way to throw away values.
Tearing off a member of a linear type counts as creating a closure because of capturing this. Treat it the same.
A member of a linear type, or extension member on a potentially linear type, must treat this as a linear variable.
- Since it's immutable, the value cannot be cleared, so it cannot be used. So only calling members, not assigning anywhere, or passing to other functions, even in closures. That's pretty restrictive, maybe it can be loosened because this is special, but for extensions, this isn't special at all.
- An extension type should treat this and the representation variable as the same variable. If either is linear, the other is at least potentially linear, so that just means that can only invoke members on it, not store it elsewhere or pass it to functions.

The "not passing to functions" annoys me. If a linear variable is passed as argument, we don't know whether the function will hold on to it or not. If it can (which likely means allowing the variable to be captured in a closure), then it's possible to hold on to the value indefinitely. It would be nice to have a way to say that a variable is only valid for the duration of the call, and cannot be captured. I guess the traditional way to do that is to have the function return the value (or another value) and then keep working with that. It's enforced linear by actual linearization. But it doesn't work for passing this, because this is already shared with the caller of the member, so we need to ensure that it isn't aliased at all, and goes away when the method call returns. So we cannot pass this, or any closure containing this to a helper function, which is problematic. Maybe we do need a marker on parameters that is stricter than linear, which means that the value can only be used during the call. It can then be passed to other functions that promise the same, and passing a linear variable to such a function is safe, you automatically get it back when the function returns. (For async functions, when the future completes. Such async functions can probably only be written using async syntax. There is no way we can ensure the life-span of a variable passed through a number of Future.then calls.)

All in all, there might be a possible design that can provide some notion of linearity of values.

It's not going to be simple.
It's not going to be that easy to use. The "not calling any methods" on potentially or modified linear types
It won't play well with existing generics or collections. We may need new collections for linear values, or maybe that simply doesn't make sense. The only collections that work are Queue and Stack, with no introspection, just put/get/push/pop/isEmpty. But then we need both LinearQueue and Queue if we ever want to have a Queue of non-linear things where we can alias the queue. This really is another color of types, where you sometimes need to implement for both.
Maybe we can erase linear at runtime, if all code can be checked to be linear-sound at compile-time. Not sure if LinearStack<linear Object> o = ...; if (o is LinearStack<Object>) ... should be allowed to work. (We can't promote away linear at the top level, but I don't want to define that recursively on types, so it's easier to just not erase the modifier.)

aran commented 4 months ago

Thanks for considering this so thoughtfully. I am convinced the full language feature is of unrealistic scope, but also convinced there's got to be some way for dart-analyzer to help careful programmers & FFI users if not performance.

One note is whether you want to go all the way and allow flexible coupling of linearity to existing types, e.g. linear class Foo, or restrict it to the safe subset of orthogonal to types, and assign linearity only to variables or values, e.g. @useResult @noCopy Foo x;, where the actual underlying type of x according to the analyzer would be something like {classicType: Class('Foo'), linearity: Linearity.Linear}.

I was imagining something closer to the latter because of its suitability for incremental, annotation-driven usage.

Also worth mentioning on Or at most one reference, if we ever want to GC it., there's already a Dart/Flutter pattern that could be elaborated for declaring the end of a value's lifetime, which would be to formalize.dispose(). Something that eventually calls dispose exactly once could be inferred or declared @usesResult, for example.

lrhn commented 4 months ago

Moved back to SDK repo and made an anlyzer issue. (They can then consider whether it should be a lint or not.)

dart-lang / sdk

feature request: support linear types #55657