Allow inferring the name on object patterns

dart-lang / language

Design of the Dart language

Other

2.66k stars 205 forks source link

Allow inferring the name on object patterns #2563

Open munificent opened 2 years ago

munificent commented 2 years ago

An earlier version of the patterns proposal used record pattern syntax like (foo: var f, bar: 3) to match any object, and it would simply call the corresponding getters on whatever static type the matched value had. Alas, this left users with no way to actually test if an object is a record (#2218). We fixed that by saying that record pattern only ever matches record objects. If you want to call getters on other objects, you have to use a named object pattern.

This works, but it's verbose in cases where you want to use an object but the type you're destructuring is the same as the matched value type, like:

var map = {1: 'a', 2: 'b'};
for (var MapEntry(:key, :value) in map.entries) {
  ...
}

Here, we already know the static type of the matched value is MapEntry. And we know that the value can never be an actual record since MapEntry isn't a record type. So the object name doesn't add much value here. It would be nice if you could write:

var map = {1: 'a', 2: 'b'};
for (var (:key, :value) in map.entries) {
  ...
}

Of course, that's ambiguous with record patterns. To disambiguate, I would say:

You can omit the type name in an object pattern only when the matched value type is not a record type, Record, or a supertype of Record. When the matched value is any of those types, then the pattern is instead treated as a record pattern that only matches records.

When the matched value is any other type, a record pattern could never successfully match since the value will never be a record, so we instead treat it as an extractor pattern whose inferred type is the matched value type.

We do have to decide this now because the proposal currently doesn't make it an error to have a record pattern in a context where the matched value type means no record will ever appear. You can still write the record pattern, but it just will never match. When we full specify exhaustiveness checking, we might be able to flag this as an unreachable case. If we do that, then we can allow inferred object names in a later release without it being a breaking change.

lrhn commented 2 years ago

This can apply to both declarations and case-matches.

Case matches are the easier: You know the static type of the expression being switched on (switch (expr) or if (expr case ...)) before looking at the pattern.

At that point, if the static type of the incoming value cannot be a record, then an "unqualified extractor pattern" is interpreted as an extractor pattern on that static. If the incoming value can be a record, then it's either a record type itself, in which case it's trivial, or it's a supertype of Record, which leaves a very limited API to extract from (basically just the Object members). At that point, it's safe to make it a record match. If you want anything else, it's easy to write Object(...) or FutureOr<Object?>(...) instead.

The trickiest one is probably dynamic. If you want to do a completely dynamic match, (x: 2, y: 5), which is matched by any object with an x getter returning 2 and an y getter returning 5, you might have to write dynamic(x: 2, y: 5). I think that's fair, and actually like seeing that this is incredibly dynamic and error prone. (It'll throw if there is no x or y getter, it won't check for existence first).

The assignment is more problematic, var (:x, :y) = e;. The three-step inference algorithm starts by deriving a type schema from the pattern. At this point, the pattern doesn't yet know whether to be a record pattern or an extractor pattern. I guess defaulting to a record type schema will be safe. If it matters to the type inference of the expression, then it might end up being a record. If not, it probably doesn't hurt. Then the expression's static type is used to go back and infer the types for the pattern. At that point, we do the same as above. If the RHS ends up being the class Point, that's what we'll be extracting from. The first type schema was just a hint. (We need to ensure that it's just a hint, and no compile-time error occurs because some expression's static type doesn't match its context type.)

Consider:

class Box<T> {
  T value;
  Box(this.value);
}
void main() {
  var (:double value) = Box(1);
  print(value);
}

Here the type schema of the pattern is ({double value}) (one-named-element record). The type inference for Box(1) does nothing with that. It then creates a Box<int>(1). Then we go back, and since the RHS type cannot be a record, we make it Box<int>(:double value). Then it fails because int, the type of Box<int>.value, is not assignable to double.

There is no good way to pass the expectation that value should be a double to the other side with mentioning Box. So, there will be shortcomings, where you can't omit the type name, probably mostly around generics. (That said, I don't think var Box(:double value) = Box(1); would work either, we don't infer a Box<double> context type from the nested patterns on the LHS, which is what would be needed.)

I think it can work. I'm slightly irked by the "arbitrariness". For records and non-record types, it's fine. It's the supertypes of records where the choice feels arbitrary. (It would be great if we could use {foo: pattern} for extraction instead, but that's also map literal, and any type can implement Map so we can't do something similar with that. Special casing record types is easier because not all types can be supertypes of a record type.)

Also, this all assumes the pattern can be either a record pattern or an extractor pattern. Not all record patterns can, only those entirely with named properties. What will we do for:

var (int x, int y) = new Point(1, 2);  // Author forgot the `:`s

Will the LHS just be a record pattern, which doesn't match? Or will we make it an extractor pattern ... which is then grammatically wrong? (So, basically, how it will break, not whether.)

munificent commented 2 years ago

The first type schema was just a hint. (We need to ensure that it's just a hint, and no compile-time error occurs because some expression's static type doesn't match its context type.)

Yes, the original proposal worked that way. When I talked to folks, some people felt it was strange that the context type might end up being discarded and not actually the same as the eventual incoming type, but as far as I know, it wasn't actually problematic.

I think it can work. I'm slightly irked by the "arbitrariness". For records and non-record types, it's fine. It's the supertypes of records where the choice feels arbitrary.

Yeah, but I think those cases will be rare and when they occur, the record pattern is what you want.

Also, this all assumes the pattern can be either a record pattern or an extractor pattern. Not all record patterns can, only those entirely with named properties. What will we do for:
var (int x, int y) = new Point(1, 2);  // Author forgot the `:`s
Will the LHS just be a record pattern, which doesn't match? Or will we make it an extractor pattern ... which is then grammatically wrong? (So, basically, how it will break, not whether.)

I was thinking that I would specify this by making the name in an extractor pattern optional. So instead of saying a record pattern might behave like an extractor, we'd specify that an extractor might look like a record. Then we'd just disambiguate the colliding syntaxes by looking at the matched value type.

But on further thought, that's probably the wrong way to go about it. Because that means we can't disambiguate the syntax until we've done type analysis.

So... I'm not sure. I'll have to think about how to specify this cleanly once my brain is more COVID-free and functional. But, yes, the above case would end up being an error.

leafpetersen commented 1 year ago

After talking a bit with @mit-mit about #1201 , it occurred to me that allowing this could also make field promotion using if case at least marginally better. Consider:

class LongNamedThing {
   int? longNamedField;
}

void myCode(LongNamedThing thing) {
  if (thing.longNamedThing != null) {
     print(thing.longNamedThing!.isEven);
  }
}

With the patterns feature, you can rewrite this as follows:

void myCode(LongNamedThing thing) {
  if (thing.longNamedThing case var longNamedThing?) {
     print(longNamedThing.isEven);
  }
}

which is marginally better, but still requires saying "longNamedThing" twice. Another option would be to write:

void myCode(LongNamedThing thing) {
  if (thing case LongNamedThing(:var longNamedThing?)) {
     print(longNamedThing.isEven);
  }
}

but now you have to write LongNamedThing

With the proposal in this issue, you could presumably write:

void myCode(LongNamedThing thing) {
  if (thing case (:var longNamedThing?)) {
     print(longNamedThing.isEven);
  }
}

which is at least marginal better.

rrousselGit commented 1 year ago

I think this would go a long way toward making patterns smoother to use.
For now, I feel like the syntax is very heavy. It's tempting to fallback to an if/else-if for the sake of readability.

With regards to conflicts with records, what about changing () to something else, such as maybe #()? Example:

class Foo {
  Bar? get bar;
}
class Bar {
  String name;
}

...

switch (Foo()) {
  case #(bar: #(name: "John"): print('Hello John'); 
}

munificent commented 1 year ago

With regards to conflicts with records, what about changing () to something else, such as maybe #()?

That's an interesting idea. We did spend a bunch of time discussing this syntax and various other ways to make record patterns (and expressions) syntactically distinct. The other place you run into is single-element positional records:

var number = (1);
var recordOfNumber = (1,);

Here, the trailing comma is needed to avoid ambiguity with a parenthesized expression. A different record syntax would avoid that overlap.

But in almost all other widely used languages with tuples and tuple patterns, the syntax is either a parenthesized comma-separated list of fields, or just a comma-separated list of fields. It felt like a large unfamiliarity tax to introduce a different record syntax for Dart given that.

So I think the record syntax works OK. But as you suggest, we could maybe use a different syntax for "inferred object patterns". It's really hard to come up with a syntax that doesn't feel pointlessly unfamiliar, though.

samandmoore commented 1 year ago

I just ran into this today and it was a bummer because the thing I'm destructuring has a multi-word name.

It's really hard to come up with a syntax that doesn't feel pointlessly unfamiliar, though. - @munificent

I totally agree. Personally, I don't hate the #() that @rrousselGit suggested. It's not familiar but it's not entirely surprising.

For another suggestion that might be more intuitive, which is what I think you're looking for, what about borrowing the _ from the wildcard feature for this?

switch (Foo()) {
  case _(bar: _(name: "John"): print('Hello John'); 
}

It may create some grammar ambiguities, or maybe just require some extra read-aheads to parse. That's definitely not my area of expertise, but the _ feels pretty intuitive to me.

Given that I've just been trying to figure out all the flavors of patterns with only cursory glances at the docs, I can say pretty confidently my thought process would likely be:

I want to match this object. (uses record shorthand) ❌
Oh, but it's not a record. 🤔
(types out full class name) 🤔 But that's... long...
Oh! What happens if I throw an _ in there? ✅

rrousselGit commented 1 year ago

It's a bit late, but I kind of wish Records were the ones who had an extra leading character. Like final foo = #(1, 2).
I think it would've been more intuitive overall, by avoiding stuff like var x = (42) vs var x = (42,) which either creates an int or a tuple.

In any case, I would assume a leading character for inferred type matching should be fine. It kind of reminds me of the it proposal:

list.map({ it.name });

munificent commented 1 year ago

It's a bit late, but I kind of wish Records were the ones who had an extra leading character. Like final foo = #(1, 2).

We discussed this, at length. But I think it would be just too weird and unfamiliar given that tuples have extremely well established syntax in so many other languages. We also discussed using #(...) or something similar only for record patterns, but:

It breaks the syntactic symmetry with record expressions.
It makes patterns noisier. A really common, important use case for patterns is doing "parallel" matching where you switch over an immediately-created record to compose a couple of values and then each case has a record to immediately destructure them. That lets you flatten out what would otherwise be nested switches and cover each combination using a single flat list of cases, like:
```
var buttonColor = switch ((isEnabled, isHover)) {
  (false, _) => lightGray, // Disabled buttons are always grayed out.
  (true, false) => blue,
  (true, true) => darkBlue
};
```
This style gets worse if record patterns are syntactically noisier.

Honestly, I think the proposal at the top of this issue is the best, cleanest approach. Simply overload the nice clean record syntax to mean an object pattern when it's obvious from the static type that it can't be a record. All of the existing record patterns keep working as expected, and you get to just simply completely delete the class name in almost all of your object patterns. No additional punctuation needed.

rrousselGit commented 1 year ago

My opinion stems from the fact that I don't find records that useful.
So far, I've never had to use a single record, but I wrote tons of sealed classes and patterns inside switches.

Hence, I don't think records as they are, justify the syntax complexity.
Dart got a bunch of unintuitive syntaxes because of them. And I feel the main use cases are blocked by conflicts with the record syntax.

In a way, it's as if for this feature, Dart picked the equivalent of having optional ; even though everything else doesn't. To me at least.

2. [...]:

    var buttonColor = switch ((isEnabled, isHover)) {
      (false, _) => lightGray, // Disabled buttons are always grayed out.
      (true, false) => blue,
      (true, true) => darkBlue
    };

This style gets worse if record patterns are syntactically noisier.

I wouldn't do this personally.
The usage of a tuple here significantly decreases the readability of the patterns.
To me, the issue is similar to how booleans/numbers/strings should use named parameters instead of positional ones.

I've had to do similar switchs a few times already, but I instead write:

var buttonColor = switch (this /* or a local variable if any */) {
  MyClass(isEnabled: false, isHover: _) => lightGray, // Disabled buttons are always grayed out.
  MyClass(isEnabled: true, isHover: false) => blue,
  MyClass(isEnabled: true, isHover: true) => darkBlue
};

Hence this issue.
So it wouldn't rely on records but on inferred class names.

Honestly, I think the proposal at the top of this issue is the best, cleanest approach. Simply overload the nice clean record syntax to mean an object pattern when it's obvious from the static type that it can't be a record. All of the existing record patterns keep working as expected, and you get to just simply completely delete the class name in almost all of your object patterns. No additional punctuation needed.

Wouldn't it possibly conflict in cases where the type is Object/dynamic?
I would be worried about the same syntax having a different meaning in a different context.

Say we do a refactoring and downcast a variable from MyClass to Object/dynamic. The code could initially be:

class Class {String name};

Class obj = Class();
switch (obj) {
  case (name: 'foo'): print('Hello');
  case _: print('default');
}

But then we do the refactor we have:

class Class {String name};

-Class obj = Class();
+Object obj = Class();
switch (obj) {
  case (name: 'foo'): print('Hello');
  case _: print('default');
}

In that case, instead of a compilation error in the case (name: 'foo'): print('Hello');, the "case" is never reached.
That would be very confusing IMO.

munificent commented 1 year ago

So far, I've never had to use a single record, but I wrote tons of sealed classes and patterns inside switches.

I could be wrong, but I suspect that users will gradually find themselves using them more over time as they become more comfortable thinking in terms of them. We consider it totally natural for a function to accept multiple parameters, but most of us have had decades of habit from languages without multiple returns to expect to have to bundle the returned values up in some meaningful structure or yield the values some other way. As those habits get unlearned, I think we'll see records used more.

Using records to flatten out nested pattern matching is idiomatic in functional languages, though it seems odd when you aren't used to it.

Wouldn't it possibly conflict in cases where the type is Object/dynamic?

That's what the "when it's obvious from the static type that it can't be a record" part means.

Say we do a refactoring and downcast a variable from MyClass to Object/dynamic. The code could initially be:
class Class {String name};

Class obj = Class();
switch (obj) {
  case (name: 'foo'): print('Hello');
  case _: print('default');
}
But then we do the refactor we have:
class Class {String name};

-Class obj = Class();
+Object obj = Class();
switch (obj) {
  case (name: 'foo'): print('Hello');
  case _: print('default');
}
In that case, instead of a compilation error in the case (name: 'foo'): print('Hello');, the "case" is never reached. That would be very confusing IMO.

Good point.

rrousselGit commented 1 year ago

I'm skeptical about this. After all, I'm fairly familiar with Records & co in other languages.
I'm personally a bit worried that Dart lost some of its pragmatism in this 3.0 version. But I'll try to contain my discomfort for now and see if it changes over time.

In any case, what about .() instead of my previous #() proposal?

In a way, this feature reminds me a bit of https://github.com/dart-lang/language/issues/357.
Inside https://github.com/dart-lang/language/issues/357, one of the things I argued is that if we were to allow:

enum MyType {a, b};
void function({MyType? parameter}) {}

function(
  // sugar for "parameter: MyType.a
  parameter: .a,
)

Then we might as well dissociate this from enums and support the shorthand for any type such that we could write:

class Example {
  Example();
  Example.named();

  static final Example empty = Example();
}

void function({Example ? parameter}) {}

function(
  // sugar for "parameter: Example.empty
  parameter: .empty,
)
function(
  // sugar for "parameter: Example.named()
  parameter: .named(),
)
function(
  // sugar for "parameter: Example()
  parameter: .(),
)

And I figured .( would work well here too to express "inferred type"

Example a;
switch (a) {
  case .(value: 42): print('Hello world');
}

In that case, the leading character would become less of an "arbitrary leading character for parsing" and more a convention for "inferred type name"

nex3 commented 1 year ago

I like this idea a lot. I've found myself writing a fair number of object-destructuring macros, and I'm a little concerned that they set me up for a situation where the type of the value being matched changes and instead of a static error the condition just silently fails to match. This would solve that issue, as well as being terser.

munificent commented 1 year ago

they set me up for a situation where the type of the value being matched changes and instead of a static error the condition just silently fails to match.

That's a good point. Currently, there's no way to write a pattern that means "destructure at the current type" without doing some type test as well. Inferred object patterns would give you that.

mraleph commented 1 year ago

In any case, what about .() instead of my previous #() proposal?

I think . can be really confusing here.

What about leaning onto existing syntax: we have already redefined _ to mean whatever, so _ seems like it could be a good candidate for this.

Example a;
switch (a) {
  case _(value: 42): print('Hello world');
}

rrousselGit commented 1 year ago

Maybe. Although I would assume we'd still somehow use . for https://github.com/dart-lang/language/issues/357

It may make sense to use something other than . as we likely want to support https://github.com/dart-lang/language/issues/357 inside patterns.
Such as to write case Enum.value: as case .value

So if it starts with _, it's a pattern. If it starts with a dot, it's a static shorthand.

lrhn commented 1 year ago

What about leaning onto existing syntax: we have already redefined _ to mean whatever, so

That's actually a problem. If _ means "whatever" (and it does), I'd expect case _(foo: pattern) to match whatever the type, basically being an alias for dynamic. I'd read it as "match any type and check its foo getter", just like case _: matches any type, but doesn't check any fields.

The _ doesn't say "infer" to me. But then, nothing does because we usually write nothing to get inference. (Except possibly var, but that looks ugly here, and parses badly.)

If .foo starts to mean "use the context type", then .(:var foo) is somewhat consistent. We're asking the compiler to "infer" the type expected/provided by the context.

It's just not a context type (scheme) here, it's more like the static type of an expression that member invocations are checked against. So it's more like . means "fill in a type here, based on something in the static typing context".

I still think it can work. I'd love to write nothing and allow if (e case (:var foo))..., but it does conflict with record-with-named-field-pattern syntax. Using the type of e to make the distinction is not good for a position where any type of e can be silently accepted. If e becomes Object, then e case (:var foo) will just be a (failing) record type check, with not static warning. Using e case .(:var foo) will warn that Object doesn't have a foo.

natebosch commented 1 year ago

This would be especially handy for working with protos. Those classes tend to have verbose names (and submessages get very long generated names and require aliases to be at all useful), and they are often repetitive with field names.

I think I'd find it easier to figure out the intent of the code without all the noise of the type names

  if (someProtoTypeName
      case .(anotherLongName: .(protoLongName: .(:var interestingField)))
      when interestingField.isNotEmpty) {
    doSomethingWith(interestingField);
  }

  if (someProtoTypeName
      case SomeProtoTypeName(
        anotherLongName: AnotherLongName(
          protoLongName: ProtoLongName(:var interestingField)
        )
      ) when interestingField.isNotEmpty) {
    doSomethingWith(interestingField);
  }

natebosch commented 1 year ago

It would be convenient to also have a syntax that works when the static type is nullable - otherwise the type name suddenly comes back for nullable fields. Maybe a ? after the closing ) to match varName? - if (someNullable case .(:var someField)?) , or some change to the prefix before the opening ( - if (someNullable case .?(:var someField))

Edit: Or maybe this isn't necessary and we can use the same syntax even if the static type is nullable - checking for an object with an inferred type can be assumed to check for non-null.

rrousselGit commented 1 year ago

I agree. A way to have var case Type(:final value) behave as final var?.value would be great.

tatumizer commented 1 week ago

I think the option of using <> mentioned in #4124 is worth exploring. Yes, it clashes with type parameter syntax <T>, but in practice, it shouldn't cause too much trouble.

var map = {1: 'a', 2: 'b'};
for (var <>(:key, :value) in map.entries) {
  ...
}

Another variant discussed there is a dot:

var map = {1: 'a', 2: 'b'};
for (var .(:key, :value) in map.entries) {
  ...
}

One of the objections is that the dot is not visible enough to immediately convey the notion of a missing (inferred) type. If this is true, then "nothing at all" kind-of-prefix (proposed by OP) is even less visible (= totally invisible).

The convention has far-reaching consequences for other contexts - e.g. for default constructor calls, as discussed in #357,

hydro63 commented 1 week ago

My main argument for the <>() syntax is to convey the notion of type, that would be inferred. It's based on the convention of syntax for typing being <T>, so i think it is relatively straightforward to guess what function the pattern <>() serves, when you first encounter it. It also has the benefit of being very visible and easy to spot.

That being said, the syntax is more of an experiment to see which syntax could possibly work, not really something i push for.

Edit: i explained it badly, the <> should serve as a visual hint for the developer, that's why i said it's based on the <T> syntax, not that it is a <T> syntax.

tatumizer commented 1 week ago

My main argument for the <>() syntax is to convey the notion of type, that would be inferred

The term "inferred" may have a broad meaning. E.g. when you write var list = [1, 2], the type of the list is also inferred, but it's certainly not the meaning we are after, Another example: In the expression var x = <>(1), using a broad definition, <> would stand for int, which is not the kind of inference we have in mind (this expression should be treated as an error).

We have to distinguish between "inferred from its content" ("from the inside") and "inferred from the wider context" ("from the outside"), so <>(...) would correspond to X(...), which is a constructor call for the type X, where X is inferred from the wider context.

This convention would allow the shortcut otherwise known as .new(...) to be written as <>(...). Introducing the convention only for pattern matching would be wasteful IMO - the use cases are rare. I don't know whether these two different use cases can conflict or not.

We must treat the whole thing <>(...) as a special syntax unrelated to parametrized types. If we had more symbols in the language, we could find a unique one for this purpose, but for some reason, this is not a mainstream idea yet (Julia language is the exception - see the examples here).

(Maybe <> should be called "(contextual) default type" rather than "inferred type"?

mateusfccp commented 1 week ago

My main argument for the <>() syntax is to convey the notion of type, that would be inferred. It's based on the convention of syntax for typing being <T>, so i think it is relatively straightforward to guess what function it the pattern <>() serves, when you first encounter it. It also has the benefit of being very visible and easy to spot.

That being said, the syntax is more of an experiment to see which syntax could possibly work, not really something i push for.

IMO <> has the semantics of type only in the parameter-argument context. <T> is always either a type parameter or a type argument. It doesn't convey the general semantics of a type.

tatumizer commented 1 week ago

@mateusfccp : treat <> as a single symbol. This "symbol" currently has no meaning, so it's available. The situation where the symbols mean different things in different contexts is not unusual. The meaning of {...} depends on what is inside: it can be a block or a set. The underscore might be a wildcard, or just an identifier. Question mark most often has something to do with nulls, but not always. Etc.