dart-lang / language

require records to have two or more fields? #2125

Closed Hixie closed 2 years ago

Hixie commented 2 years ago

https://github.com/dart-lang/language/blob/master/working/0546-patterns/records-feature-specification.md

A parenthesized expression without a trailing comma is ambiguously either a record or grouping expression.

We could also fix this by requiring that tuples have more than one field. This seems like a simplification with no cost since a record with one field is equivalent in every way other than syntax to just having that field as a variable rather than a record.
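For readers skimming the thread, the ambiguity can be sketched as follows. This assumes the record syntax that eventually shipped in Dart 3 (trailing comma for a one-field record, positional getter `$1`); the draft specification linked above may differ in detail.

```dart
void main() {
  // Plain grouping parentheses: the value is just the int 1.
  Object grouped = (1);

  // Trailing comma: a record with one positional field.
  var single = (1,);

  print(grouped is int); // true
  print(single is Record); // true
  print(single.$1 == grouped); // true
}
```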

jakemac53 commented 2 years ago

The general Record type is useful though and I do expect general purpose things to be built on this. You can for instance get the named and positional "fields" for a record using the public API. Not allowing records to have only one (or possibly even zero?) "fields" means these multi-purpose functions can't work with these simpler data structures.

Hixie commented 2 years ago

Can you elaborate on a use case where it would be useful to support general Records with literals with only one field?

rrousselGit commented 2 years ago

I could see records with a single field being useful as inputs of an API, especially as generic constraints.

For example:

T foo<T extends ({ String name })>(T value) {
  return ({...value, name: 'John'});
}

rrousselGit commented 2 years ago

It'd also be helpful for package authors who want to avoid breaking changes when they know that a future release will add new fields to the record.

lrhn commented 2 years ago

I don't think a record with a single named field is the problem (it doesn't have a syntax conflict with grouping parentheses because of the name).

For (v) being either a one-tuple or a parenthesized value, it would be great if we can make the one-positional-entry record (T) and the plain object type T completely equivalent. Every value is a tuple, some are just trivially one-tuples. (And maybe Null is the zero-tuple! We don't need two unit-types in the language, but that has its own other issues.)

One problem with identifying one-tuples with object references is that Object? becomes a subtype of Record, at which point the Record type itself becomes meaningless. If everything is a record, nothing is (well, it means nothing to be a record).

It also generalizes to the question of whether ((1)) is different from (1). Maybe we can treat just one-tuples specially, in general, so ((1, 2)) is the same as (1, 2), and still allow nesting other tuples. Either can be useful. I believe the current proposal allows nesting of tuples, so ((1, 2), (3, 4)) is not the same as (1, 2, 3, 4). (Very reasonable, because otherwise boxing/unboxing becomes very visible in the specification, and it is very hard to actually make it work ... if at all possible within the current language.)

jakemac53 commented 2 years ago

Can you elaborate on a use case where it would be useful to support general Records with literals with only one field?

Consider for instance a pattern for data classes where the copyWith method takes a Record instead of a parameter matching each field type. You could make it such that the signature of the method is just void copyWith(Record data), and then check what fields exist on the actual record given, and write over just those fields. That can now be expressed through a common interface:

abstract class Copyable {
  void copyWith(Record data);
}

You should be able to make classes with only a single field Copyable as well in that case, or even just pass a record with only a single field in order to write over only a single field from the original instance.

Maybe this isn't actually a good API (you lose static safety and autocomplete for the fields of the record), but I could see the appeal as well, and there may be other similar use cases that don't have the same downsides.
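A rough sketch of that idea, written against the record patterns that later shipped in Dart 3 rather than the draft discussed here. The `Person` class and its fields are hypothetical, and this version returns a new instance instead of mutating, which is an assumption made for the example. Note that the two single-named-field records need no trailing comma.

```dart
class Person {
  final String name;
  final int age;
  const Person(this.name, this.age);

  // Accepts any record and overwrites only the fields it carries.
  // Each case matches exactly one record shape.
  Person copyWith(Record data) => switch (data) {
        (name: String n, age: int a) => Person(n, a),
        (name: String n) => Person(n, age),
        (age: int a) => Person(name, a),
        _ => throw ArgumentError.value(data, 'data', 'unsupported record shape'),
      };
}

void main() {
  const ann = Person('Ann', 30);
  final bob = ann.copyWith((name: 'Bob'));
  print('${bob.name} ${bob.age}'); // Bob 30
}
```

As noted above, the `Record` parameter gives up static checking: an unsupported shape only fails at run time.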

munificent commented 2 years ago

Ah, I'm glad you raised this because it's an important design point.

I think of it as deciding whether a tuple represents a concatenation of its elements or a collection of them. If a tuple is just its elements, then it follows that a one-element tuple simply is the element itself. There's no need to have one-element tuple expression syntax because ("element") is already a parenthesized expression which returns the inner value, which is the corresponding "one-element tuple". If a tuple is a container, then a one-element tuple is as meaningful as a one-element list or a one-element set and you want some syntax to produce them.

Likewise, at the type level, if the language treats a tuple type as a concatenation of types then the type (String) is synonymous with String and parentheses work in type annotations the same way they do in expressions. If the language considers tuples to be collections, then (String) is as different from String as List<String> is from String.

Tuples are concatenations in Swift, C#, Standard ML, OCaml, Haskell, and most other ML-derived languages.

Tuples are collection-like in C++, Python, and TypeScript. I would put Kotlin in here too, though it's a little fuzzier. (Python is where I got the idea to use a trailing comma to distinguish a single-element tuple from a grouping expression.)

You'll note that languages with option types tend to not have single-element tuples and languages with nullable types tend to have them. (I don't think that's a coincidence.)

The thought process that led me to this for Dart was basically:

1. I wanted to unify tuples and records into a single construct. That mirrors parameter lists in Dart, which can contain both positional and named arguments. This mirroring is important to support spreading records into argument lists, which I do want. I also just think it's generally useful.

That means you can create records like:

var record1 = (1, 2, x: 3, y: 4, z: 5);

Here's a single record with a couple of positional fields and a few named ones. You can destructure it like this:

var (a, b, x: x, y: y, z: z) = record1;
print('$a $b $x $y $z'); // "1 2 3 4 5".

2. I didn't want you to have to destructure all named fields. One of the primary advantages of named fields is that it's clear which ones you're accessing when you destructure. And since it is unambiguous, there's no reason to require a user to destructure all of them if they only want a couple. (This is the same reason why named parameters are easier to mix and match when passing arguments.)

I want to allow:

var (a, b, y: y) = record1;
print('$a $b $y'); // "1 2 4".

In fact, you don't have to destructure any named fields if you don't need them:

var (a, b) = record1;
print('$a $b'); // "1 2".

3. Supporting arbitrary combinations of positional and named fields implies supporting records with one positional field and some named ones. It would be super weird if a record could have zero positional fields (and some named ones), and two positional fields (with or without named ones), but not one positional field (with or without named ones). Consider:

var record2 = (1, x: 3, y: 4, z: 5);

This should work:

var (a, y: y) = record2;
print('$a $y'); // "1 4".

4. The behavior should be consistent even if you extract no named fields. Now consider what happens when you combine goals 2 and 3:

var (a) = record2;
print('$a');

Here we've got a record with one positional field and some named fields. We don't care about any of the named fields, just the positional one.

What does this print? If we don't support the notion of one-element tuples, then the (a) pattern is strictly synonymous with a. And the latter is a simple variable pattern that binds the entire matched value. That would imply that this prints "(1, x: 3, y: 4, z: 5)".

If it does that, how can you get the "1" out at all? You're basically stuck.

This led me to conclude that tuples are containers for their fields and that extracting even a single positional field is a meaningful operation.

Here is some related Kotlin code to show that it deals with a similar problem:

val a = Pair(1, 2)
val (b) = Pair(1, 2)
println("a = " + a + ", b = " + b) // "a = (1, 2), b = 1".

Objects and functions, living together

Taking a step back, I generally think of Dart as an object-oriented language. It supports programming in a functional style, but the goal is to integrate that harmoniously into the existing object paradigm. I don't want it to feel like two separate languages taped poorly together.

To me, that means modeling functional-style code in terms of the existing object representation. It's why algebraic datatype-style pattern matching in the proposal is based on subtyping (as it is in Scala).

I want pattern matching to work not just with a special blessed set of "functional style values" but with any kind of Dart object where it makes sense. That's why the proposal has record patterns but models named field destructuring in terms of calling getters on objects of any type. It lets you take all of the existing object-oriented classes in the Dart ecosystem with all of their getters and immediately use them in destructuring patterns. The day this feature ships, users will be able to write:

var map = {'a': 1, 'b': 2};
for ((key: k, value: v) in map.entries) {
  print('$k: $v');
}

var (minute: m, second: s) = DateTime.now();

For record patterns with named fields, then, I think the natural model is that the thing you're matching on contains the values you destructure. Since this proposal also unifies positional and named fields, it extends that model to positional fields.

Positional field patterns are also just calls to getters with implicit names (field1, field2, etc.) (I got this approach from Kotlin's component1, component2, etc.).

That in turn implies that tuples are collections and that there's nothing wrong or problematic with a one-positional-element tuple. It's just a class that only has field1.

The only wrinkle is coming up with an expression syntax for creating a record with only a single positional element, since that collides with using parentheses for grouping. A trailing comma isn't exactly beautiful, but it works. In practice, I think it will be rarely used. It's mostly about being able to support destructuring values that contain one positional element and other stuff.

Hixie commented 2 years ago

I think a lot of that makes a lot of sense, but I'm not sure I agree about #4. Why can't var (a) = record2 grab the first value, without (x,) being an allowed literal? The desugaring syntax and the literal syntax paralleling each other is fine, but they don't necessarily have to be so rigidly identical to each other that we have to contort ourselves to allow a way to make a one-field record literal.

My problem here is that (a) means one thing and (a,) means another, which means that these four identical-looking statements:

  foo(
    a
  );
  foo(
    a,
  );
  foo((
    a
  ));
  foo((
    a,
  ));

...mean the same thing... except the last one.

Also that I have to explain why the compiler has Opinions about these four cases that aren't obvious (or symmetric):

  var (a) = (a);
  var (a,) = (a,);
  var (a) = (a,);
  var (a,) = (a);

lrhn commented 2 years ago

I'd consider not allowing partial record matches without extra syntax.

That would mean writing something like

var (a, ...) = biggerRecord;

to match part of a bigger record.

By requiring you to be explicit about there being more, we help you notice if a tuple type changes, or you forgot about something, even though you intended to be exhaustive.

The pattern match should only be allowed when the structure of the RHS pattern is statically known, so it's clear from the context which parts are not matched.

I'd even allow capturing the rest with a "rest pattern"

var (a, x: x, ...p) = (1, 2, 3, x: 4, y: 5, z: 6);
print(p); // (2, 3, y: 5, z: 6)

so ... is really just short for ... _.

I admit I lean heavily towards records as concatenations.

It's the typing of (a) is Record that worries me the most. If we just drop the Record type entirely (so records have no shared supertype except Object?), then it might just make things easier. You simply cannot abstract over records with different structures, because they share nothing, not even structure. All you can do is box them into Object, except the one tuple (null) which is not an Object.

(I'd also consider a list matcher like [var x, var y] different from [var x, var y, ...], the first being equivalent to List(length: 2, [1]: var x, [2]: var y) and the second to List(length: var $ if ($ >= 2), [1]: var x, [2]: var y), or whichever syntax allows specifying something like that).

Hixie commented 2 years ago

By requiring you to be explicit about there being more, we help you notice if a tuple type changes, or you forgot about something, even though you intended to be exhaustive.

Oh that's a really good point, yes.

munificent commented 2 years ago

I think a lot of that makes a lot of sense, but I'm not sure I agree about #4. Why can't var (a) = record2 grab the first value, without (x,) being an allowed literal? The desugaring syntax and the literal syntax paralleling each other is fine, but they don't necessarily have to be so rigidly identical to each other that we have to contort ourselves to allow a way to make a one-field record literal.

Good point!

We could do that and say that, yes, one-element positional destructuring patterns exist, which implies that one-positional-element record values exist too, but the latter simply don't have a literal syntax. You're right that using a trailing comma for tuple literals is kind of confusing in a language that already allows trailing commas in argument lists. Sort of like how Dart didn't have set literals for many years even though it had set objects.

We'd still want at least some API to create them, though. One place this comes up is user-defined extractors. This is something I do want to support. If we have that, then most extractors will return a record representing the extracted fields. I expect it to be common—perhaps the most common—that the extractor only destructures a single value. (In other words, it behaves like a conversion pattern.) For example, imagine something like:

(int)? parseInt(String s) =>
  // Try to parse [s] to int, return value in record on success or `null` on failure.

(bool)? parseBool(String s) =>
  // Try to parse [s] to bool, return value in record on success or `null` on failure.

describe(String s) {
  switch (s) {
    case parseInt(n) => print('integer $n');
    case parseBool(b) => print('boolean $b');
    case _ => print('other $s');
  }
}

The return type of parseInt() and parseBool() is nullable to indicate match failure. It returns a record containing the destructured value (instead of just the bare value) so that it's possible for an extractor to express "successfully matched and destructured null".

So in the body of these extractor functions, they'll need a way to create a positional tuple with a single field. That could be as simple as a constructor on Record or something. I do think having a literal syntax for single-field tuples would be nice, but it's not essential.
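To make the "matched null" versus "no match" distinction concrete, here is a small sketch using the one-field record syntax that later shipped in Dart 3. User-defined extractors are not part of this sketch; an ordinary function plus an if-case stands in for them, and `parseNullableInt` and `describe` are hypothetical names.

```dart
// Returns a one-field record on success and null on failure, so a
// successfully parsed null is distinguishable from "no match".
(int?,)? parseNullableInt(String s) {
  if (s == 'null') return (null,); // matched, and the value is null
  final n = int.tryParse(s);
  return n == null ? null : (n,); // bare null means "no match"
}

String describe(String s) {
  final result = parseNullableInt(s);
  // The record pattern only matches a non-null one-field record.
  if (result case (var n,)) return 'matched: $n';
  return 'no match';
}

void main() {
  print(describe('42')); // matched: 42
  print(describe('null')); // matched: null
  print(describe('abc')); // no match
}
```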

munificent commented 2 years ago

I'd consider not allowing partial record matches without extra syntax.

That would mean writing something like
var (a, ...) = biggerRecord;
to match part of a bigger record.

For positional fields, yes. The proposal currently requires you to match them all. You can't silently discard them, just like you can't silently pass unused positional arguments to a function. There is also a TODO in the proposal to support a ... syntax like you have here to explicitly opt in to discarding some positional fields.

But for named fields, I don't want you to have to match them all. Since positional fields are just getter calls on arbitrary objects, "all" could be a potentially large and unwieldy set. We definitely don't want users to have to match hashCode on every object. :)

By requiring you to be explicit about there being more, we help you notice if a tuple type changes, or you forgot about something, even though you intended to be exhaustive.

+1. In particular, if the thing you're destructuring changes its API by inserting a new positional field in the middle, we wouldn't want to have existing patterns continue to compile but now silently change which fields they are destructuring.

I'd even allow capturing the rest with a "rest pattern"
var (a, x: x, ...p) = (1, 2, 3, x: 4, y: 5, z: 6);
print(p); // (2, 3, y: 5, z: 6)

I'm interested in this too, though what the "rest" means with named fields where the RHS isn't literally a record type could get weird. I'd want to see use cases before we dig into this.

It's the typing of (a) is Record that worries me the most. If we just drop the Record type entirely (so records have no shared supertype except Object?), then it might just make things easier. You simply cannot abstract over records with different structures, because they share nothing, not even structure. All you can do is box them into Object, except the one tuple (null) which is not an Object.

I'm not super attached to Record, but I expect it would be marginally useful in the way that Enum and Function are useful in Dart. It gives you a way to express a slightly more meaningful type for APIs that then in the body enumerate over a hand-picked set of types, like:

num magnitude(Record r) => // <-- "Record" here.
  switch (r) {
    case (x) => x;
    case (x, y) => sqrt(x * x + y * y);
    case (x, y, z) => sqrt(x * x + y * y + z * z);
    case (x, y, z, w) => sqrt(x * x + y * y + z * z + w * w);
  };

Code like this isn't great, but use cases like it come up enough that I think it can be helpful. Also, it's a potentially useful target for extension methods.
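For reference, a runnable variant of the `magnitude` example in the syntax that eventually shipped in Dart 3 (an assumption relative to the draft above). The typed subpatterns, the fallback case, and the use of `abs()` for the one-field case are choices made for this sketch.

```dart
import 'dart:math';

// Dispatches on the shape of the record; only a hand-picked set of
// shapes is supported, everything else throws.
num magnitude(Record r) => switch (r) {
      (num x,) => x.abs(),
      (num x, num y) => sqrt(x * x + y * y),
      (num x, num y, num z) => sqrt(x * x + y * y + z * z),
      _ => throw ArgumentError.value(r, 'r', 'unsupported record shape'),
    };

void main() {
  print(magnitude((3, 4)) == 5); // true
  print(magnitude((-2,)) == 2); // true
}
```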

(I'd also consider a list matcher like [var x, var y] different from [var x, var y, ...], the first being equivalent to List(length: 2, [1]: var x, [2]: var y) and the second to List(length: var $ if ($ >= 2), [1]: var x, [2]: var y), or whichever syntax allows specifying something like that).

+1. The proposal states that now and has a TODO for ....

rrousselGit commented 2 years ago

That would mean writing something like
var (a, ...) = biggerRecord;
to match part of a bigger record.

By requiring you to be explicit about there being more, we help you notice if a tuple type changes, or you forgot about something, even though you intended to be exhaustive.

I don't like this idea, at least not in the given example

Here this isn't a "pattern match", but a variable declaration.

Writing:

var {a} = value;

should be nothing but syntax sugar for:

var a = value.a

avoiding the repetition of "a". Whether the object has more properties or not doesn't matter.

There's no exhaustiveness involved here, since there's no matching done.

I'd personally expect it to work like in Javascript, so that we'd be able to do:

final record = (1, 2, a: 3, b: 4) // or whatever the syntax is

final (one) = record
final (_, two) = record
final {a} = record
final {b} = record

final (one, two, {a, b}) = record

rrousselGit commented 2 years ago

Maybe it's worth explaining why an exhaustive destructuring would be needed

If this was about supporting things like:

switch (record)
  case (42,...):

Then I'd understand

But for a "var x = record", I don't see the value added.

Hixie commented 2 years ago

describe(String s) {
  switch (s) {
    case parseInt(n) => print('integer $n');
    case parseBool(b) => print('boolean $b');
    case _ => print('other $s');
  }
}

I like the feature in principle, but syntax-wise, I can't tell if this is defining a function or calling a function, and the idea that it might instead be declaring a variable and implicitly calling a function, and that the things that look like parameters are in fact return values of a sort, is not something that fills me with happiness.

lrhn commented 2 years ago

If the syntax required you to write var in front of any variable introduction, so an actual binding pattern would be:

describe(String s) {
  switch (s) {
    case parseInt(var n) => print('integer $n');
    case parseBool(var b) => print('boolean $b');
    case _ => print('other $s');
  }
}

then the syntactic symmetry would be broken too. (Or, in this case, there wouldn't be a var because it is actually a function call).

lrhn commented 2 years ago

@munificent

But for named fields, I don't want you to have to match them all. Since positional fields are just getter calls on arbitrary objects, "all" could be a potentially large and unwieldy set. We definitely don't want users to have to match hashCode on every object. :)

That sounds like we are treating "normal" objects and tuples the same way. I probably wouldn't do that. Named record elements are not getters. I wouldn't expect ((x: 42) as dynamic).x to work. (I would allow (x:42).x to work, but it's similar syntax for a different operation, which is basically let (x: var tmp) = (x:42) in tmp).

Accessing members of an object is always optional. The object is defined in terms of its identity, its state, and its behavior. Whether you access the getters or not.

A record/tuple is only defined in terms of its contents. It's a product type. The only thing you can do is destructure (in whichever way) to project values out of the product type. I wouldn't mind requiring record patterns to be exhaustive, and then provide a ... as an option to explicitly allow the pattern to ignore some parts of the record.

We don't have subtyping between (num, num) and (num, num, color: Color), so if a static type changed from one to the other, I wouldn't mind getting an error at a (num x, num y) = suddenColorPoint;. That seems like a service.

(I'm sure we can find use-cases for allowing it though. The question is how dangerous those use-cases are, and whether people would be happier with having a physical reminder, ..., that a match is partial.)

munificent commented 2 years ago

If the syntax required you to write var in front of any variable introduction, so an actual binding pattern would be:
describe(String s) {
  switch (s) {
    case parseInt(var n) => print('integer $n');
    case parseBool(var b) => print('boolean $b');
    case _ => print('other $s');
  }
}
then the syntactic symmetry would be broken too. (Or, in this case, there wouldn't be a var because it is actually a function call).

Oops, yes, that is in fact the correct syntax here.

But note that if we supported user-defined irrefutable extractors (which I would like), then for variable declarations, it would look like:

var (parseInt(x), parseInt(y)) = ("123", "345");
print(x + y); // "468".

I agree that if you aren't used to the notion of patterns, it can be confusing when what looks like an expression is sort of the inverse. But that property is intrinsic to the concept of pattern matching in all languages:

var a = 1;
var b = 2;
var c = 3;

var [d, e, f] = [a, b, c];
var (g, h, i) = (a, b, c);
var {5: j, 6: k} = {5: a, 6: b};
var Point(x, y) = Point(a, b);

Patterns always mirror the expression syntax for the kind of thing they destructure. It's weird at first but once that clicks then it becomes an intuitive way to understand how the destructuring behaves. "Ah, this pattern looks like a list literal, I bet it accesses elements like you would from a list. This pattern looks like a map literal, I bet it accesses elements like you would from a map. This looks like a constructor, I bet it pulls out the fields that the constructor initializes."

lrhn commented 2 years ago

I admit I have a very hard time reading

var (parseInt(x), parseInt(y)) = ("123", "345");
print(x + y); // "468".

and it's not because I am completely unused to patterns. It just looks backwards, precisely because parseInt looks like a function call, but x is actually the return value. And I get that patterns mirror the construction syntax, this one might just be taking it a bit too far for my taste.

Would actual mirroring be

var ("$x", "$y") = ("123", "345");
print(x + y); // "468".

It won't work, obviously, because type-agnostic conversion to a string is not a reversible operation. But that means that parseInt is not the mirror of an actual int parseInt(String) function, it is such a function itself, it's just being called in a backward way. And that's weird.

Even as

var (parseInt(var x), parseInt(var y)) = ("123", "345");
print(x + y); // "468".

I find it hard to understand the data flow (but it is slightly better than the non-var version).

Maybe

  (var x, var y) = ("123", "345").map(int.parse); // Applies int.parse to every element of tuple.

or just use binding property matchers

 ({parseInt(): var x}, {parseInt(): var y}) = ("123", "345");

where parseInt is an extension method.

Or allow arbitrary expressions containing it as property extractions, instead of only selectors:

 ({int.parse(it): var x}, {int.parse(it): var y}) = ("123", "345");

Levi-Lesches commented 2 years ago

var (parseInt(x), parseInt(y)) = ("123", "345");
print(x + y); // "468".

I'm confused as to what the relationship between the string literals, the parsed integers, and the variables is here, and I think that may be leading to the ambiguities. From the print statement, I see the above as:

final xString = "123";
final yString = "345";
final x = int.parse(xString);
final y = int.parse(yString);

So why in the syntax above are the literals themselves on the right-hand side of the equals sign? Why are the variables the ones in the parentheses for the functions if the literals are the actual arguments? Pairing "123" and "345" in the parentheses implies they are related, but they don't seem to be? Sure it would be nice if that could be done on one line, but then again, there are already a few ways to shorten it down:

final xString = "123", yString = "345";
final x = int.parse(xString), y = int.parse(yString);

final x = int.parse("123"), y = int.parse("345");
// or, more realistically
final x = int.parse(getX()), y = int.parse(getY());

In this case, @lrhn's list pattern syntax is more intuitive IMO:

  (var x, var y) = ("123", "345").map(int.parse);  // Applies int.parse to every element of tuple.

Here it's obvious what's happening and where the data is going, and only step 3 is "new" to pattern matching:

1. There is an Iterable<String> of "123" and "345"
2. Each element in that iterable is parsed, creating an Iterable<int>
3. That iterable is checked to make sure it contains exactly 2 elements, which are bound to x and y.

lrhn commented 2 years ago

@Levi-Lesches If that's your takeaway, then my syntax has failed. There is no Iterable<String>, but a tuple (String, String), and map is likely an extension method extension <T> on (T, T) { (R, R) map<R>(R Function(T) convert) => (convert(this[0]), convert(this[1])); }.

Records/tuples and iterables are significantly different, because elements of records do not need to have the same type. If they do, we can introduce a point-wise map operation as an extension, and we might just do that for .. tuples up to n.
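A minimal sketch of such a point-wise `map` on pairs, assuming the record syntax that later shipped in Dart 3 (positional getters `$1` and `$2` rather than `this[0]`). The extension name `PairMap` is hypothetical.

```dart
// Applies the same conversion to both fields of a homogeneous pair.
extension PairMap<T> on (T, T) {
  (R, R) map<R>(R Function(T) convert) => (convert(this.$1), convert(this.$2));
}

void main() {
  var (x, y) = ('123', '345').map(int.parse);
  print(x + y); // 468
}
```

The result is still a record of type (int, int), not an iterable, so the two-element shape is checked statically.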

munificent commented 2 years ago

I admit I have a very hard time reading
var (parseInt(x), parseInt(y)) = ("123", "345");
print(x + y); // "468".
and it's not because I am completely unused to patterns.

Agreed, it looks weird. It may be that I chose a particularly unfortunate example since parseInt() really sounds like an imperative function that takes an argument. Most function-call-like syntax in patterns is class names and reads more like declarative constructor calls.

munificent commented 2 years ago

The proposal has changed somewhat since most of this discussion happened. In particular:

The record pattern syntax now only matches records and does not allow calling getters on arbitrary objects. You have to use named extractor patterns to call getters on objects.
By default, a record pattern must match all named fields of the corresponding record value. (I'd like to add a ... syntax to allow you to ignore some fields, but that's not in there yet.)

Even so, the proposal does still support records with no fields and records with a single positional field. There is a longer-term goal to support spreading records into argument lists (#1293). (Or more generally, to be able to use a record to represent a reified argument list.) To support that with as much generality as possible, that means supporting records of all shapes, since parameter lists may accept zero or only one positional parameter.

There is the separate question of what syntax you use to create a record with zero fields or just one positional field. The proposal currently says:

There is no syntax for a zero-argument record. Instead, there is a constant Record.empty that you can use.
As suggested above, a parenthesized expression with a trailing comma produces a single-field record, similar to Python.

For record types, () represents the type of a record with no fields and a parenthesized type with a trailing comma represents a record type with only a single positional field. (We could allow omitting the comma in record types since it's not currently ambiguous, but I think it's worth keeping parentheses available for use in type annotations if we ever get union types or other infix type expressions where precedence might come into play.)

I think the language team is basically OK with all this, so I'm going to close this out. We can definitely re-open and keep discussing it, though, if there are concerns about whether it's worth supporting them at all and/or what the syntax should be, if any.

Hixie commented 2 years ago

Semantically-meaningful trailing commas seem dangerous, given how we've been teaching people for a few years that they can transparently add or remove trailing commas for stylistic reasons.

munificent commented 2 years ago

Yeah, it's not ideal. But, for what it's worth, Python allows semantically-ignored trailing commas in list literals and argument lists and also uses (foo,) as the syntax for one-element tuples.

Hixie commented 2 years ago

My dear hope is that we create a language substantially better than Python. Otherwise, I'd just use Python. :-)

munificent commented 2 years ago

I think we can be better than Python overall (for many use cases) without needing to be superior specifically in the area of "using trailing commas to indicate single-element tuples". :)

Hixie commented 2 years ago

Sure, I'm just saying that "they do it" isn't a relevant argument one way or the other.

We seem to be agreed that semantically-meaningful trailing commas are dangerous and not ideal. I understand from the comment above that there are good reasons for supporting 0-field and 1-field records.

ASCII gives us four sets of brackets, all of which are overloaded in Dart, and Dart also has one other matching pair of symbols that I can think of:

[]: list literals, optional parameter lists
{}: maps, sets, named parameter lists, statement blocks, declarative blocks for enums, classes
<>: type argument lists, type parameter lists, operators
(): nested expressions, argument lists
/* */: comments

We could take a page out of the pre-ASCII days and introduce another kind of syntax, like (: :) for tuples. I'm not sure I would like that either, since that's not what most languages do.

Anecdotally, Python's syntax is confusing to developers (there are a lot of questions about this on the web). C# has no literal syntax for 1-tuples and just uses a constructor. I briefly Googled around for other languages but wasn't able to get a clear idea of what C++ and Swift do.

I don't have a good solution here. I just think it's worrying that this:

  var x = (
    2,
  );

...means something radically different than:

  var x = (2);

...while these two, which look very similar, do mean the same thing as each other:

  var x = y(
    2,
  );
  var x = y(2);

...especially after years of telling people to add commas just like that to control the formatter (which should definitely not be affecting semantics).
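The contrast can be sketched in runnable form. This is a minimal illustration that assumes the record syntax as it later shipped in Dart 3; the body of `y` and the `isRecord` helper are hypothetical stand-ins, not part of the proposal.

```dart
// Hypothetical stand-in for the function `y` in the example above.
int y(int value) => value;

// True when the value is a record rather than a plain object.
bool isRecord(Object? value) => value is Record;

void main() {
  // Trailing comma inside bare parentheses: a record with one positional field.
  Object single = (
    2,
  );
  // No trailing comma: just a parenthesized int expression.
  Object grouped = (2);

  print(isRecord(single)); // true
  print(isRecord(grouped)); // false

  // In an argument list the trailing comma is purely stylistic.
  print(y(2,) == y(2)); // true
}
```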

leafpetersen commented 2 years ago

We discussed this in the meeting this morning. There was fairly broad consensus that the current specification, which uses a field on Record for an empty tuple, a required trailing comma for a singleton record, and an optional trailing comma everywhere else, was unsatisfying, particularly given that () is the empty record type and (T) is the singleton record type.

Three alternatives were considered.

The first was to treat positional record fields as syntactic sugar for named fields. So (e0, e1) would be considered syntactic sugar for (and equivalent to) ($0 : e0, $1 : e1). This would allow you to use ($0 : e0) for the singleton tuple. The main objection to this was that this does not match up well with the desire to maintain a correspondence between records and argument lists (keeping in mind possible future designs around capturing and spreading argument lists).

The second was to remove the singleton record syntax (with the trailing comma) in favor of a constructor on Record, so that Record.single(e) would produce a single element record. This leaves inconsistency between the 0-1 length records, but divides records into two consistent categories: 0-1 length (use a constructor, rare), and 2+ length (common, use literal syntax). The main objection to this (I think, correct me if I'm wrong) was that the syntax for the types and the terms diverges - it's a bit odd that () is the type of the empty record, but the only syntax for it is Record.empty, and similarly for the singleton.

The third was to remove Record.empty in favor of () as the term level syntax for the empty tuple, and retain the required trailing comma for the singleton tuple. This has the benefit that the syntax is almost entirely uniform, both within the term and type levels, and between them: with the notable exception that the singleton tuple is required to use the trailing comma syntax which is otherwise optional. This option had the most support on the team. There were some concerns that () would be ambiguous, but we believe that this is not the case. There were some concerns that we would regret using up this syntax in the future (i.e. that we would want it for something else), but we had no concrete ideas of what we would want to do with it.

The concern from @Hixie above is a real one. It is unfortunate that the comma in the singleton case radically changes the semantic meaning. An argument that this is not likely to be too painful in practice might be the following reasoning:

Intentionally writing a singleton tuple is likely to be very rare - it's not especially useful. So the user is unlikely in practice to be trying to write a singleton tuple (e1,) and accidentally write a parenthesized expression (e1).
Intentionally adding a trailing comma to a parenthesized expression is essentially never going to be done, so the user is unlikely in practice to be trying to write a parenthesized expression (e1), and accidentally write a tuple (e1,). (I suppose you could have something like (e1).reallyLongMethod(), and then decide to add a trailing comma to control the formatting, and end up with a tuple accidentally? Still feels like a stretch).

A concern with the above is that if we ever add juxtaposition as an operator (e.g. perhaps with some kind of block syntax/trailing argument), we might end up in a really bad place here. The reasoning would be that currently foo(a) and foo(a,) are both accepted style (and not uncommon, I think?), but would become semantically divergent if we had some kind of juxtaposition-based syntax. That is, if Foo e1 ever becomes valid syntax for an arbitrary expression e1, then it seems quite bad that Foo (e1) and Foo (e1,) mean completely different things given the existing conventions around argument lists.

cc @munificent @eernstg @lrhn @jakemac53 @natebosch @mit-mit @stereotype441 @kallentu

Hixie commented 2 years ago

Re how common this is, I was surprised at how many posts I found where people were asking about this for Python. (I didn't find anywhere near as many for other languages, but then Python is both more popular in general and more popular with less experienced programmers, so I don't know what to read into this.)

Re the trailing comma danger, my concern is more about people removing it from tuples (and breaking their code in pretty subtle ways?) than people adding it to expressions. I guess we'd also have to decide how (foo,) formats. Does it do the same style as {foo,} or the same style as (foo)?

Is there a world where we somehow coerce scalars into one-tuples? I'm not up to date on exactly how tuples will be implemented so maybe this doesn't make much sense or would lead to too many issues.

FWIW given the lack of any good options here I understand if we decide we have to go with this anyway.

leafpetersen commented 2 years ago

> Re how common this is, I was surprised at how many posts I found where people were asking about this for Python. (I didn't find anywhere near as many for other languages, but then Python is both more popular in general and more popular with less experienced programmers, so I don't know what to read into this.)

One important difference here is that Python is dynamically typed, which means that whenever you make this mistake, you never find out about it until runtime (and even then, it may "just work" for a while). In a statically typed language, most of the time you're going to immediately get a static error (not always, of course, but usually).

> Re the trailing comma danger, my concern is more about people removing it from tuples (and breaking their code in pretty subtle ways?) than people adding it to expressions.

Yes, I can definitely see this happening. I start with an expression that looks like:

var x = (longThing,
              anotherLongThing,);
...

And then I decide I don't need anotherLongThing, refactor to

var x = (longThing);
...

Or I initially refactor to

var x = (longThing,
    );
...

followed by a comma deletion.

Again, I think in most scenarios you will just get a static error, but not always.
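A minimal sketch of both outcomes, assuming the Dart 3 record syntax in which `(int,)` is the one-field record type; `unwrap` and `longThing` are hypothetical names used only for illustration. The line that would be rejected is left commented out so the example still runs.

```dart
// Hypothetical API that expects a one-field record.
int unwrap((int,) box) => box.$1;

void main() {
  var longThing = 42;

  // With the comma, the value is a record and matches the declared type.
  (int,) ok = (longThing,);
  print(unwrap(ok)); // 42

  // Dropping the comma here is caught at compile time, because an int
  // is not assignable to the record type (int,):
  // (int,) broken = (longThing);

  // With a loosely typed target nothing complains; the value is
  // silently an int instead of a record.
  Object loose = (longThing);
  print(loose is Record); // false
}
```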

> Is there a world where we somehow coerce scalars into one-tuples? I'm not up to date on exactly how tuples will be implemented so maybe this doesn't make much sense or would lead to too many issues.

This isn't totally unthinkable, but I think it has its own warts. For example, all of the record types are subtypes of Record ... except (T) (unless we make all types a subtype of Record, which... no). And (e) is equivalent to e, but (x : e) is not (at least unless we say that record getters aren't reachable via dynamic calls). Currently this code gets the first field of any record with at least one positional field: Object? first(Record r) => (r as dynamic).$0, but this would no longer work for unary records. I also worry about complications that would arise in the future if we add capturing and spreading argument lists. If capturing the argument list of a one-argument function gives you a unary tuple, which means it just gives you the underlying value... what happens if you then try to spread that again? If the value itself is a tuple, then how do you know at runtime whether you are spreading a unary tuple (which just happened to have a tuple as its only field) or whether you should spread the underlying tuple? This can all maybe be worked out, but it feels to me like it adds a lot of complexity and risk for what seems to be a limited payoff.
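The `first` helper mentioned above can be tried out as a small sketch. It assumes the records feature as shipped in Dart 3, where positional getters start at `$1` rather than the `$0` used in the draft under discussion, and it relies on record getters being reachable through dynamic calls.

```dart
// Returns the first positional field of any record that has one.
// Uses a dynamic call, so a record without positional fields would
// fail at runtime rather than at compile time.
Object? first(Record r) => (r as dynamic).$1;

void main() {
  print(first((7,))); // 7
  print(first((7, 'x'))); // 7

  // A unary record whose only field is itself a record stays distinct
  // from that inner record, so `first` unwraps exactly one level.
  print(first(((1, 2),)) == (1, 2)); // true
}
```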

> FWIW given the lack of any good options here I understand if we decide we have to go with this anyway.

👍 It's definitely good to talk through the options (and lack thereof) here, but yeah, there may just be tradeoffs we have to make here.

Hixie commented 2 years ago

Ooh, your comment about static analysis made me think... one option to hugely mitigate this problem is for us to make the error messages / analyzer messages explicitly call this case out. Instead of "A value of type 'int' can't be assigned to a variable of type 'MyFancySingleTuple'.", we could have it notice that the thing being assigned is an expression in parentheses and instead say "A value of type 'int' can't be assigned to a variable of type 'MyFancySingleTuple'; consider adding a trailing comma to change the expression into a one-value tuple literal." or something like that.

leafpetersen commented 2 years ago

cc @bwilkerson @srawlins @johnniwinther on the error messages

lrhn commented 2 years ago

> Is there a world where we somehow coerce scalars into one-tuples?

I have extensively considered whether we can make a one-tuple and a single value be the same thing. In short: No, not in any realistic way.

Mathematically, it should be the same, since a singleton Cartesian set product is the same as the original set, X¹ = X. (And the empty tuple should be the same as the Null value, because a category only needs one unit type, they're all isomorphic anyway.)

It just won't fly, and not for lack of trying.

We want a lot of things for records, and some of those fly in the face of treating records as mathematical Cartesian products. We want unboxing. And performance. And being able to use record types as first class types. And named record fields.

Then we can't treat records as mathematical set products. If we did, then ((int, int), (int, int)) should be isomorphic and therefore equivalent to (int, int, int, int). Then we'd have mutual subtyping between (int, int, (int, int, int)) and (int, int, int, int, int). I tried that, but it doesn't work in a language with generics. (And with named fields, it gets tricky too). You wouldn't be able to predict the structure returned by (int, int, R) foo<R>(int x, int y, R color) => (x, y, color); if R happens to be bound to (int r, int b, int g).

If we want any kind of performance, we need to be able to predict the structure of records at compile time. Which means that we need to assume that (int, int, R) has three fields, and that all its subtyping relations can be predicted statically, without knowing the precise type of R.

So records nest.
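A short sketch of what nesting means for the generic `foo` above, assuming the records feature as shipped in Dart 3 (positional getters `$1`, `$2`, `$3`). The result always has exactly three fields, whatever `R` is bound to.

```dart
(int, int, R) foo<R>(int x, int y, R color) => (x, y, color);

void main() {
  // R is int: three plain fields.
  var plain = foo(1, 2, 3);
  print(plain.$3); // 3

  // R is a record type: still three fields, the third one is a record.
  var nested = foo(1, 2, (255, 0, 0));
  print(nested.$3.$1); // 255

  // The nested result is not the same value as a flat five-field record.
  Object flat = (1, 2, 255, 0, 0);
  print(nested == flat); // false
}
```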

We also want to, eventually, be able to use records to represent argument lists. Say, for noSuchMethod and Function.apply. At that point, there is a distinction between a one-element argument list containing a pair of integers, and a two element argument list containing two integers. We lose that distinction by making the singleton record be equivalent to its value.
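The distinction can be sketched with today's `Function.apply`, which takes a `List` of positional arguments. The list is only a stand-in here for a future record-based argument list, and `sumPair` and `sumTwo` are hypothetical functions.

```dart
// Takes one argument that is a pair.
int sumPair((int, int) pair) => pair.$1 + pair.$2;

// Takes two separate integer arguments.
int sumTwo(int a, int b) => a + b;

void main() {
  // One-element argument list containing a pair of integers.
  print(Function.apply(sumPair, [(1, 2)])); // 3

  // Two-element argument list containing two integers.
  print(Function.apply(sumTwo, [1, 2])); // 3

  // As records, the two shapes are different values: a unary record
  // wrapping a pair is not the pair itself.
  Object wrapped = ((1, 2),);
  Object pair = (1, 2);
  print(wrapped == pair); // false
}
```

If the singleton record were equivalent to its value, `wrapped` and `pair` would be indistinguishable, and the two calls above could no longer be told apart from their captured arguments.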

(About making null and the empty record be the same type ... that's too late. It would make null be an Object, which would be incredibly breaking for code which assumes that anything which is Object is not null. If we had made Null a subtype of Object when we designed null safety, which was a possibility with different tradeoffs, then maybe it could work, but today it's just not an option.)

johnniwinther commented 2 years ago

cc @chloestefantsova

munificent commented 2 years ago

> The second was to remove the singleton record syntax (with the trailing comma) in favor of a constructor on Record, so that Record.single(e) would produce a single element record. This leaves inconsistency between the 0-1 length records, but divides records into two consistent categories: 0-1 length (use a constructor, rare), and 2+ length (common, use literal syntax). The main objection to this (I think, correct me if I'm wrong) was that the syntax for the types and the terms diverges - it's a bit odd that () is the type of the empty record, but the only syntax for it is Record.empty, and similarly for the singleton.

For what it's worth, the ambiguity discovered in #2469 means that we're going to have to change the record type syntax too. If we change that to Record(int, bool) (which I'm personally leaning towards), then I find it somehow more acceptable to use Record.empty and Record.single(e) for the 0 and 1 expression forms. It makes the overall type feel a little more "nominal and object-oriented" to me (which, I get, is not a good thing for some).

> I guess we'd also have to decide how (foo,) formats. Does it do the same style as {foo,} or the same style as (foo)?

Ooh, that's a good point. It would do the latter, and keep the , right next to the closing ) on the same line, as in:

var rec = (123,);

I would not allow any split after the ( or before the , or ). They'll just forcibly adhere to the inner expression, which I think is what we want.

The nice thing about this is that having distinct formatting here means that properly formatted code would help users distinguish single-element records from argument lists with a trailing comma, since the only time they will ever see a , right next to ) is when it's a record.

Levi-Lesches commented 2 years ago

> For what it's worth, the ambiguity discovered in #2469 means that we're going to have to change the record type syntax too. If we change that to Record(int, bool) (which I'm personally leaning towards), then I find it somehow more acceptable to use Record.empty and Record.single(e) for the 0 and 1 expression forms.

Unless I'm missing something, that would also allow for Record() and Record(int) for the 0 and 1 forms, right?

munificent commented 2 years ago

The idea being discussed there is that we'd use Record(int, bool) as the syntax for record types, but record expressions would still just use (1, true), etc.

munificent commented 2 years ago

We've decided that records can have zero or one positional field, and settled on a syntax as of #2535.