dart-lang / language

Design of the Dart language

Should record fields start at `$0` or `$1`. #2638

Closed lrhn closed 1 year ago

lrhn commented 1 year ago

Last minute bike-shedding, I know.

We have defined positional record fields to be accessible as $0, $1 etc. getters. While starting at zero is good for integer indices (like arguments to operator[]), I'm not sure it is the best choice here. The getters are not integers, you won't have to do arithmetic on them, so starting at zero is not a benefit in that regard.

I worry that users will think of them as "1st" field, "2nd" field, etc., and therefore expect them to start at 1. (I worry about that, because that's what I catch myself thinking.)

If it's not too late, I'd like to suggest we start at $1 instead. (I promise I'll fix our test files if we make the change!)

munificent commented 1 year ago

Nooooo. Zero-based everything. The name is a field index (0), not an ordinal (first).

But I'm still somewhat tempted to just not have positional field getter expressions at all since the names are pretty ugly with the leading $.

lrhn commented 1 year ago

But you can't index fields, so this is the only place that this index surfaces.

If I were to name them myself, I'd be very likely to do things like (int octet1, int octet2, int octet3, int octet4) ipv4Address = ...;.

Same if I write parameters, int combine(int v1, int v2, int v3) => v1 + v2 + v3;. I'd never start those at zero.

Maybe we can allow you to use the static names of record fields as static aliases for the $x getters, so:

(int octet1, int octet2, int octet3, int octet4) ipv4Address = ...;
var int32 = (ipv4Address.octet1 << 24) | (ipv4Address.octet2 << 16) | (ipv4Address.octet3 << 8) | (ipv4Address.octet4);

Basically, a static record type with named positional fields introduces implicit extension getters on the value. (Possibly a little too fragile, since it makes it an error to change the name of a positional field.)

Guess I just need to make my own

extension Project2<T1, T2> on (T1, T2) {
  T1 get p1 => $0;
  T2 get p2 => $1;
}

projection getters 😉.

Compare also to the Tuple library which uses .item1, .item2, etc.

(UX testing?)

Levi-Lesches commented 1 year ago

I think for consistency's sake it's a lot easier to explain that "just as the first element of a list is [0], the first field of a record is $0", rather than "sometimes we use 0, sometimes we use 1, they both mean the same thing except when 0 means 1st and 1 means 2nd". And then if someone wants to use ordinals (v1, octet1, etc), they should use named fields to specify what's first and why it matters.

lrhn commented 1 year ago

For comparison, I just did a quick prototype of a parallel-wait extension on records of futures. The code ended up as:

extension FutureRecord3<T1, T2, T3>
    on Vector3<Future<T1>, Future<T2>, Future<T3>> {
  Future<Vector3<T1, T2, T3>> operator ~() {
    final c = Completer<Vector3<T1, T2, T3>>.sync();
    final v1 = _FutureResult<T1>($0);
    final v2 = _FutureResult<T2>($1);
    final v3 = _FutureResult<T3>($2);
    var ready = 0;
    var errors = 0;
    void onReady(int error) {
      errors += error;
      if (++ready == 3) {
        if (errors == 0) {
          c.complete(Vector3(v1.value as T1, v2.value as T2, v3.value as T3));
        } else {
          c.completeError(ParallelWaitError(
            Vector3<T1?, T2?, T3?>(v1.value, v2.value, v3.value),
            Vector3<AsyncError?, AsyncError?, AsyncError?>(
                v1.error, v2.error, v3.error),
          ));
        }
      }
    }

    v1.onReady = v2.onReady = v3.onReady = onReady;
    return c.future;
  }
}

where Vector3 is a class with the same API as a three-positional-element record.

One of all these numbered things is not like the other.

In every other case where I had three numbered things, I'd naturally number them as 1, 2 and 3. Just as I would in a parameter list. Record positions really do stand out in this context.

(I recommend trying to write something realistic with records, using the $x notation, and seeing how it feels.)

leafpetersen commented 1 year ago

If I saw that code using $1 based indexing, I would assume that you were ignoring the first field. I don't really understand why you named the variables inconsistently with their position, but that's your choice. How would you have written this code if the receiver were a list instead of a record, and you were using [0] instead of $0?

lrhn commented 1 year ago

I named every variable consistently with its position, starting from 1, like I would have done in any other case. I never name something starting from zero. It's names, not indices. The trailing numbers can't be computed, they are not integers.

If I had to write a generic function taking three typed parameters, I'd write

void foo<T1, T2, T3>(T1 v1, T2 v2, T3 v3) => ...

every time. Starting from zero wouldn't occur to me.

If the input was a list, I might do:

void foo<T1, T2, T3>(List values) {
  T1 v1 = values[0] as T1;
  T2 v2 = values[1] as T2;
  T3 v3 = values[2] as T3;
}

I might use T1 element0 = values[0]; if I wanted to emphasize that it's list elements, and I wanted to put them back into a list. Otherwise I wouldn't.

Indices are different from names. Records are not lists. They are closer to parameter lists than lists, and I'd never start a parameter list at v0 either.

It might all come down to perspective.

I can see other languages do different things. Swift treats tuple access as indices, starting from zero, with syntax tuple.0, tuple.1. C# treats it as named getters starting from 1, tuple.Item1, tuple.Item2.

The $-getters in Dart feel more like names to me than indices, which is probably why starting at zero feels like the wrong choice.

jakemac53 commented 1 year ago

The $-getters in Dart feel more like names to me than indices, which is probably why starting at zero feels like the wrong choice.

But the $ here is really just a hack to get nicer (different) types on each, but still have it look as much like an index as possible. Conceptually they are more like indices, and if we could make the [] return a different type for each index we probably would have used that?

munificent commented 1 year ago

Maybe we can allow you to use the static names of record fields as static aliases for the $x getters, so:

(int octet1, int octet2, int octet3, int octet4) ipv4Address = ...;
var int32 = (ipv4Address.octet1 << 24) | (ipv4Address.octet2 << 16) | (ipv4Address.octet3 << 8) | (ipv4Address.octet4);

Basically, a static record type with named positional fields introduces implicit extension getters on the value. (Possibly a little too fragile, since it makes it an error to change the name of a positional field.)

We've talked about this, but it's a dead end. Either the field names are part of the type or they aren't. You can't have it both ways or it gets weird:

(int a, int b) ab = (1, 2);
(int b, int a) ba = (3, 4);
var either = flipCoin ? ab : ba;
print(either.a); // ???
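
One way to avoid that question is to treat positional field names as documentation only, so that they never become part of the type. A minimal sketch, assuming a Dart 3.0 or later SDK where records behave this way; `swapNames` is a hypothetical helper used only for illustration:

```dart
// Assumes Dart 3.0+, where positional field names in a record type are
// documentation only and do not affect the type.
(int, int) swapNames((int a, int b) ab) {
  // Same type despite the different names: both are just (int, int).
  (int b, int a) ba = ab;
  return ba;
}

void main() {
  final either = swapNames((1, 2));
  // There is no `either.a` to resolve; access goes through the position.
  print(either.$1); // prints 1
}
```

Under that rule the names cannot be used for member access at all, which is what keeps the `either.a` case from arising.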

If I were to name them myself, I'd be very likely to do things like (int octet1, int octet2, int octet3, int octet4) ipv4Address = ...;.

Same if I write parameters, int combine(int v1, int v2, int v3) => v1 + v2 + v3;. I'd never start those at zero.

I have to admit that when I number parameters or type parameters, I start at 1 too. (Though every time I do that, I do consider starting at zero instead. It's an annoying brain speed bump.)

Guess I just need to make my own

extension Project2<T1, T2> on (T1, T2) {
  T1 get p1 => $0;
  T2 get p2 => $1;
}

projection getters 😉.

You know, now that you mention it... We could simply not have positional field getters defined by the language at all. Then if users want some, they can define (or reuse) their own extensions like this and name/number them however they want. That would also avoid all of the problems where an implicit positional field getter collides with a named field one as in:

var wat = (1, 2, $0: 3, $1: 4);

And because of this, it means there are fewer edge cases when it comes to being able to spread records into parameter lists. Instead of trying to come up with a sufficiently unusual positional field getter name (hence the ugly $) to reduce the chances of collision, we could avoid it entirely. While, at the same time, providing a nice expression syntax using extensions if users want that.

I could even see someone defining:

extension Cardinals<T1, T2, T3> on (T1, T2, T3) {
  T1 get first {
    var (first, _, _) = this;
    return first;
  }

  T2 get second {
    var (_, second, _) = this;
    return second;
  }

  T3 get third {
    var (_, _, third) = this;
    return third;
  }
}

The main problems with this I can see are:

leafpetersen commented 1 year ago

You know, now that you mention it... We could simply not have positional field getters defined by the language at all. Then if users want some, they can define (or reuse) their own extensions like this and name/number them however they want.

To be blunt, this seems like just a terrible idea to me. As you observe, we don't have anything like the kind of row polymorphism you would need to make the code re-use work well. If people actually go this route, it will be a mess. There will be inconsistently named, redundant, and highly verbose helpers scattered all over the place. And if you accidentally end up with two extensions that define $0 for you on your type in scope, you get conflicts.

If we really truly believed that no-one will use getters (I don't), we could leave them out. But leaving them out in favor of "just define extensions" just seems like a really bad idea.

leafpetersen commented 1 year ago

If we really truly believed that no-one will use getters (I don't)

Just to expand on this a bit further, here is a small set of list pair helper methods written as extension methods using positional getters:

extension ListPair<S, T> on List<(S, T)> {
  List<R> mapFirst<R>(R Function(S) f) => map((p) => f(p.$0)).toList();
  List<R> mapSecond<R>(R Function(T) f) => map((p) => f(p.$1)).toList();

  List<R> map2<R>(R Function(S, T) f) => map((p) => f(p.$0, p.$1)).toList();

  List<S> firsts() => map((p) => p.$0).toList();
  List<T> seconds() => map((p) => p.$1).toList();
  (List<S>, List<T>) unzip() => (firsts(), seconds());
}

Here is the same code written using pattern matching:

extension ListPair<S, T> on List<(S, T)> {
  List<R> mapFirst<R>(R Function(S) f) => map((p) {
    final (v0, _) = p;
    return f(v0);
  }).toList();
  List<R> mapSecond<R>(R Function(T) f) => map((p) {
    final (_, v1) = p;
    return f(v1);
  }).toList();

  List<R> map2<R>(R Function(S, T) f) => map((p) {
    final (v0, v1) = p;
    return f(v0, v1);
  }).toList();

  List<S> firsts() => map((p) {
    final (v0, _) = p;
    return v0;
  }).toList();
  List<T> seconds() => map((p) {
    final (_, v1) = p;
    return v1;
  }).toList();
  (List<S>, List<T>) unzip() => (firsts(), seconds());
}

I know which version I would prefer to be writing and reading. If we had parameter patterns, the code above could be written without using getters ... but we don't. But even if we had parameter patterns, you still have expression oriented code where you don't care about a field. Continuing the theme above:

extension ListPair<S, T> on List<(S, T)> {
  S get firstOfFirst => first.$0;
  T get secondOfFirst => first.$1;
}

vs

extension ListPair<S, T> on List<(S, T)> {
  S get firstOfFirst {
    var (v, _) = first;
    return v;
  }
  T get secondOfFirst {
    var (_, v) = first;
    return v;
  }
}

Forcing the user to bind variables in a block in order to use a value once is just noise.

I don't love the positional getter syntax, and I'm open to using an alternative, but I really do think this is something that we want to have (and I think I'm by far the person on the team who has spent the most time working with tuples, so I do put a bit more weight on my opinion here than I normally would).

On the original topic of this issue, I will admit that when I wrote the first method above, I initially used p.$1 instead of p.$0 in the definition of mapFirst - a small natural experiment. :). But then, SML tuple projections start with #1, so perhaps I'm not an unbiased experimental subject.

eernstg commented 1 year ago

I don't have a strong opinion here, but I do have the same preference as @lrhn in this area:

Numbering from zero is justified when we're considering indices as offsets (myList[x + y] is the element whose offset from the xth element is y, so myList[y] is the element whose offset from the beginning is y), which works really well with C-style pointer arithmetic (int *mySubArray = myIntArray + y;), and perhaps some algorithms working on arrays (in some languages, including lists in Dart).

In contrast, the first element of anything that doesn't already have a firmly zero-based convention is 'first', not 'zeroth'.

lrhn commented 1 year ago

What I really want for extensions is pattern matching on the this value. Like:

extension FutureTuple2<T1, T2> on (Future<T1> f1, Future<T2> f2) {
  Future<(T1, T2)> operator~() { 
    var r1 = record(f1);
    var r2 = record(f2);
    // ....
  }
}

Pattern has to be a valid declaration pattern, so we can extract a type schema from it to match against the static types at calls.

We'll get that eventually. (Because we will get patterns as parameters, and we'll get primary constructor-like syntax for extensions when we get them for inline classes. I have stated my goals :grin:!)

I agree that not providing a canonical way to access positional elements of a record type will cause people to create their own. It's something that you really only need one of, and we can do it for all record types, which nobody else can. I just happen to prefer starting at $1, because of the way I look at the getters, as names, but I prefer having getters starting at $0 to not having any. Threatening to create my own was very much tongue-in-cheek. (I'd at least use an inline class for it, so I could use $1..$n properly! :yum:).

(We discussed using [0], but it looked too much like indexing, when it really wasn't. The Swift .0 syntax is somewhere between a member access and an index. It gets complicated by .0 already being a double literal by itself.)

natebosch commented 1 year ago

I agree that a numerically named getter doesn't need to be treated the same as an index/offset. I don't have strong feelings about starting at $0 or $1, but after reading this thread I lean towards $1. I do think it is likely to feel natural in more situations.

Wdestroier commented 1 year ago

If the syntax is changed from tuple.$n to tuple.n in the future (the best-looking option to me), then I'd probably prefer to start at tuple.0 over tuple.1.

vxern commented 1 year ago

I worry that users will think of them as "1st" field, "2nd" field, etc., and therefore expect them to start at 1. (I worry about that, because that's what I catch myself thinking.)

Having lists with indices starting at 0 and thinking of the elements within the list as first, second, third, etc. are not mutually exclusive cases. I, at least, don't think of a list as having a zeroth element at the start.

If it's not too late, I'd like to suggest we start at $1 instead. (I promise I'll fix our test files if we make the change!)

Switching now to 1-based indices will introduce a very glaring irregularity that doesn't provide nearly enough benefit over 0-based indices to be justifiable in implementing. Combined with the peculiar .$n syntax suggested over .n, a change like this would surely prove very controversial once released to the public.

munificent commented 1 year ago

I poked around again to see what other languages do:

Dedicated syntax

Numbered identifiers

Some sort of dependent typing

Use . followed by an integer literal

Only have expressions for the first and second elements of pairs

Of the languages that use numbers in the syntax, five of them start at 0 and three start at 1.

The Dart proposal uses a normal identifier ($) suffixed with a number. C# and Scala 2 are the other languages that take that approach (Item and _, respectively), and they both start at 1.

All of the languages that use an actual integer literal expression in the syntax (SML uses an integer as a label) start at 0.

So it seems like if the index is "identifier-like" it tends to start at 1 and if it's "number-like", it starts at 0.

That seems reasonable to me. Given that, if we're going to stick with the $ syntax for Dart, I think we should start at 1.

Aesthetically, I quite like what Rust and Swift do. If we could make that work for Dart, I would be inclined to give it a try.

Wdestroier commented 1 year ago

Python and TypeScript tuples start at 0 too (they may have been purposely omitted).

Levi-Lesches commented 1 year ago

I can see why comparing Dart's records to other languages with tuples is valuable, but Dart as a whole is very close to the likes of Java, Python, and TypeScript. These languages all start from 0. Dart also uses 0-based indexing for lists, RegExp groups, and every other place where an order is defined. Sure, in some cases you may have Container<T1, T2>, and func(part1, part2), but in every instance where the language defines an ordering, it starts at 0.

To have records start at 1 may fall in line with other implementations of tuples but would be pretty inconsistent within Dart and the assumptions that new Dart developers may have. I strongly believe intuition and simplicity outweigh "correctness" in cases like these and choosing $0 would cause the least surprises. Just my opinion, but I think keeping in mind the "new to Dart" demographic can only help keep the language simple and learnable.

The getters are not integers, you won't have to do arithmetic on them, so starting at zero is not a benefit in that regard.

In other words, the benefit is that every developer comes into programming having been taught "computers start counting at 0", and will probably assume that everything is 0-indexed.

jakemac53 commented 1 year ago

So it seems like if the index is "identifier-like" it tends to start at 1 and if it's "number-like", it starts at 0.

This is where the main disagreement lies I think - I get that $0 is technically an identifier but to me that is not how it actually feels. It feels more like an index to me, I think specifically because it does use an actual integer in the name, and the fact that $ feels more like something special than a normal identifier.

jakemac53 commented 1 year ago

Said another way, ask yourself to compare $1 to the following two things, which one does it seem most similar to?

I would say [1], the literal edit distance is obviously much smaller, and also conceptually it seems more similar as well (to me).

lrhn commented 1 year ago

It's the right question to ask, and I think $1 is closer to first, and even closer to item1, than to [0], because it is a name.

There is no indexing, no computation of integers. It's just a (very short) name, one of several numbered names.

So reasonable people disagree. What will we do :)

jakemac53 commented 1 year ago

Leave it up to the AI? rofl


natebosch commented 1 year ago

How about $0 by default, but you can add a comment like // $[ = 1 to switch to $1.

Jetz72 commented 1 year ago

Even if these fields aren't designed to be used like array indexes with variable access, is it possible some future language/SDK feature would have a reason to do this in some capacity? Static metaprogramming? Some kind of serialization?

stereotype441 commented 1 year ago

FWIW, I'm in the "start at $0" camp. I have no justification for my preference other than a stubborn conviction that I'm right.

That being said, I will lose 0 nights of sleep if $1 is chosen instead.

jakemac53 commented 1 year ago

Another argument for $0 would be that anywhere you see the $0, it is totally unambiguous. And I would hypothesize that most code using these getters would be doing something with all of them, so you would see a $0 in most nearby code, and understand things are zero indexed without having to do any research. If you see $1 that is ambiguous, even if all indexes are in fact handled in surrounding code, you might wonder if the first one is being skipped for some reason.

munificent commented 1 year ago

Repeating my comment on this other issue:

I wanted to get some actual data about whether users prefer numbered lists of things in their code to be zero-based or one-based. I did some scraping. My script looks at type parameter lists and parameters. For each one, it collects all of the identifiers that have the same name with numeric suffixes. For each of those sequences, it sorts the numbers and looks at the starting one.
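
The counting step described above could look roughly like the following. This is only a sketch of the idea, not the actual script: `startOfNumberedSequences` and its regular expression are hypothetical, and it works on a plain list of identifier names rather than on parsed Dart source.

```dart
/// Hypothetical helper: groups identifiers that share a base name and differ
/// only in a numeric suffix, then reports the smallest suffix of each group.
Map<String, int> startOfNumberedSequences(List<String> identifiers) {
  // A base name ending in a non-digit, followed by one or more digits.
  final numbered = RegExp(r'^(.*?\D)(\d+)$');
  final suffixes = <String, List<int>>{};
  for (final id in identifiers) {
    final match = numbered.firstMatch(id);
    if (match == null) continue; // No numeric suffix, so not part of a sequence.
    final base = match.group(1)!;
    final number = int.parse(match.group(2)!);
    suffixes.putIfAbsent(base, () => []).add(number);
  }

  final starts = <String, int>{};
  suffixes.forEach((base, numbers) {
    if (numbers.length < 2) return; // A single numbered name is not a sequence.
    numbers.sort();
    starts[base] = numbers.first;
  });
  return starts;
}

void main() {
  // Names as they might appear in two parameter lists.
  final names = ['v1', 'v2', 'v3', 'arg0', 'arg1', 'count'];
  print(startOfNumberedSequences(names)); // prints {v: 1, arg: 0}
}
```

Tallying the resulting start values across all parameter lists and type parameter lists would then give a histogram of starting numbers.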

After looking at 14,826,488 lines in 90,919 files across a large collection of Pub packages, Flutter widgets, and Flutter apps, I see:

-- Start (2740 total) --
   1544 ( 56.350%): 1     ===============================
   1114 ( 40.657%): 0     ======================
     59 (  2.153%): 2     ==
      6 (  0.219%): 30    =
      4 (  0.146%): 8     =
      3 (  0.109%): 11    =
      2 (  0.073%): 32    =
      2 (  0.073%): 5     =
      2 (  0.073%): 6391  =
      1 (  0.036%): 3     =
      1 (  0.036%): 91    =
      1 (  0.036%): 37    =
      1 (  0.036%): 24    =

So there's a slight preference for 1-based, but not huge. Looking at parameter lists and type parameter lists separately:

-- Parameters start (2618 total) --
   1435 ( 54.813%): 1     ==============================
   1105 ( 42.208%): 0     =======================
     55 (  2.101%): 2     ==
      6 (  0.229%): 30    =
      4 (  0.153%): 8     =
      3 (  0.115%): 11    =
      2 (  0.076%): 32    =
      2 (  0.076%): 5     =
      2 (  0.076%): 6391  =
      1 (  0.038%): 3     =
      1 (  0.038%): 91    =
      1 (  0.038%): 37    =
      1 (  0.038%): 24    =

-- Type parameters start (122 total) --
    109 ( 89.344%): 1  ===================================================
      9 (  7.377%): 0  =====
      4 (  3.279%): 2  ==

The stark difference here suggests that there may be some outlier code defining a ton of type parameter lists with a certain style. Indeed, if we look at the number of sequences in each package:

-- Package (6089 total) --
   1344 ( 22.073%): ffigen-6.1.2
    500 (  8.212%): realm-0.4.0+beta
    440 (  7.226%): artemis_cupps-0.0.76
    308 (  5.058%): _fe_analyzer_shared-46.0.0
    277 (  4.549%): tencent_im_base-0.0.33
    250 (  4.106%): realm_dart-0.4.0+beta
    172 (  2.825%): flutter-flutter
    167 (  2.743%): invoiceninja-admin-portal
    167 (  2.743%): invoiceninja-flutter-mobile
    111 (  1.823%): statistics-1.0.23
     71 (  1.166%): dart_native-0.7.4
     59 (  0.969%): sass-1.54.5
     56 (  0.920%): fpdt-0.0.63
     53 (  0.870%): objectbox-1.6.2
     49 (  0.805%): medea_flutter_webrtc-0.8.0-dev+rev.fe4d3b9cd21a390870d5390393300371fe5f1bb2
     46 (  0.755%): linter-1.27.0

So ffigen (whose name suggests it contains a ton of generated code) heavily skews the data.

Really, what we want to know is not what each sequence prefers, but what each user prefers. If only one user prefers starting at zero and everyone else prefers starting at one, but that user authors thousands of parameter lists, that doesn't mean they get their way.

To approximate per-user preference, I treated each top level directory as a separate "author". For each one, I looked at all of the sequences in it to see if they start at one, zero, (or both):

-- By package/author (338 total) --
    305 ( 90.237%): Only one-based                 ===========================
     22 (  6.509%): Only zero-based                ==
     11 (  3.254%): Both zero-based and one-based  =

While there are many sequences that start with zero, they are heavily concentrated in a few packages like ffigen and realm. When you consider each package as a single vote for a given style, then there is a much larger number of packages that contain one-based sequences. If you look at them, each one-based package only has a fairly small number of sequences. But there are many of these packages. That suggests that most users hand-authoring type parameter and parameter sequences prefer starting them at one.

Based on that, I think we should start positional record field getters at 1 too.

We discussed this in this week's language meeting and reached consensus to change the starting index to 1. I don't think it's a perfect solution, but I think it's the overall winner.
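
For illustration, here is how positional access reads with one-based getters. This is a minimal sketch assuming a Dart 3.0 or later SDK with records; `describe` and the sample values are hypothetical.

```dart
/// Hypothetical helper: formats a (name, count) pair via positional getters.
String describe((String, int) entry) =>
    '${entry.$1}: ${entry.$2} sequences';

void main() {
  final entry = ('ffigen', 1344);
  // With one-based numbering, the first positional field is `$1`.
  print(describe(entry)); // prints "ffigen: 1344 sequences"
}
```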

stereotype441 commented 1 year ago

Wow, nice job finding high quality data to answer such a subjective question. As someone who had been previously arguing for zero-based, I'm very much convinced by this data that one-based will actually be more intuitive for most people.