dart-lang / language

Design of the Dart language
Other
2.61k stars 200 forks source link

Generator expressions / Iterable literals and Stream literals #1633

Open mmcdon20 opened 3 years ago

mmcdon20 commented 3 years ago

In dart we have literal expressions for lists, sets, and maps. I propose that we also allow for expressions that generate iterables and streams.

Take for example this code:

List<List<int>> pythagorean = [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) [a, b, c],
];

If we wanted to instead generate an Iterable<List<int>> we could do that with a generator function using the sync* keyword. We can even use sync* on lambda expressions which looks like the following:

Iterable<List<int>> pythagoreanSynchronous = (() sync* {
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) yield [a, b, c];
})();

I propose that we could generate an Iterable by placing the sync* keyword before a list literal, which would be syntactic sugar for the above:

Iterable<List<int>> pythagoreanSynchronous = sync* [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) [a, b, c],
];

Similarly using the async* keyword would instead generate a stream:

Stream<List<int>> pythagoreanAsynchronous = async* [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) [a, b, c],
];

Of course when generating streams, you would likely want to use await within the context of the expression, so you should be able to do the following as well:

Future<List<int>> combineNumbers(int a, int b, int c) async => [a,b,c];

Stream<List<int>> pythagoreanAsynchronous = async* [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) await combineNumbers(a, b, c),
];

Which would be syntactic sugar for:

Stream<List<int>> pythagoreanAsynchronous = (() async* {
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) 
          yield await combineNumbers(a, b, c);
})();

Should support spreads as well:

Iterable<int> pythagoreanSynchronousFlattened = sync* [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) ...[a, b, c],
];

Which is equivalent to:

Iterable<int> pythagoreanSynchronousFlattened = (() sync* {
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) 
          for (var item in [a, b, c]) yield item;
})();

Raw values like:

sync* [1, 2, 3];

Are equivalent to:

 (() sync* { yield 1; yield 2; yield 3; })();

Generator expressions already exist in the python programming language (https://www.python.org/dev/peps/pep-0289/).

I feel this would make working with lazy collections in dart even easier.

lrhn commented 3 years ago

Interesting idea. The grammar won't work, sync* [ .... ] is currently valid syntax if sync is a variable of a type which has a multiplication operator . Probably rare, but still a breaking change.

It's also somewhat restrictive that you can only use collection syntax. I'd personally rather move towards allowing any statement inside a collection, and use yield to emit values. Probably going to be a hard sell.

mmcdon20 commented 3 years ago

Interesting idea. The grammar won't work, sync* [ .... ] is currently valid syntax if sync is a variable of a type which has a multiplication operator . Probably rare, but still a breaking change.

Fair enough, I suppose it would have to be defined to multiply by a List or Iterable or Object for it break. Also I don't think we necessarily have to use the exact syntax I've proposed.

It's also somewhat restrictive that you can only use collection syntax.

Sure, but there are also generator functions which are less restrictive. This wouldn't be much different from how python does it, where you have generator functions using function syntax and more restricted generator expressions using comprehension syntax.

I'd personally rather move towards allowing any statement inside a collection, and use yield to emit values. Probably going to be a hard sell.

Allowing statements inside a collection would be great, I agree with you there. If you allowed yield inside the collections, how would you distinguish between emitting values as an Iterable vs emitting values as a Stream? The way that generator functions make this distinction is by using either the sync* or async* keywords, and if you apply the same idea to collection syntax, seems like you would end up with nearly the same thing as I am proposing just with also yield statements.

Personally I think allowing the yield keyword inside the collection syntax could be confusing, for example:

[ 1, 2, yield 3 ]

Would this be allowed? And if it is allowed then what is the resulting value?

eernstg commented 3 years ago

We're basically considering syntactic sugar for () sync* {...} (), taking a look at, e.g., sync* [...].

We could use abbreviated function literals, #265, to get the somewhat more concise form sync* {...}():

Iterable<List<int>> pythagoreanSynchronous1 = sync* {
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) yield [a, b, c];
}();

// Compared to proposal here:

Iterable<List<int>> pythagoreanSynchronous2 = sync* [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) [a, b, c],
];

Not that different! ;-) And the nice thing is that this is just another case where a more general mechanism (abbreviated function literals) provides a bit of conciseness.

It is indeed a breaking change to make sync* {...} and async* {...} function literals rather than multiplication expressions, but I expect that those multiplications will be rare.

It should not be a problem to parse those terms as function literals (Dart.g with updates as indicated in #265 handles the tests/language with no errors except for the missing syntax errors corresponding to constructs that used to be a syntax error, but which is now an abbreviated function literal).

mmcdon20 commented 3 years ago

@eernstg I use the curly_braces_in_flow_control_structures lint rule in production code. I'm sure many others do as well since the pedantic package uses this lint rule. So I think a more realistic comparison using abbreviated function literals would be:

Iterable<List<int>> pythagoreanSynchronous1 = sync* {
  for (int a = 0; a < 100; a++) {
    for (int b = a; b < 100; b++) {
      for (int c = b; c < 100; c++) {
        if (a * a + b * b == c * c) {
          yield [a, b, c];
        }
      }
    }
  }
}();

Iterable<List<int>> pythagoreanSynchronous2 = sync* [
  for (int a = 0; a < 100; a++)
    for (int b = a; b < 100; b++)
      for (int c = b; c < 100; c++)
        if (a * a + b * b == c * c) [a, b, c],
];

The code diverges a bit further when using spreads:

Iterable<int> oscillator1 = sync* {
  for (;;) {
    for (final item in [-1, 1]) {
      yield item;
    }
  }
}();

Iterable<int> oscillator2 = sync* [for (;;) ...[-1, 1]];

One of the benefits of the snytax I have proposed is that it will be really easy to switch between generating a list versus generating an iterable, since they both use collection syntax. Someone might start out writing their code with a list, but then later want to switch to an iterable due to memory concerns.

Also perhaps having a collection literal syntax for iterables would be useful in other areas. I know that pattern matching is being considered for the language #546, but I don't think the current proposal allows you to match against iterables. Languages like haskell and scala allow for pattern matching against lazy lists.

eernstg commented 3 years ago

really easy to switch

That's true.

But note that the generator function code doesn't have to use the extra loop:

Iterable<int> oscillator1 = sync* {
  for (;;) yield* [-1, 1];
}();

// Which should actually be written as the following, unless there is a need to mutate it:
Iterable<int> oscillator1a() sync* {
  for (;;) yield* [-1, 1];
}

Also, of course, the fact that the function allows statements rather than elements (or other expression-ish terms) allows the function bodies to be more expressive than the collection literals. If we allow statements in collection literals (including some new, hypothetical iterable literals) then we'd presumably need some markers like yield and yield* to indicate which parts of the computation will be considered contributions to the ongoing production of collection elements.

pattern matching .. against iterables

That is indeed an interesting idea. If there is a special syntax for iterable literals then it should certainly be aligned with a similar syntax for patterns matching iterables.

mmcdon20 commented 3 years ago

Also, of course, the fact that the function allows statements rather than elements (or other expression-ish terms) allows the function bodies to be more expressive than the collection literals. If we allow statements in collection literals (including some new, hypothetical iterable literals) then we'd presumably need some markers like yield and yield* to indicate which parts of the computation will be considered contributions to the ongoing production of collection elements.

@lrhn suggested the same thing.

It's also somewhat restrictive that you can only use collection syntax. I'd personally rather move towards allowing any statement inside a collection, and use yield to emit values.

But, I still don't really understand how that would work. Particularly with yield statements.


As a side note. One of the issues I have with trying to express this using existing language features is it just isn't obvious. Yes, the existing syntax is not that different from the proposed syntax, but the existing solution requires you to make clever use of a somewhat obscure language feature. If you look at the dart language tour, the section on generators makes no mention of the fact that generators can be expressed as anonymous functions. Similarly, the section on anonymous functions makes no mention of the fact that anonymous functions can be expressed as generators. I suspect plenty of dart developers are unaware of anonymous generator functions. And then you have to think to use an immediately invoked anonymous generator function. Conceptually there is a lot more going on compared to defining a collection with collection syntax.


...As another side note. It also occurred to me that these expressions might be able to support recursive definitions. For example:

Iterable<int> oscillator = sync* [-1, 1, ...oscillator];

// Could desugar to something like this:
Iterable<int> oscillator = () {
  Iterable<int> oscillator() sync* {
    yield -1;
    yield 1;
    yield* oscillator();
  }
  return oscillator();
}();
mmcdon20 commented 3 years ago

I came across this today and thought it was interesting. Apparently rust is considering similar features:

I have no experience writing rust programs (YouTube somehow recommended the video to me), so I might not fully understand the proposal as it relates to rust.

But, if I understand correctly, the proposal includes both generator functions and generator expressions for both iterables and streams.

There is a short example of what a generator expression might look like in rust:

let iter = async move iterator {
    for await x in stream {
        yield x;
    }
};
for await x in iter {
    println!("{:?}", x);
}
natebosch commented 3 years ago

I have wanted a syntax for iterable literals like this in the past. I think it fits well with enhanced collection literals and would let us write more code to look declarative. For example I'll sometimes rewrite a list.map(something).toList() to [for (var e in list) something(e)] - and I'd like to be able to still do that if I need an Iterable instead.

One disparity against collection literals is that I think we'd need to disallow await inside the elements when generating an iterable. If they syntax is something close to sync* [ that limitation would likely be clear.

mmcdon20 commented 3 years ago

I have wanted a syntax for iterable literals like this in the past. I think it fits well with enhanced collection literals and would let us write more code to look declarative.

Yeah, iterable literals would allow a declarative approach to writing iterables. Sure the syntax between an immediately invoked anonymous generator function and an iterable literal syntax would not be that much different, but in my experience you think about the problem differently when writing imperative vs declarative code.

I'll sometimes rewrite a list.map(something).toList() to [for (var e in list) something(e)] - and I'd like to be able to still do that if I need an Iterable instead.

I agree. Personally I never use map or filter anymore since collection-for and collection-if were added to the language.

One disparity against collection literals is that I think we'd need to disallow await inside the elements when generating an iterable. If they syntax is something close to sync* [ that limitation would likely be clear.

Good point, I hadn't considered that.

munificent commented 3 years ago

I'd personally rather move towards allowing any statement inside a collection, and use yield to emit values. Probably going to be a hard sell.

For what it's worth, when I was working on the UI as code stuff, I spent a lot of time trying to come up with a more general way of embedding arbitrary statements inside collections and using yield and utterly failed to find a syntax that wasn't horrid. The reason the if and for in collection stuff we shipped works well is because it defaults to assuming you want to emit a value. You don't have to opt out of "control flow" and into "produce a value" by using something like yield. Control flow / statements are just the wrong default for the contents of a collection.

When I was working on the control flow collection stuff, I did spend some time thinking about a syntax for iterable literals too. One option is to follow Python's generator expressions and use parentheses:

var iterable = (for (var i = 0; i < 1000000; i++) i);

But I didn't want to step on future tuple syntax:

var wat = (1, 2, 3); // Lazy iterable of three elements, or tuple?

I ended up not pitching anything. My intuition is that lazy iterable are rare enough that they probably don't warrant dedicated syntax (especially given the opportunity cost of using that syntax on something more common). My feeling is that the sync* IIFE pattern is probably sufficient.

natebosch commented 3 years ago

My intuition is that lazy iterable are rare enough that they probably don't warrant dedicated syntax (especially given the opportunity cost of using that syntax on something more common).

I'm curious why you think that - I see lazy iterables used frequently. I do occasionally find it limiting that some iterable chains need to use hacks like library.parts.followedBy([library.id]) because they don't end in .toList() or .toSet() and so can't be [...library.parts, library.id].

My feeling is that the sync* IIFE pattern is probably sufficient.

Just for comparison I tried migrating some real code in both styles.

Original:

  final canRunWithSoundNullSafety = await _allMigratedToNullSafety(
      packageGraph,
      reader,
      orderedBuilders
          .map((e) => e.import)
          .followedBy(postProcessBuilderDefinitions.map((e) => e.import)));

IIFE:

  final canRunWithSoundNullSafety =
      await _allMigratedToNullSafety(packageGraph, reader, () sync* {
    for (var b in orderedBuilders) {
      yield b.import;
    }
    for (var b in postProcessBuilderDefinitions) {
      yield b.import;
    }
  }());

Proposed generator:

  final canRunWithSoundNullSafety =
      await _allMigratedToNullSafety(packageGraph, reader, sync* [
    for (var b in orderedBuilders) b.import,
    for (var b in postProcessBuilderDefinitions) b.import
  ]);

I find the second migration more readable than either the IIFE or the original. If we assume that we stopped following the lint requiring braces for flow control, the IIFE version does look nicer, but now it violates our recommendations and still doesn't look as nice as the generator IMO.

  final canRunWithSoundNullSafety =
      await _allMigratedToNullSafety(packageGraph, reader, () sync* {
    for (var b in orderedBuilders) yield b.import;
    for (var b in postProcessBuilderDefinitions) yield b.import;
  }());
munificent commented 3 years ago

For what it's worth, you could also do:

  final canRunWithSoundNullSafety =
      await _allMigratedToNullSafety(packageGraph, reader, () sync* {
    yield* orderedBuilders.map((b) => b.import);
    yield* postProcessBuilderDefinitions.map((b) => b.import);
  }());

I agree having dedicated syntax looks nicer (though I'm not sold on sync*), but, I mean dedicated syntax for some pattern always looks nicer. It's dedicated syntax. :)

It's more a question of the opportunity cost of adding that dedicated syntax to the language when we could potentially use that grammar for other more common patterns, or spend that language design time on more valuable features.

mmcdon20 commented 3 years ago

I don't particularly care if sync* [...] is chosen as the exact syntax for iterable literals or not. The root problem here for me is that dart has a really great syntax for collection literals, but you can only use it to construct List, Set, and Map.

Just to throw out a crazy idea. Is there any chance that static metaprogramming #1482 could be applied to collection literals?

Imagine if you could have a macro which takes a collection literal, and transforms it into code which constructs some other type of collection. Users could then write macros to construct an Iterable, or an immutable list #117, or whatever other data structure they want.

munificent commented 3 years ago

Just to throw out a crazy idea. Is there any chance that static metaprogramming #1482 could be applied to collection literals?

I don't think this is a crazy idea. :) C# has something called collection initializers which essentially let you use collection-literal-like syntax as an initializer for your own class. I have occasionally pondered something similar for Dart.

Given how crowded Dart's expression grammar is, it would be really difficult to add without colliding with some other syntax, but I like the idea in general.

PaulRudin commented 2 years ago

But I didn't want to step on future tuple syntax:

var wat = (1, 2, 3); // Lazy iterable of three elements, or tuple?

Incidentally - in python it isn't the parentheses that create the tuple, it's the comma:

>>> x = 1, 2, 3
>>> x
(1, 2, 3)
>>> type(x)
<class 'tuple'>
mmcdon20 commented 2 years ago

But I didn't want to step on future tuple syntax:

var wat = (1, 2, 3); // Lazy iterable of three elements, or tuple?

Incidentally - in python it isn't the parentheses that create the tuple, it's the comma:

>>> x = 1, 2, 3
>>> x
(1, 2, 3)
>>> type(x)
<class 'tuple'>

@PaulRudin It would still be an issue if you are proposing the following syntax:

1, 2, 3 // tuple
(1, 2, 3) // lazy iterable

It's a problem since you can add parenthesis around any arbitrary expression.

This isn't an issue for python because python does not have iterable literals at all, only comprehensions that generate iterables.

# python
1, 2, 3 # tuple
(1, 2, 3) # also tuple
(item + 1 for item in range(3)) # generator comprehension (lazy iterable)