iter: new package for iterators

rsc commented 1 year ago

We propose to add a new package iter that defines helpful types for iterating over sequences. We expect these types will be used by other APIs to signal that they return iterable functions.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted.

Iterators

An iterator is a function that passes successive elements of a sequence to a callback function, conventionally named yield, stopping either when the sequence is finished or when yield breaks the sequence by returning false. This package defines [Seq] and [Seq2] (pronounced like seek - the first syllable of sequence) as shorthands for iterators that pass 1 or 2 values per sequence element to yield:

type (
    Seq[V any]     func(yield func(V) bool) bool
    Seq2[K, V any] func(yield func(K, V) bool) bool
)

Seq2 represents a sequence of paired values, conventionally key-value, index-value, or value-error pairs.

Yield returns true when the iterator should continue with the next element in the sequence, false if it should stop. The iterator returns true if it finished the sequence, false if it stopped early at yield's request. The iterator function's result is used when composing iterators, such as in [Concat]:

func Concat[V any](seqs ...Seq[V]) Seq[V] {
    return func(yield func(V) bool) bool {
        for _, seq := range seqs {
            if !seq(yield) {
                return false
            }
        }
        return true
    }
}

Iterator functions are most often called by a range loop, as in:

func PrintAll[V any](seq iter.Seq[V]) {
    for _, v := range seq {
        fmt.Println(v)
    }
}

Naming

Iterator functions and methods are named for the sequence being walked:

// All returns an iterator over elements in s.
func (s *Set[V]) All() iter.Seq[V]

The iterator method on a collection type is conventionally named All, as in the second example, because it iterates a sequence of all the values in the collection.

When there are multiple possible iteration orders, the method name may indicate that order:

// All iterates through the list from head to tail.
func (l *List[V]) All() iter.Seq[V]

// Backward iterates backward through the list from tail to head.
func (l *List[V]) Backward() iter.Seq[V]

If an iterator requires additional configuration, the constructor function can take additional configuration arguments:

// Bytes iterates through the indexes and bytes in the string s.
func Bytes(s string) iter.Seq2[int, byte]

// Split iterates through the (possibly-empty) substrings of s
// separated by sep.
func Split(s, sep string) iter.Seq[string]

Single-Use Iterators

Most iterators provide the ability to walk an entire sequence: when called, the iterator does any setup necessary to start the sequence, then calls yield on successive elements of the sequence, and then cleans up before returning. Calling the iterator again walks the sequence again.

Some iterators break that convention, providing the ability to walk a sequence only once. These “single-use iterators” typically report values from a data stream that cannot be rewound to start over. Calling the iterator again after stopping early may continue the stream, but calling it again after the sequence is finished will yield no values at all, immediately returning true. Doc comments for functions or methods that return single-use iterators should document this fact:

// Lines iterates through lines read from r.
// It returns a single-use iterator.
func (r *Reader) Lines() iter.Seq[string]

Errors

If iteration can fail, it is conventional to iterate value, error pairs:

// Lines iterates through the lines of the named file.
// Each line in the sequence is paired with a nil error.
// If an error is encountered, the final element of the
// sequence is an empty string paired with the error.
func Lines(file string) iter.Seq2[string, error]

Pulling Values

Functions and methods that are iterators or accept or return iterators should use the standard yield-based function signature, to ensure compatibility with range loops and with other iterator adapters. The standard iterators can be thought of as “push iterator”, which push values to the yield function.

Sometimes a range loop is not the most natural way to consume values of the sequence. In this case, [Pull] converts a standard push iterator to a “pull iterator”, which can be called to pull one value at a time from the sequence. [Pull] starts an iterator and returns a pair of functions next and stop, which return the next value from the iterator and stop it, respectively.

For example:

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
    return func(yield func(V, V) bool) bool {
        next, stop := iter.Pull(it)
        defer stop()
        v1, ok1 := next()
        v2, ok2 := next()
        for ok1 || ok2 {
            if !yield(v1, v2) {
                return false
            }
        }
        return true
    }
}

Clients must call stop if they do not read the sequence to completion, so that the iterator function can be allowed to finish. As shown in the example, the conventional way to ensure this is to use defer.

Other Packages

Many packages in the standard library provide iterator-based APIs. Here are some notable examples.

TODO FILL THIS IN AS OTHER PACKAGES ARE UPDATED

Mutation

Iterators only provide the values of the sequence, not any direct way to modify it. If an iterator wishes to provide a mechanism for modifying a sequence during iteration, the usual method is to define a position type with the extra operations and then provide an iterator over positions.

For example, a tree implementation might provide:

// Positions iterates through positions in the sequence.
func (t *Tree[V]) Positions() iter.Seq[*Pos]

// A Pos represents a position in the sequence.
// It is only valid during the yield call it is passed to.
type Pos[V any] struct { ... }

// Pos returns the value at the cursor.
func (p *Pos[V]) Value() V

// Delete deletes the value at this point in the iteration.
func (p *Pos[V]) Delete()

// Set changes the value v at the cursor.
func (p *Pos[V]) Set(v V)

And then a client could delete boring values from the tree using:

for p := range t.Positions() {
    if boring(p.Value()) {
        p.Delete()
    }
}

*/ package iter

// Seq is an iterator over sequences of individual values.
// See the [iter] package documentation for details.
type Seq[V any] func(yield func(V) bool) bool

// Seq2 is an iterator over pairs of values, conventionally
// key-value or value-error pairs.
// See the [iter] package documentation for details.
type Seq2[K, V any] func(yield func(K, V) bool) bool

Note: If and when generic type aliases are implemented (#46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

// Pull starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next value from the sequence. When there is
// such a value v, next returns v, true. When the sequence is over, next
// returns zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())

// Pull2 starts the iterator in its own coroutine, returning accessor functions.
// Calling next returns the next key-value pair from the sequence. When there is
// such a pair k, v, next returns k, v, true. When the sequence is over, next
// returns zero, zero, false. Stop ends the iteration, allowing the iterator function
// to return. Callers that do not iterate to the end must call stop to let the
// function return. It is safe to call stop after next has returned false,
// and it is safe to call stop multiple times. Typically callers should defer stop().
func Pull2[K, V any](seq Seq[K, V]) (next func() (K, V, bool), stop func())

szabba commented 1 year ago

Note: If and when generic type aliases are implemented (https://github.com/golang/go/issues/46477), we might also want to add type Yield[V any] = func(V bool) and type Yield2[K, V any] = func(K, V) bool. That way, code writing a function signature to implement Seq or Seq2 can write the argument as yield iter.Yield[V].

I think that's supposed to say type Yield[V any] = func(V) bool?

icholy commented 1 year ago

In the "Pulling Values" example, I think next, stop := iter.Pull(it) should be next, stop := iter.Pull(seq)

mateusz834 commented 1 year ago

Isn't this kind of API more preferable? Consider a case when you need to pass the next and stop functions separately to an function, struct, it might be annoying.

type PullIter[T any] struct{}

func (*PullIter[T]) Stop()
func (*PullIter[T]) Next() (T, bool)

func Pull[V any](seq Seq[V]) PullIter[V]

gazerro commented 1 year ago

Based on:

Range expression                                   1st value          2nd value
function, 1 value   f  func(func(V)bool) bool      value    v  V

the range in the Concat and PrintAll functions should have only one value?

DmitriyMV commented 1 year ago

I believe

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
        ...
        next, stop := iter.Pull(it)
        ...
}

should be

// Pairs returns an iterator over successive pairs of values from seq.
func Pairs[V any](seq iter.Seq[V]) iter.Seq2[V, V] {
        ...
        next, stop := iter.Pull(seq)
        ...
}

jimmyfrasche commented 1 year ago

Meta: Most of the first post is about iterators in general. It's not obviously clear on a first reading what is actually going into the iter package. Also the formatting is a little rough.

jimmyfrasche commented 1 year ago

Given that

every Seq2 can be converted to a Seq by a left or right projection
every Seq can be extended to a Seq2 by padding it like Seq[K] → Seq2[K, struct{}] or an operation like python's enumerate

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

jimmyfrasche commented 1 year ago

I know having a separate function/method to call to get the error can lead to people forgetting to do so but I really don't see how Seq2[K, error] makes sense except in the case where each K could have an associated error. I get why it seems like it would be appealing but I don't think it's going to be nice in practice:

It's easy to ignore with k := range f.

It's not easy to not ignore without being awkward. Neither

var k Key
var err error
for k, err = range f {
  if err != nil {
    break
  }
  proc(k)
}
if err != nil {
  handle(err)
}

nor

for k, err := range f {
  if err != nil {
    handle(err)
    break
  }
  proc(k)
}

look especially readable to me.

DmitriyMV commented 1 year ago

@jimmyfrasche Map fits quite nicely for this for k, err := ... pattern. Working with buffers too. Parallel execution.

earthboundkid commented 1 year ago

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

I think the other way around, you should have a Seq2 → Seq[Pair[T1, T2]] mapping and make Seq the default most places.

Edit: I guess you should have both mappings 1 → 2 and 2 → 1, but I also think 1 will end up being the default for things like Filter and TakeWhile etc.

earthboundkid commented 1 year ago

Isn't this kind of API more preferable? …

I think in most cases pull iters will be used in place, not passed around, so a pair of funcs is better than an object with methods.

rsc commented 1 year ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

AndrewHarrisSPU commented 1 year ago

xiter isn't addressing error-flavored 2-ary forms. There has also been a significant amount of prior discussion about e.g. Next() (T, error) (Maybe this is the TLDR? https://github.com/golang/go/issues/61405#issuecomment-1664775997)

If I thought an iter.Stream type or an iter/stream package would be worth exploring, I'm wondering if we'd have a preference or any guidance about where to explore that - in the current set of proposals, in a separate proposal, or leave that unresolved for now (maybe forever), etc.?

djordje200179 commented 1 year ago

I think that method name for iterating through sequence shouldn't be All. Because my first guess when seeing that is that it is Python-like and C# LINQ-like method for checking if all elements meet the criteria. Also, its non-consistent to have methods All and Backwards. In that case name Forwards would be more appropriate.

Merovius commented 1 year ago

@djordje200179 IMO for _, v := range it.All() makes it clear, that this is not a predicate, but an iteration. And Forwards does not work as a conventional name, as not all data structures have a dedicated direction - see e.g. map.

leaxoy commented 1 year ago

Although the proposal is attractive, but to be honest, the Seq2[K, V] looks ugly and very similar to the Seq[V], every time when add iter-func, we must consider there are two to version of Seq, it's a bad thing. In the long run, two versions of iterators can quickly bloat the code.

Introduce tuple type for Seq2 maybe another choice, but the final decision rests with the go official team.

What do you think about this potential problem, @rsc.

kardianos commented 1 year ago

@rsc My apologies if this has already been addressed, but are iterators that could fail (network or disk functions) supported in these interfaces? Or would you just check for an error after the loop and assume the loop would stop either on the last one or error? For instance, as currently envisions, could we do something like this?

for set := range  rows.Sets() {
    for _, row := range rows.Set(set) {
        name := sql.Get[string](row, "Name")
    }
}
if err := rows.Err(); err != nil {
   return err
}

In this example, the rows breaks on the first error (be that in sql.Get, rows.Sets, or rows.Set) and is returned in the Err method.

Let me know if the above example wouldn't be possible, or if you envision a different way of handling this, of if they wouldn't be supported. Thank you.

earthboundkid commented 1 year ago

Re: errors, the proposal says:

If iteration can fail, it is conventional to iterate value, error pairs:

In practice, the .Err() pattern has lead to a lot of bugs where .Err() is omitted. The most glaring example is at https://github.com/features/copilot/ (click on write_sql.go and note the lack of .Err()). I think the existing .Err() iterators in the std library should probably just stick around because they already exist, but we want a new pattern moving forward.

Re: SQL, see https://github.com/golang/go/issues/61637.

bobg commented 1 year ago

Cosmetic suggestion: I like Of better than Seq as the type name. IMO Of is a good name for generic Go container types in general. Qualified with the package name and the element type it reads naturally: iter.Of[string] is an iterator of strings.

TrevinTeacutter commented 1 year ago

For cases where the iterator encounters an error that prevents continuing (ie pagination where the full set is never known until completely iterating through the pages, non-transient errors prevent continuation), it's not readily apparent to me where that would be bubbled up? Is it the iterators responsibility to ensure that one more value, error pair is yielded too? IMO that reads a little bit funny because it's not apparent to the yield function that it is a non-recoverable error. While I know this is all about combining with range over func, it seems to be me there are three choices and one of which I don't know that it would play well with range over func:

Expose the non-recoverable error to the yield function, which might cause confusion because it may not be clear the error caused the iterator to stop.
Expose an Err() error function anyway to call after iteration to check for an error that got bubbled up, can lead to the same issue as before where errors are missed.
Have the iterator itself return an error instead of a bool to allow bubbling up of errors. ~Side note on this one, type unions would be nice here because then you could return bool | error and not have to rely on a sentinel error value to be returned to indicate everything finished successfully.~

earthboundkid commented 1 year ago

For cases where the iterator encounters an error that prevents continuing (ie pagination where the full set is never known until completely iterating through the pages, non-transient errors prevent continuation), it's not readily apparent to me where that would be bubbled up?

for resource, err := range depaginate(iterateResources(ctx)) {
    if err != nil { // resource is a dummy value
        return err
    }
    // resource is good
}

You can also imagine helpers that turn an iter.Seq2[T, err] into a plain iter.Seq[T] that halts and stuffs the error into a pointer when encountered like

it2 := getSomething(ctx) // returns iter.Seq2[Resource, err]
var err error
it1 := captureError(it2, &err) // returns iter.Seq[Resource]
// do stuff with it1
if err != nil { /* it2 must have returned an error, halting it1 */ }

TrevinTeacutter commented 1 year ago

@carlmjohnson

This was the sort of thing I was concerned with the range over func proposal, people would start stuffing things in iterators where they don't really make sense, just to allow them to use range over func.

I much prefer reading this than what you proposed, I don't have the exact words to explain why. I guess it might have something to do with obfuscating what the original iterator (which is the important one) because I'm wrapping it just to capture a specific error.

iterateFunc := client.PaginateResources(ctx)
handler := func(r Resource, err error) bool {
  // Do stuff around the resources, handle potential recoverable errors
}

err := iterateFunc(handler)
if err != nil {
  // I can assume some unrecoverable error happened and handle that differently
}

joerdav commented 1 year ago

Cosmetic suggestion: I like Of better than Seq as the type name. IMO Of is a good name for generic Go container types in general. Qualified with the package name and the element type it reads naturally: iter.Of[string] is an iterator of strings.

@bobg

I'm not sure I agree with this. Package names and type identifiers don't tend to be designed to be read as sentences. Of doesn't describe well what the type is for on it's own, so could lead to confusion.

AndrewHarrisSPU commented 1 year ago

I much prefer reading this than what you proposed

I think I might as well - errors and contexts in light of concurrent rather than just eager versus lazy evaluation make me think type Stream[T] rather than Seq2[T, error] might be worth exploring. I'm not sure what I think, but the hypothetical: streams shouldn't be used to feed stuff like xiter stuff that shouldn't or can't be concerned with errors or contexts because there's a difference between concurrency and fail-proof lazy iteration. If a Stream type would be useful it'd have to cover many practical cases and be easier to use.

jimmyfrasche commented 1 year ago

@carlmjohnson

The language contains Seq2 iterators naturally so there would need to be a Seq2[K, V]→Seq[Pair[K, V]] adapter.

To use a Seq[Pair[K, V]] with range you'd either need to do

for p := range s {
  k, v := p.Items()

or have a Seq[Pair[K, V]]→Seq2[K, V] adapter.

If you wanted to drop the K or V you'd need to that last adapter paired with the range or you'd need projections that reach through and turn a Seq[Pair[K, V]] into a Seq[K] or a Seq[V].

It's unlikely you'd have a filter predicate that's usable with both a K and a Pair[K, V] so you're not saving much other than not having a Filter and a Filter2. If you had a predicate that took a Pair it would be like the range loop where you'd need to immediately expand the pair anyway so there's little savings over just having a Filter2 predicate and ignoring the argument you don't need.

If Go were a language with tuples there'd be a lot of good reasons to use that sort of formulation but it isn't so I don't see how that improves anything vs. always using Seq2 or having two functions for everything.

qualidafial commented 1 year ago

If the long term plan is to have iter.Seq be the blessed type alias for rangeable functions, then I would likewise prefer Seq() as the blessed method name for iterators instead of All()

icholy commented 1 year ago

Is there a reason Pull isn't a method on Seq?

func (s Seq[V]) Pull() (next func() (V, bool), stop func())

rogpeppe commented 1 year ago

for resource, err := range depaginate(iterateResources(ctx)) {

I'm somewhat concerned that it's a bit too easy to write the above as:

for resource := range depaginate(iterateResources(ctx)) {

and thereby miss checking the errors, but I guess that could be a vet check?

aarzilli commented 1 year ago

I'm curious about the TODO section about "other packages", particularly go/ast.Inspect and debug/dwarf.Reader both of which have a way to say "continue but skip the children of the current node" which doesn't have a direct translation to this API.

rogpeppe commented 1 year ago

The fact Pull returns two values means it's not as trivial to pass a pull-based iterator to another function or define a type that encapsulates such an iterator. It also makes the association between the two operations less obvious. It might be good to expand on some of the reasons for choosing two functions over an interface type.

Merovius commented 1 year ago

@icholy Many iterator function won't be Seq[V], but be methods or package-scoped functions. I don't think I prefer iter.Seq(f).Pull() over iter.Pull(f).

Merovius commented 1 year ago

@rogpeppe See the paragraph starting with "I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs…"

icholy commented 1 year ago

@Merovius the way I'm reading the proposal, it looks like the convention will be to return iter.Seq[V] from methods, not have methods implement the iter.Seq[V] signature.

for range x.All() {
   // ...
}

as opposed to

for range x.All {
  // ...
}

Merovius commented 1 year ago

@icholy You are right. My bad. I guess I'm indifferent between making it a method or a function then.

ianlancetaylor commented 1 year ago

@rogpeppe

I'm somewhat concerned that it's a bit too easy to write the above as:

for resource := range depaginate(iterateResources(ctx)) {

and thereby miss checking the errors, but I guess that could be a vet check?

A vet check would be straightforward. Another option would be to require that when doing a range over a function that uses a yield function that takes two arguments, people are required to always use two variables (unlike a range over a map or a slice).

earthboundkid commented 1 year ago

It might be nice to have FromPull:

package iter

// FromPull creates a Seq[T] from an arbitrary "pull" based iterator.
// If stop is not nil, stop will be called on the closing of the returned Seq[T].
func FromPull[T any](next func() (T, bool), stop func()) Seq[T] {
    return func(yield func(T) bool) {
        defer func() {
            if stop != nil {
                stop()
            }
        }()
        for {
            v, ok := next()
            if !ok { return }
            if !yield(v) { return }
        }
    }
}

jimmyfrasche commented 1 year ago

@inalancetaylor

A vet check would be straightforward. Another option would be to require that when doing a range over a function that uses a yield function that takes two arguments, people are required to always use two variables (unlike a range over a map or a slice).

Even if you require both arguments you could still do for x, _ := range s so you'd still need the vet check.

I'm still not sure about Seq2[T, error] in general. It seems like it would make the code consuming the sequence awkward: https://github.com/golang/go/issues/61897#issuecomment-1671924099 (If the error is paired the other value being iterated and not an end of sequence indicator that seems fine, but most people seem to be talking about the latter)

icholy commented 1 year ago

@jimmyfrasche

Even if you require both arguments you could still do for x, _ := range s so you'd still need the vet check.

The same is true for any function that returns an error.

Merovius commented 1 year ago

@jimmyfrasche for x, _ := range s counts as "I explicitly ignore the error", which should pacify the vet check, IMO. Though also, FTR, I'm unhappy with having to write something like _ = f() to satisfy a vet check (which is why I categorically disable the errcheck lint that exists in a bunch of static linting tools).

willfaught commented 1 year ago

Package iter provides basic definitions and operations related to iterators over sequences.

An iterator is a function that passes successive elements of a sequence to a callback function

So, we have a sequence, and an iterator for that sequence. A sequence is an abstract concept. So I don't understand why we would call iterators themselves sequences (Seq). It seems like they should be called Iterator, or perhaps Iter.

This package defines [Seq] and [Seq2] (pronounced like seek - the first syllable of sequence) as shorthands for iterators that pass 1 or 2 values per sequence element to yield

I'm not a fan of the number suffix, although I do appreciate its pragmatism. It seems easy to miss in a jumbled function signature. What about Seq and SeqPair?

The standard iterators can be thought of as “push iterator”

Should this be "push iterators"?

func (t Tree[V]) Positions() iter.Seq[Pos]

Should Pos be Pos[V]?

Iterators only provide the values of the sequence, not any direct way to modify it. If an iterator wishes to provide a mechanism for modifying a sequence during iteration, the usual method is to define a position type with the extra operations and then provide an iterator over positions.

What do Seq/Seq2 say about mutability in general? If I create a pull iterator, then delete the first item outside of the iterator, then get the first item from the iterator, is that valid? What does it return? Should there be a convention about the default/All iterator not being safe with mutation unless otherwise specified in the documentation?

the range in the Concat and PrintAll functions should have only one value?

Just the PrintAll function. Concat ranges over a slice.

one way to avoid having F{,2} for each F would be to provide the extension/projection helpers in this package and let all the general operations be on Seq2 (possibly even naming that Seq and the other Seq1).

Although the proposal is attractive, but to be honest, the Seq2[K, V] looks ugly and very similar to the Seq[V], every time when add iter-func, we must consider there are two to version of Seq, it's a bad thing. In the long run, two versions of iterators can quickly bloat the code.

I'm concerned about having two types/names, and having to adapt from one to the other. It seems better to focus on one. A single type variable can be parameterized for all needs, 1 value, 2 values, 3 values, etc. This way seems simpler and more expressive to me.

Even if adding tuples to Go is off the table for some reason, we could just declare a Pair type in iter.

It's not easy to not ignore without being awkward

This assumes that an error signals that the rest of the sequence is invalid, but that's not necessarily the case. Each item could be individually retrieved, and a failure for one item could be independent of the others. So you could have:

for x, err := range y {
  if err != nil {
    handle(err)
    continue
  }
  // ...
}

I think that method name for iterating through sequence shouldn't be All. Because my first guess when seeing that is that it is Python-like and C# LINQ-like method for checking if all elements meet the criteria. Also, its non-consistent to have methods All and Backwards. In that case name Forwards would be more appropriate.

I agree that All is awkward. It has the boolean connotation, but it also seems...redundant, I guess? As the default iterator for a type, when would it not produce all values? I guess if there's a Random or FirstHalf iterator too where that typically makes more sense, somehow?

The package is named iter, so how about Iterator or Iter?

I also agree that in that example, All should just be called Forwards, unless it's important to conform to an iterable interface of some kind, which doesn't seem to be a concern according to this proposal.

I don't think I prefer iter.Seq(f).Pull() over iter.Pull(f).

I would, if I had to import iter just to do that.

See the paragraph starting with "I’m unaware of any significant evidence in favor of a parallel set of pull-based APIs…"

I don't think he was arguing for parallel pull APIs, just a different API for pull iterators themselves.

Merovius commented 1 year ago

@willfaught

I don't think he was arguing for parallel pull APIs, just a different API for pull iterators themselves.

I don't understand what you mean. There is an API to convert a push-iterator into a pull operator. The paragraph I was referring to advocates for a pattern of composing only push-iterators and only converting them to a pull-operator at the end of the pipeline. So I don't understand what "a different API for pull iterators themselves" would mean, if not the thing that paragraph makes a case against.

BooleanCat commented 1 year ago

To share how this would plug into a package doing something similar (https://github.com/BooleanCat/go-functional): it would make iteration via for loops easier but the notion of enshrining the number of return values into type names (Seq/Seq2) seems clunky - another option is to only define Seq and expect that people will define structs with as many fields as they like as iteration members (as I've done with Option[T] and Result[T] in go-functional).

Concretely I could improve go-functional with this proposal in two ways, firstly iteration over members:

lines := iter.Exclude(iter.LinesString(file), filters.IsZero)

for line := lines.Next(); line.IsSome(); line = lines.Next() {
    fmt.Println(line.Unwrap())
}

Could become:

lines := iter.Exclude(iter.LinesString(file), filters.IsZero)

for line := range lines.Seq() {  // or whatever
    fmt.Println(line.Unwrap())
}

Secondaly, I could delete a bunch of pre-defined iterators in my "standard library" by accepting a Seq into an iterator:

lines := iter.New(seq)  // where `iter` is go-functional/iter

It might be worth exploring ways to "entirely consume" iterators within the standard library. From using Rust's iterators frequently I often find myself doing the Rust equivalent of:

Collecting iterator members into a slice
Map / reduce
Execute a callback on each member yielded (for_each)

AndrewHarrisSPU commented 1 year ago

Poking around with Seq2[T, error] a bit, I'm thinking the universe of code that works with Seq2[T1, T2] types is disjoint, and that's OK. For example, from xiter func Merge[K cmp.Ordered, V any](x, y Seq2[K, V]) Seq2[K, V] accepts a Seq2, but it's more sharply constrained to accept less than the entire universe of Seq2. Similarly, func F[T any](seq Seq2[T, error]) is statically sharper. There are places where Seq2[T, error] makes sense and doesn't involve superfluous ceremony.

I believe I'd still rather not export or import Seq2[T, error] directly in a lot of cases. While Seq2[T, error] expresses enough to match some functional idioms, I'm still thinking it doesn't express enough to match Go idioms, to match APIs like bufio.Scanner or sql.DB.QueryContext. Push iteration and for-range does a lot here, but what's made implicit in control flow isn't every bit of state that might be useful in these cases. Some considerations that might require more state:

Specified behavior about encountering an unhandled error (or just ways to ask for a specific behavior, like TryAll() -> Seq[T] rather than Seq2[T, error] with commonly understood not-happy-path behavior)
A way to observe an error after e.g. exiting a for-range loop
A way to cancel outstanding work, like context.WithCancel
Coherency w/r/t conversion to a pull iterator

I definitely don't think this proposal needs to have an answer on this, just that it's actively interesting in light of this proposal.

eliasnaur commented 1 year ago

I'm concerned that Pull/Pull2 are over-fitted for iterators that need cleanup. The stop func return value and the requirement to call it are only necessary for those heavy-weight iterators, whereas memory-only iterators don't need either.

In other words, I'd love to see pull iterators be as light weight as the proposed timer.Ticker.Stop. Then, it's up to the iterator implementation to require calling Close/Stop after iteration ends.

Also related is the long standing issue whether to garbage collect goroutines that never wakes up. In particular, removing the stop return value or the requirement to call it probably implies runtime changes to garbage collect ongoing Pull iterators.

earthboundkid commented 1 year ago

Garbage collecting goroutines has been discussed a couple of times and rejected. The first one I found is #41414, but it has come up in some other places too. The problem is you're terminating execution in a spot where it doesn't expect to be killed. That should be a panic. But if nothing catches the panic, the whole program will go down, which is what is happening anyway. It's a lot of work for no improvements.

aarzilli commented 1 year ago

Garbage collecting goroutines has been discussed a couple of times and rejected

Coroutines are different however, you can tell easily that they won't resume (if resume and cancel are not reachable) and there is a way to stop them without injecting a panic (by calling cancel).

djordje200179 commented 1 year ago

Regarding naming of method for iterating through collection. Proposal says: The iterator method on a collection type is conventionally named All, as in the second example, because it iterates a sequence of all the values in the collection.

Wouldn't it be more natural to name that method Stream.

So it would be used like:

for key, value := range hashmap.Stream {
    ...
}

IMO this method name better represents intention of what is trying to be done. If someone wants to iterate in reverse then he could write method ReverseStream or StreamReverse . It is more consistent than naming All and Backwards.

Merovius commented 1 year ago

@eliasnaur Pull is only necessary for pull-iterators that need cleanup. That is, it is the conversion from push-iterator to pull-iterator that necessitates the cleanup. Pull-iterators that don't need cleanup can just be spelled func() (T, bool), no call to Pull necessary.

eliasnaur commented 1 year ago

@eliasnaur Pull is only necessary for pull-iterators that need cleanup. That is, it is the conversion from push-iterator to pull-iterator that necessitates the cleanup.

In the example of a push iterator over a collection of in-memory values, no cleanup inherent to the iterator is required. Only when wrapping the iterator in iter.Pull the burden of calling stop is added; again by no fault of the original push iterator. It's my understanding that if you don't call stop, the Go runtime will leak the goroutine memory created by iter.Pull. That's surprising when memory is otherwise automatically collected in Go.

In other words, my concerns is the wording "Callers that do not iterate to the end must call stop to let the function return.", because it sounds like an implementation restriction similar to the time.Ticker.Stop restriction proposed lifted in #61542. To be clear: for a particular push iterator it may be an error to not call stop returned from Pull, precisely because the iterator needs to clean up some external resource. However, iter.Pull itself shouldn't impose the requirement.

In conclusion, I'd like to be able to

next, _ := iter.Pull(inMemoryIterator)
// use next

or even

next := iter.SimplePull(inMemoryIterator)

however that's spelled.

Pull-iterators that don't need cleanup can just be spelled func() (T, bool), no call to Pull necessary.

I may not be the author of the iterator, so I can't in general control its spelling.

eliasnaur commented 1 year ago

FWIW, my use-case is to model GUI components as push iterators, whereby using them with the proposed iter.Pull would add the defer stop() burden to every component in the user interface.

golang / go