golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License

proposal: x/exp/xiter: new package with iterator adapters #61898

Open rsc opened 10 months ago

rsc commented 10 months ago

We propose to add a new package golang.org/x/exp/xiter that defines adapters on iterators. Perhaps these would one day be moved to the iter package or perhaps not. There are concerns about how these would affect idiomatic Go code. It seems worth defining them in x/exp to help that discussion along, and then we can decide whether they move anywhere else when we have more experience with them.

The package is called xiter to avoid a collision with the standard library iter (see proposal #61897). An alternative would be to have xiter define wrappers and type aliases for all the functions and types in the standard iter package, but the type aliases would depend on #46477, which is not yet implemented.

This is one of a collection of proposals updating the standard library for the new 'range over function' feature (#61405). It would only be accepted if that proposal is accepted. See #61897 for a list of related proposals.

Edit, 2024-05-15: Added some missing 2s in function names, and also changed Reduce to take the function first, instead of between sum and seq.


// Package xiter implements basic adapters for composing iterator sequences.
package xiter

Ideally we would define these:

type Seq[V any] = iter.Seq[V]
type Seq2[K, V any] = iter.Seq2[K, V]

but we may not be able to. If not, the definitions below would refer to iter.Seq and iter.Seq2 instead of Seq and Seq2.

// Concat returns an iterator over the concatenation of the sequences.
func Concat[V any](seqs ...Seq[V]) Seq[V] {
    return func(yield func(V) bool) bool {
        for _, seq := range seqs {
            if !seq(yield) {
                return false
            }
        }
        return true
    }
}
// Concat2 returns an iterator over the concatenation of the sequences.
func Concat2[K, V any](seqs ...Seq2[K, V]) Seq2[K, V] {
    return func(yield func(K, V) bool) bool {
        for _, seq := range seqs {
            if !seq(yield) {
                return false
            }
        }
        return true
    }
}
// Equal reports whether the two sequences are equal.
func Equal[V comparable](x, y Seq[V]) bool {
    for z := range Zip(x, y) {
        if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
            return false
        }
    }
    return true
}
// Equal2 reports whether the two sequences are equal.
func Equal2[K, V comparable](x, y Seq2[K, V]) bool {
    for z := range Zip2(x, y) {
        if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
            return false
        }
    }
    return true
}
// EqualFunc reports whether the two sequences are equal according to the function f.
func EqualFunc[V1, V2 any](x Seq[V1], y Seq[V2], f func(V1, V2) bool) bool {
    for z := range Zip(x, y) {
        if z.Ok1 != z.Ok2 || !f(z.V1, z.V2) {
            return false
        }
    }
    return true
}
// EqualFunc2 reports whether the two sequences are equal according to the function f.
func EqualFunc2[K1, V1, K2, V2 any](x Seq2[K1, V1], y Seq2[K2, V2], f func(K1, V1, K2, V2) bool) bool {
    for z := range Zip2(x, y) {
        if z.Ok1 != z.Ok2 || !f(z.K1, z.V1, z.K2, z.V2) {
            return false
        }
    }
    return true
}
// Filter returns an iterator over seq that only includes
// the values v for which f(v) is true.
func Filter[V any](f func(V) bool, seq Seq[V]) Seq[V] {
    return func(yield func(V) bool) bool {
        for v := range seq {
            if f(v) && !yield(v) {
                return false
            }
        }
        return true
    }
}
// Filter2 returns an iterator over seq that only includes
// the pairs k, v for which f(k, v) is true.
func Filter2[K, V any](f func(K, V) bool, seq Seq2[K, V]) Seq2[K, V] {
    return func(yield func(K, V) bool) bool {
        for k, v := range seq {
            if f(k, v) && !yield(k, v) {
                return false
            }
        }
        return true
    }
}
// Limit returns an iterator over seq that stops after n values.
func Limit[V any](seq Seq[V], n int) Seq[V] {
    return func(yield func(V) bool) bool {
        if n <= 0 {
            return true
        }
        for v := range seq {
            if !yield(v) {
                return false
            }
            if n--; n <= 0 {
                break
            }
        }
        return true
    }
}
// Limit2 returns an iterator over seq that stops after n key-value pairs.
func Limit2[K, V any](seq Seq2[K, V], n int) Seq2[K, V] {
    return func(yield func(K, V) bool) bool {
        if n <= 0 {
            return true
        }
        for k, v := range seq {
            if !yield(k, v) {
                return false
            }
            if n--; n <= 0 {
                break
            }
        }
        return true
    }
}
// Map returns an iterator over f applied to seq.
func Map[In, Out any](f func(In) Out, seq Seq[In]) Seq[Out] {
    return func(yield func(Out) bool) bool {
        for in := range seq {
            if !yield(f(in)) {
                return false
            }
        }
        return true
    }
}
// Map2 returns an iterator over f applied to seq.
func Map2[KIn, VIn, KOut, VOut any](f func(KIn, VIn) (KOut, VOut), seq Seq2[KIn, VIn]) Seq2[KOut, VOut] {
    return func(yield func(KOut, VOut) bool) bool {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return false
            }
        }
        return true
    }
}
// Merge merges two sequences of ordered values.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered,
// the output sequence will not be ordered,
// but it will still contain every value from x and y exactly once.
//
// Merge is equivalent to calling MergeFunc with cmp.Compare[V]
// as the ordering function.
func Merge[V cmp.Ordered](x, y Seq[V]) Seq[V] {
    return MergeFunc(x, y, cmp.Compare[V])
}
// MergeFunc merges two sequences of values ordered by the function f.
// Values appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When equal values appear in both sequences,
// the output contains the values from x before the values from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every value from x and y exactly once.
func MergeFunc[V any](x, y Seq[V], f func(V, V) int) Seq[V] {
    return func(yield func(V) bool) bool {
        next, stop := Pull(y)
        defer stop()
        v2, ok2 := next()
        for v1 := range x {
            for ok2 && f(v1, v2) > 0 {
                if !yield(v2) {
                    return false
                }
                v2, ok2 = next()
            }
            if !yield(v1) {
                return false
            }
        }
        for ok2 {
            if !yield(v2) {
                return false
            }
            v2, ok2 = next()
        }
        return true
    }
}
// Merge2 merges two sequences of key-value pairs ordered by their keys.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// If the two input sequences are not ordered by their keys,
// the output sequence will not be ordered by its keys,
// but it will still contain every pair from x and y exactly once.
//
// Merge2 is equivalent to calling MergeFunc2 with cmp.Compare[K]
// as the ordering function.
func Merge2[K cmp.Ordered, V any](x, y Seq2[K, V]) Seq2[K, V] {
    return MergeFunc2(x, y, cmp.Compare[K])
}
// MergeFunc2 merges two sequences of key-value pairs ordered by the function f.
// Pairs appear in the output once for each time they appear in x
// and once for each time they appear in y.
// When pairs with equal keys appear in both sequences,
// the output contains the pairs from x before the pairs from y.
// If the two input sequences are not ordered by f,
// the output sequence will not be ordered by f,
// but it will still contain every pair from x and y exactly once.
func MergeFunc2[K, V any](x, y Seq2[K, V], f func(K, K) int) Seq2[K, V] {
    return func(yield func(K, V) bool) bool {
        next, stop := Pull2(y)
        defer stop()
        k2, v2, ok2 := next()
        for k1, v1 := range x {
            for ok2 && f(k1, k2) > 0 {
                if !yield(k2, v2) {
                    return false
                }
                k2, v2, ok2 = next()
            }
            if !yield(k1, v1) {
                return false
            }
        }
        for ok2 {
            if !yield(k2, v2) {
                return false
            }
            k2, v2, ok2 = next()
        }
        return true
    }
}
// Reduce combines the values in seq using f.
// For each value v in seq, it updates sum = f(sum, v)
// and then returns the final sum.
// For example, if iterating over seq yields v1, v2, v3,
// Reduce returns f(f(f(sum, v1), v2), v3).
func Reduce[Sum, V any](f func(Sum, V) Sum, sum Sum, seq Seq[V]) Sum {
    for v := range seq {
        sum = f(sum, v)
    }
    return sum
}
// Reduce2 combines the values in seq using f.
// For each pair k, v in seq, it updates sum = f(sum, k, v)
// and then returns the final sum.
// For example, if iterating over seq yields (k1, v1), (k2, v2), (k3, v3)
// Reduce returns f(f(f(sum, k1, v1), k2, v2), k3, v3).
func Reduce2[Sum, K, V any](f func(Sum, K, V) Sum, sum Sum, seq Seq2[K, V]) Sum {
    for k, v := range seq {
        sum = f(sum, k, v)
    }
    return sum
}
// A Zipped is a pair of zipped values, one of which may be missing,
// drawn from two different sequences.
type Zipped[V1, V2 any] struct {
    V1  V1
    Ok1 bool // whether V1 is present (if not, it will be zero)
    V2  V2
    Ok2 bool // whether V2 is present (if not, it will be zero)
}
// Zip returns an iterator that iterates x and y in parallel,
// yielding Zipped values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip is a useful building block for adapters that process
// pairs of sequences. For example, Equal can be defined as:
//
//  func Equal[V comparable](x, y Seq[V]) bool {
//      for z := range Zip(x, y) {
//          if z.Ok1 != z.Ok2 || z.V1 != z.V2 {
//              return false
//          }
//      }
//      return true
//  }
func Zip[V1, V2 any](x Seq[V1], y Seq[V2]) Seq[Zipped[V1, V2]] {
    return func(yield func(z Zipped[V1, V2]) bool) bool {
        next, stop := Pull(y)
        defer stop()
        v2, ok2 := next()
        for v1 := range x {
            if !yield(Zipped[V1, V2]{v1, true, v2, ok2}) {
                return false
            }
            v2, ok2 = next()
        }
        var zv1 V1
        for ok2 {
            if !yield(Zipped[V1, V2]{zv1, false, v2, ok2}) {
                return false
            }
            v2, ok2 = next()
        }
        return true
    }
}
// A Zipped2 is a pair of zipped key-value pairs,
// one of which may be missing, drawn from two different sequences.
type Zipped2[K1, V1, K2, V2 any] struct {
    K1  K1
    V1  V1
    Ok1 bool // whether K1, V1 are present (if not, they will be zero)
    K2  K2
    V2  V2
    Ok2 bool // whether K2, V2 are present (if not, they will be zero)
}
// Zip2 returns an iterator that iterates x and y in parallel,
// yielding Zipped2 values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped2 values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.
//
// Zip2 is a useful building block for adapters that process
// pairs of sequences. For example, Equal2 can be defined as:
//
//  func Equal2[K, V comparable](x, y Seq2[K, V]) bool {
//      for z := range Zip2(x, y) {
//          if z.Ok1 != z.Ok2 || z.K1 != z.K2 || z.V1 != z.V2 {
//              return false
//          }
//      }
//      return true
//  }
func Zip2[K1, V1, K2, V2 any](x Seq2[K1, V1], y Seq2[K2, V2]) Seq[Zipped2[K1, V1, K2, V2]] {
    return func(yield func(z Zipped2[K1, V1, K2, V2]) bool) bool {
        next, stop := Pull2(y)
        defer stop()
        k2, v2, ok2 := next()
        for k1, v1 := range x {
            if !yield(Zipped2[K1, V1, K2, V2]{k1, v1, true, k2, v2, ok2}) {
                return false
            }
            k2, v2, ok2 = next()
        }
        var zk1 K1
        var zv1 V1
        for ok2 {
            if !yield(Zipped2[K1, V1, K2, V2]{zk1, zv1, false, k2, v2, ok2}) {
                return false
            }
            k2, v2, ok2 = next()
        }
        return true
    }
}
gophun commented 10 months ago

The duplication of each function is the first thing that catches the eye. Are there thoughts on why this is acceptable?

gophun commented 10 months ago

What about an adapter that converts an iter.Seq[V] to an iter.Seq2[int, V] and an adapter that converts an iter.Seq2[K, V] to an iter.Seq[V]?

zephyrtronium commented 10 months ago

Some typos: EqualFunc2, Map2, Merge2, and MergeFunc2 lack the 2 suffixes on their actual names. They're all correct in the corresponding documentation.

earthboundkid commented 10 months ago

May I humbly suggest that the name "iterutils" is less susceptible to, uh, unfortunate mispronunciation.

earthboundkid commented 10 months ago

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

DeedleFake commented 10 months ago

For Reduce, the callback should go last: func Reduce[Sum, V any](sum Sum, seq Seq[V], f func(Sum, V) Sum) Sum.

I'd actually prefer func Reduce[Sum, V any](seq Seq[V], sum Sum, f func(Sum, V) Sum) Sum.

Edit: I just realized that if Reduce() is being used to build an array, putting sum first puts everything in the same order as Append() and other functions that put the destination first. I'm not sure if that's worth it or not.

rsc commented 10 months ago

This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. — rsc for the proposal review group

DeedleFake commented 10 months ago

The more I think about it, the more that I think that API design for this should wait until after a decision is made on #49085. Multiple other languages have proven over and over that a left-to-right chained syntax is vastly superior ergonomically to simple top-level functions for iterators. For example, compare

nonNegative := xiter.Filter(
  xiter.Map(
    bufio.Lines(r),
    parseLine,
  ),
  func(v int) bool { return v >= 0 },
)

vs.

nonNegative := bufio.Lines(r).
  Map(parseLine).
  Filter(func(v int) bool { return v >= 0 })

Go's a little weird because of the need to put the "." on the previous line, but other than that one oddity, which I could get used to, the second is better in every way. It reads in the order that actions actually happen, it's less repetitive, etc. The only real way to emulate it currently is something like

lines := bufio.Lines(r)
intlines := xiter.Map(lines, parseLine)
nonNegative := xiter.Filter(intlines, func(v int) bool { return v >= 0 })

That works, but it clutters up the local namespace and it's significantly harder to edit. For example, if you decide you need to add a new step in the chain, you have to make sure that all of the variables for each iterator match up in the previous and succeeding calls.

ianlancetaylor commented 10 months ago

What type does bufio.Lines return to make that work in Go? What methods does that type support? What is the type of nonNegative? I mean these as honest questions. Can we write this kind of code in Go today, or would we need new language features?

hherman1 commented 10 months ago

You would probably have to wrap the base iterator like:

stream.New(bufio.Lines).
    Filter(…).
    …
DeedleFake commented 10 months ago

@ianlancetaylor

Sorry. I should have stuck a comment in. I was just coming up with some hypothetical function that would give an iter.Seq[string]. In this case, the idea was that it would internally use a bufio.Scanner to yield lines from an io.Reader or something. My original code had an anonymous func(string) int instead of the vague parseLine but I removed it because it was clogging up the example with irrelevant code and I didn't clarify when I did.

@hherman1

Not necessarily. The transformative and sink functions on iterators could just be defined as methods on iter.Seq.

hherman1 commented 10 months ago

~But iter.Seq is an interface type no? Are you saying it should be a struct type?~

I was wrong, it’s not an interface.

benhoyt commented 10 months ago

Why do some functions take the f func as the last parameter, but Filter and Map take it as the first, and Reduce in the middle? Most other functions in the stdlib take funcs as the last parameter, such as sort.Slice, slices.*Func, ServeMux.HandleFunc, and so on. This makes code that uses them with inline function literals more readable:

names := xiter.Map(func (p Person) string {
    return p.Name
}, people) // "people" gets lost

// vs

names := xiter.Map(people, func (p Person) string {
    return p.Name
})
Merovius commented 10 months ago

@DeedleFake There won't be a "decision" on #49085 anytime soon. There are good reasons not to do it yet, but we also don't want to say it never happens. The issue exists to reflect that state. What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

DeedleFake commented 10 months ago

What it comes down to is, would you rather have no iterators (for the foreseeable future) or ones which can't be "chained"?

No iterators, definitely. I've done fine without them for over a decade. I can wait a bit longer. If a bad implementation goes in, I'll never get a good version. Plus, I can just write my own implementation of whatever iterator functions I need as long as range-over-func exists while I wait.

gophun commented 10 months ago

Neither chaining nor functional programming has ever been a decisive or recommended technique in Go. Instead, iteration—specifically, procedural 'for' loops—has always been a core technique since the language's inception. The iterator proposals aim to enhance this core approach. While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Merovius commented 10 months ago

I can wait a bit longer. If a bad implementation goes in, I'll never get a good version.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do #49085 in a decade or so.

DeedleFake commented 10 months ago

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others? I have no problem with functions like slices.Backwards() and other source function proposals. My only problem is the transformative and sink functions.

I think "a bit" is misleading. We are talking years - if at all. And I don't believe the second part of that sentence is true either, we could always release a v2 of the relevant packages, if we ever manage to do https://github.com/golang/go/issues/49085 in a decade or so.

Edit: The way this proposal is phrased does actually imply that they may be heavily reevaluated enough in x/exp that they may not go into the standard library at all, so maybe my point is moot after all. I still think that this is a valid issue with the API design to bring up, but maybe it's a bit off-topic for this particular proposal and should wait until after they're in x/exp and it can be more easily demonstrated how awkward they are to use. I don't like the idea that existing code will be broken when some variant of them does potentially go into the standard library, but it's less of a problem than I was worried about. Never mind. Please ignore my rambling below.

That issue has only been open for 2 years. I think assuming that it'll take a decade to solve is a bit unfair. Yes, a v2 is an option, especially if #61716 is accepted, but that was created out of necessity to deal with problems in an existing package, while this would essentially be purposefully putting problems into a new package. It's not like I'm saying that iterators are unacceptable to me in this state, just that features have been delayed or cut before because of possible changes coming later and that I think that it's prudent to discuss the possibility here. That just happened in the last few weeks in the maps package because of the possibility of the acceptance of #61405. I think the same should be done with the transformative and sink functions for now, or at the very least those functions should be planned to stay in x/exp until some way to clean up the API is decided on, that's all.

One of my favorite things about Go is how slow and methodical it (usually) is in introducing new features. I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics. One of the purposes of that approach is to try avoid having to fix it later. Adding those functions in the proposed manner will almost definitely necessitate that later fix, and I very much would like to avoid that if at all possible.

gophun commented 10 months ago

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those?

Java Streams and .NET LINQ build on a standardized iterator system, but they are more than that. Both languages had a generic iterator system before. Iterators are useful without chaining or functional programming.

Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

That would be this very proposal, and it comes with a caveat: "... or perhaps not. There are concerns about how these would affect idiomatic Go code. "

This means that not everyone who has read these proposals in advance believes that this part is a good idea.

jba commented 10 months ago

While I don't know what the overall plans are, if you're hoping for Go to follow the path of Java Streams or C# LINQ, you might be in for disappointment.

Is that not the intention of these proposals? To build a standardized iterator system that works similarly to those? Why else is there a proposal here for Map(), Filter(), and Reduce(), among others?

Maybe chaining leads to too much of a good thing. It becomes more tempting to write long, hard-to-read chains of functions. You're less likely to do that if you have to nest calls.

As an analogy, Go has if. Isn't the intention of if to allow conditional execution? Why then shouldn't Go have the ternary operator ?:? Because it often leads to hard-to-read code.

rsc commented 10 months ago

Re #49085, generic methods either require (A) dynamic code generation or (B) terrible speed or (C) hiding those methods from dynamic interface checks or (D) not doing them at all. We have chosen option (D). The issue remains open like so many suggestions people have made, but I don't see a realistic path forward where we choose A, B, or C, nor do I see a fifth option. So it makes sense to assume generic methods are not going to happen and do our work accordingly.

Merovius commented 10 months ago

@DeedleFake The issue is not lack of understanding what a lack of parameterized methods means. It's just that, as @rsc said, wanting them doesn't make them feasible. The issue only being 2 years old is deceptive. The underlying problem is actually as old as Go and one of the main reasons we didn't have generics for most of that. Which you should consider, when you say

I think that the fact that it took over a decade to add generics is a good thing, and I really wanted generics.

We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma. Not having parametric methods is a pretty direct consequence of that decision.

DeedleFake commented 10 months ago

Well, I tried. If that's the decision then that's the decision. I'm disappointed, but I guess I'll just be satisfied with what I do like about the current proposal, even if it has, in my opinion, some fairly major problems. Sorry for dragging this a bit off-topic there.

thediveo commented 10 months ago

Hope that it's not noise: I wondered whether naming the parameter sum might imply to less experienced devs that reduce only does summation, so I looked at JavaScript's array reduce: it uses accumulator. I don't know if that's much better; I just wanted to point it out. If anything, let's have a good laugh.

jimmyfrasche commented 10 months ago

Those nonstandard Zip definitions look like they would occasionally be useful but I think I'd want the ordinary zip/zipLongest definitions most of the time. Those can be recovered from the proposed with some postprocessing but I'd hate to have to always do that.

These should be considered along with Limit:

LimitFunc - stop iterating after a predicate matches (often called TakeWhile in other languages)

Skip, SkipFunc - drop the first n items (or until the predicate matches) before yielding (opposite of Limit/LimitFunc, often called drop/dropWhile)

jba commented 10 months ago

Those nonstandard Zip definitions look like they would occasionally be useful but I think I'd want the ordinary zip/zipLongest definitions most of the time.

Can you explain the difference? Is it just that zip typically stops at the end of the shorter sequence? That is definitely less useful as a building block, and easy to write given these functions. What are some examples where stopping at the shortest is better?

jimmyfrasche commented 10 months ago

zip stops after the shorter sequence. zipLongest pads out the missing values of the shorter sequence with a specified value.

The provided ones are more general and can be used to build those but I can't really think of any time I've used zip where I needed to know that. I've always either known the lengths were equal by construction so it didn't matter or couldn't do anything other than drop the excess so it didn't matter. Maybe that's peculiar to me and the situations in which I reach for zip, but they've been defined like that in every language I can think I've used which has to be some kind of indicator that I'm not alone in this.

I'm not arguing for them to be replaced with the less general more common versions: I want those versions here too so I can use them directly without having to write a shim to the standard definition.

jimmyfrasche commented 10 months ago

Ideally I'd like something like this

func Zip[V1, V2 any](x Seq[V1], y Seq[V2]) Seq2[V1, V2]
func ZipLongest[V1, V2 any](x Seq[V1], y Seq[V2]) Seq2[V1, V2]

func ZipAll[V1, V2 any](x Seq[V1], y Seq[V2]) Seq[Zipped[V1, V2]]
func ZipAll2[K1, V1, K2, V2 any](x Seq2[K1, V1], y Seq2[K2, V2]) Seq[Zipped2[K1, V1, K2, V2]]

Where ZipAll{,2} are the original and Zip, ZipLongest are defined in terms of them and ZipLongest pads with zero.

Since the more general versions exist, these can stick to the nice case where two 1-iterables form a 2-iterable without needing anything extra so if you can get away with using them they're much more ergonomic.

aarzilli commented 10 months ago

@benhoyt

Why do some functions take the f func as the last parameter, but Filter and Map take it as the first, and Reduce in the middle?

They concatenate better:

Map(func(x int) int { return x*2 },
    Filter(func(x int) bool { return x%2 == 0 },
         seq))

to read this you start from the last line and move up, as opposed to:

Map(
     Filter(seq,
         func(x int) bool { return x%2 == 0 }),
    func(x int) int { return x*2})

where you have to start in the middle and zig-zag outwards. For this reason, function-first is the most common order of arguments; the only exception to this, that I know of, is a certain JavaScript library.

aarzilli commented 10 months ago

Should the order of type parameters of Map be inverted? Should it be Map[Out, In]? If a proposal for lightweight function syntax (#21498) is ever accepted the type of In can be easily inferred, but the type of Out probably not. Because the generic instantiation syntax lets you omit only types at the end, the types that are easily inferred should come later.

b97tsk commented 10 months ago

Changing Filter, Map, Reduce etc. to return Ops and a built-in operator to chain Ops might be a good idea.

type Op[In, Out any] func(In) Out

func Filter[V any](f func(V) bool) Op[Seq[V], Seq[V]]
func Map[In, Out any](f func(In) Out) Op[Seq[In], Seq[Out]]
func Reduce[Sum, V any](sum Sum, f func(Sum, V) Sum) Op[Seq[V], Sum]

Usage:

for v := range slices.Values(s) | xiter.Filter(...) | xiter.Map(...) {
    fmt.Println(v)
}

sum := slices.Values(s) | xiter.Filter(...) | xiter.Map(...) | xiter.Reduce(...)
fmt.Println(sum)

// equivalent to

for v := range xiter.Map(...)(xiter.Filter(...)(slices.Values(s))) {
    fmt.Println(v)
}

sum := xiter.Reduce(...)(xiter.Map(...)(xiter.Filter(...)(slices.Values(s))))
fmt.Println(sum)

The compiler can do its best to inline every Op call.

Merovius commented 10 months ago

@b97tsk Your Usage code doesn't make a lot of sense to me. You use a boolean or | operator, but those functions do not involve any booleans. [edit] sorry, I misread. It seems you are suggesting adding a function-piping operator. Note that this has been suggested in the past and been rejected [/edit]. I also don't really see the point of the Op type - what's the advantage of Op[A, B] over just writing func(A) B?

earthboundkid commented 10 months ago

They concatenate better:

Map(func(x int) int { return x*2 },
    Filter(func(x int) bool { return x%2 == 0 },
         seq))

to read this you start from the last line and move up, as opposed to:

Map(
     Filter(seq,
         func(x int) bool { return x%2 == 0 }),
    func(x int) int { return x*2})

where you have to start in the middle and zig-zag outwards.

Both read strangely because the order of transformations is seq → filter → map, not map → filter → seq.

I think this is the best you can do without chaining:

filteredSeq := iter.Filter(initSeq, func(x int) bool {
    return x%2 == 0
})
finalSeq := iter.Map(filteredSeq, func(x int) int {
    return x*2
})

If #21498 were accepted, maybe the function definitions could be shortened somehow.

That said, I think one of the nice things about old Go was that there wasn't a "functional" variation, so you never had to sit down and decide "do I write a regular for-loop here? Or a list comprehension? Or use map?" There was just one way to do it. This code is still pretty appealing to me:

func doubleEven(seq iter.Seq[int]) iter.Seq[int] {
    return func(yield func(int) bool) {
        for x := range seq {
            if x%2 == 0 {
                if !yield(x*2) { return }
            }
        }
    }
}
aarzilli commented 10 months ago

That said, I think one of the nice things about old Go was that there wasn't a "functional" variation, so you never had to sit down a decide "do I write a regular for-loop here?

Agreed. AFAIC it would be better to not have them in the standard library at all.

zephyrtronium commented 10 months ago

Agreed. AFAIC it would be better to not have them in the standard library at all.

Note that this proposal is not for the standard library, precisely because "[t]here are concerns about how these would affect idiomatic Go code." Not that that means you shouldn't dislike the idea, of course.

AndrewHarrisSPU commented 10 months ago

@aarzilli

They concatenate better:

This was interesting!

Map(
     Filter(seq,
         func(x int) bool { return x%2 == 0 }),
    func(x int) int { return x*2})

go fmt seems to preserve this, which I think reads more clearly just because of the spacing:

xs = Map(Filter(xs,
    func(x int) bool { return x%2 == 0 }),
    func(x int) int { return x * x })

With this formatting, the order of reading this reminds me of pipe-forward |> operators I'm seeing emerge in a number of languages, in addition to the more classic LISP style.

DeedleFake commented 10 months ago

That is nice. Right-to-left is a bit odd, but much better than inside-out. Things do get a bit wonky if any of them have more arguments or if you use multi-line anonymous functions, though:

product = Reduce(Filter(Map(numericStrings(),
    func(str string) int {
        n, _ := strconv.Atoi(str)
        return n
    }),
    func(n int) bool { return n > 0 }),
    0,
    func(acc, cur int) int { return acc * cur })

Only the lines that end in a ) end a function in the chain, which makes for some messy reading with something like Reduce() because of the lack of extra indentation. I think it's probably overall still better than just calling them all manually and the bizarre inside-out mess that makes, especially because I think calls with more arguments are less common and are probably mostly sink functions, like Reduce().

Alternatively, gofmt will preserve putting both of the final arguments to Reduce() on a single line above, i.e. 0, func(acc, cur int) int { ... }, but that might not always be feasible either if combined they're awkwardly long.

Another annoyance is the way that you have to count to figure out which function goes with which arguments. Filter() is second from the right so its arguments are second from the bottom.

I like the idea of the |> operator. It's been proposed before multiple times, including in the context of iterators, and if I remember right one of the main arguments against it was that it could encourage people to design their APIs specifically so that they could be used with it, even though that isn't necessarily the best design. I'm not sure of my own opinion on that argument, as you could say the same thing about methods and chained method call APIs, but I do think that it could be a potential problem in a few cases. It would very much not work with multiple returns, which could lead people to try to minimize multiple returns as much as possible, and that also puts it in direct contradiction with the way errors work. With methods that are designed to be chained, they're restricted to only what you define as the one creating the type, but with something like |> the number of possible things it could be useful to leave it open for increases drastically.

earthboundkid commented 10 months ago

The accumulator for product should start with 1, which I feel is an argument against reduce functions. 😊

DeedleFake commented 10 months ago

Ah, whoops. I did just a simple sum originally but changed it for the sake of some variety and forgot to change the initial value.

b97tsk commented 10 months ago

@Merovius

Note that this has been suggested in the past and been rejected

When your kid ~cries~ asks for a toy, you might compromise and purchase one for your kid if you take it seriously. Would you do the same thing for someone else's kid? I mean, this isn't someone else's proposal. Definitely got some weight in it. I think it's worth another go-through in their regular meetings. It's not a bad toy, after all.

I also don't really see the point of the Op type - what's the advantage of Op[A, B] over just writing func(A) B?

I think I would just leave the decision (whether to keep it) or the name choice to someone else to make.

Merovius commented 10 months ago

@b97tsk I don't think the source of the proposal matters a lot. And the exact functions we are talking about on this issue are the primary use case given for any sort of function chaining syntax (note how the proposal I linked to uses Map as a motivation). In other words, nothing has changed since that discussion.

AndrewHarrisSPU commented 10 months ago

The accumulator for product should start with 1, which I feel is an argument against reduce functions. 😊

In all seriousness - other than Equal, is Reduce the only fold here? Reduce seems trivial in the sense that the for-range equivalent is an entirely sane alternative most of the time. The one good reason I can think of to have Reduce is that it might be nice to curry out the iter.Seq input. It makes sense in a library like this, but practically it seems redundant with for-range.

jimmyfrasche commented 10 months ago

Compact{,Func}{,2} would be useful

willfaught commented 10 months ago

func Reduce[Sum, V any](sum Sum, f func(Sum, V) Sum, seq Seq[V]) Sum {

It's confusing to call this Sum because it can be for any purpose. Perhaps Accum(ulator) or Result would be clearer?

// Zip returns an iterator that iterates x and y in parallel,
// yielding Zipped values of successive elements of x and y.
// If one sequence ends before the other, the iteration continues
// with Zipped values in which either Ok1 or Ok2 is false,
// depending on which sequence ended first.

Zip typically drops elements that don't have a matching element in the other list because the lengths are different. This seems overly complicated. Perhaps it would be better to have a func that ensured a Seq had a minimum length, using a supplied default value to fill gaps, or a special zip func that took the default value and filled in the gaps itself?

// Limit returns an iterator over seq that stops after n values.

See Take, TakeWhile, Drop, and DropWhile in Haskell. All would be useful.

[Concat] and [Concat2] concatenate sequences.
[Equal], [Equal2], [EqualFunc], and [EqualFunc2] check whether two sequences contain equal values.
[Filter] and [Filter2] filter a sequence according to a function f.
[Limit] and [Limit2] truncate a sequence after n items.
[Map] and [Map2] apply a function f to a sequence.
[Merge], [Merge2], [MergeFunc], and [MergeFunc2] merge two ordered sequences.
[Reduce] and [Reduce2] combine the values in a sequence.
[Zip] and [Zip2] iterate over two sequences in parallel.

There's too much duplication here, which is the result of having two separate iterator signatures, and needing to serve them both. It would be simpler to have one parameterized signature that could be instantiated with any type, including a 2-tuple-like type if needed. So we would have:

package iter

type Seq[V any] func(yield func(V) bool) bool

type Pair[A, B any] struct {
    A A
    B B
}

Then we could pass around iter.Seq[iter.Pair[int, error]], and only need one version of functions that take iterators.

I suppose the counterargument would be that this wouldn't enable for k, v = range syntax, but I don't think it's important to support that. Iterating over single values has been sufficient for every other language; I see no reason why it's any different for Go. for k, v = range is special syntax enabled by slice and map types being built into the language, and iter.Seq is not.

If we really wanted to shoehorn pair assignments into for/range, we could have it work for iteration values that implement

interface {
    First() sometype
    Second() sometype
}

so we'd have

package iter

type Pair[A, B any] struct {
    A A
    B B
}

func (Pair[A, B]) First() A
func (Pair[A, B]) Second() B

We got generics by committing to keep implementation strategies open, thus avoiding the generics dilemma.

I don't agree. We could have gotten generics without that commitment, it's just that the Go team insisted on it. We (or at least I) haven't seen evidence that shows we got more value out of that commitment than we would have without it.

Merovius commented 10 months ago

Iterating over single values has been sufficient for every other language; I see no reason why it's any different for Go.

I would say that it has been sufficient for ~every other language, because ~every other language has first-class tuples and destructuring assignments. Admittedly I'm not super familiar with many languages besides Go, but ISTM that at least the ones I am somewhat familiar with do have an equivalent of the two-value range syntax this way.

I don't agree. We could have gotten generics without that commitment, it's just that the Go team insisted on it.

Potato, Tomato. The Go team is making the ultimate decisions about the language, so if we have to do something to convince the Go team that something is a good idea, then we had to do it. That is just as true today as it has been for the last ten years. So whether or not you agree with their decision and whether or not you agree with that form of governance of the Go project - they are the circumstances under which we work.

DeedleFake commented 10 months ago

Most Go range usages can yield two values, but a range over a channel is an existing usage that only yields one at a time. Given the extra complexity due to the doubling of so many basic functions, maybe it would be best to at least start with only single-value range-over-func for now as well? If something needs to yield errors alongside other values, they can always just yield some small custom struct type.

willfaught commented 10 months ago

I would say that it has been sufficient for ~every other language, because ~every other language has first-class tuples and destructuring assignments

Haskell's and Python's tuples are normal types (albeit with some syntactic sugar). We can already declare comparable types in Go. I showed how to do so above with Pair.

Destructuring isn't required for iteration of single values. Haskell can do fst p[^1] to get the first item of pair "p". Python can do p[0] to get the first item of pair "p". Go can do p.First to get the first item of a pair "p" (or p[0], if you use an array instead of a struct).

[^1]: Yes, fst probably does destructuring under the hood, but that's only because Haskell's pair's fields are unnamed (and I'm not saying they should be), and unlike Python and Go, Haskell is all about destructuring. You get my point.

Merovius commented 10 months ago

@willfaught Destructuring is necessary to allow you to write x, y = f(), if f returns a tuple. Instead of having to write p = f(); x, y = p[0], p[1]. And destructuring requires the tuple types to be first-class types known to the language.

FTR I don't get your point. Because you say "Haskell is all about destructuring", which is precisely my point. Haskell has this language feature and it is quite essential for how it can manage with only one form of functional mapping primitives.

bcmills commented 10 months ago

I don't know about Haskell, but functional languages in the Hindley-Milner family often have some form of existential types.

In Go, that might look something like:

type Seq interface {
    [V any] func(yield func(V) bool) bool | [K, V any] func(yield func(K, V) bool) bool
}

func Concat[S Seq](seqs ...S) S {
    …
}

But then, of course, there is no way to implement the body of the iterator functions without some kind of parametric type-switch. 🤔

bcmills commented 10 months ago

Thinking about how functional adapters will be used, I think this package should also provide transformations between the two sequence arities. Otherwise it will be difficult to compose transformations that add information to a stream (such as lookup functions) or to compose Seq functions with containers that provide only Seq2 iterators.

It could either be two more Map variants:

func Map12[KIn, VIn, VOut any](seq Seq2[KIn, VIn], f func(KIn, VIn) VOut) Seq[VOut] {
    return func(yield func(VOut) bool) bool {
        for k, v := range seq {
            if !yield(f(k, v)) {
                return false
            }
        }
        return true
    }
}

func Map21[VIn, KOut, VOut any](seq Seq[VIn], f func(VIn) (KOut, VOut)) Seq2[KOut, VOut] {
    return func(yield func(KOut, VOut) bool) bool {
        for in := range seq {
            if !yield(f(in)) {
                return false
            }
        }
        return true
    }
}

Or, three standalone functions:

func Lift[V any](seq Seq[V]) Seq2[V, struct{}] {
    return func(yield func(V, struct{}) bool) bool {
        for v := range seq {
            if !yield(v, struct{}{}) {
                return false
            }
        }
        return true
    }
}

func Keys[K, V any](seq Seq2[K, V]) Seq[K] {
    return func(yield func(K) bool) bool {
        for k := range seq {
            if !yield(k) {
                return false
            }
        }
        return true
    }
}

func Values[K, V any](seq Seq2[K, V]) Seq[V] {
    return func(yield func(V) bool) bool {
        for _, v := range seq {
            if !yield(v) {
                return false
            }
        }
        return true
    }
}