Closed mknyszek closed 1 year ago
@bcmills question about one of your recent comments:
Making it return a bool could perhaps work, but then it wouldn't be obvious to me whether a false return means “I consumed the value, don't produce another one” or “I didn't actually consume the value after all, put it back”. In the former case, there is no way to avoid consuming the first value from the iterator; in the latter case, iterator implementations would be forced to provide the capability to reject an already-produced value, which would be easy enough for iterators over static containers but much more complex for, say, ordered queues.
I thought "false" would mean: "ignore the returned value because I'm done". In this case, the first call on an empty iteration will return false, and then we do not have to put anything back.
I also am not particularly enthusiastic about the idea of user defined ranges, but I am not opposed. So far, having range accept a function of signature func() (T, bool) or func(I, T, bool) seems the most simple to me.
I would like to encourage proposals for user defined Rangers to discuss guidelines for concurrency.
I thought "false" would mean: "ignore the returned value because I'm done". In this case, the first call on an empty iteration will return false, and then we do not have to put anything back.
How would I do this for a queue-like data structure where I can’t easily put the element back? Or an iterator over network calls?
I thought "false" would mean: "ignore the returned value because I'm done". In this case, the first call on an empty iteration will return false, and then we do not have to put anything back.
How would I do this for a queue-like data structure where I can’t easily put the element back? Or an iterator over network calls?
If you are thinking about concurrency, I think any queue has a race condition for iterating unless the channel idea of closing is introduced.
But for a common queue, why not
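type Queue[T any] ...
func (q Queue[T]) Ranger() func() (T, bool) {
	// The results are named on the closure itself so the bare returns compile.
	return func() (result T, ok bool) {
		if q.hd == q.tl {
			// Empty: return the zero value and ok == false.
			return
		}
		ok = true
		idx := q.hd
		q.hd++
		result = q.data[idx]
		return
	}
}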
@wsc0 I was not discussing concurrency - that comment was aimed at the yield proposal (where a bool return decides whether to continue or not). The issue with the boolean return is that in order to say “do not iterate”, the function argument for the iterator would need to be called once, meaning that the iterator would have already consumed a value.
For something like an iterator of sequential network calls, this would mean that in order to “take 0 values from the iterator”, you would need to take 1 value of the iterator in order to say “please don’t keep iterating”
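To make the point concrete, here is a minimal sketch of a push-style iterator whose callback returns a bool. The names each and produced are made up for this illustration and are not part of any proposal. The caller wants zero values, but the source has already produced one by the time the callback can say stop.

```go
package main

import "fmt"

// produced counts how many values the source has handed out.
var produced int

// each is a hypothetical push-style iterator: it calls f with successive
// values and stops as soon as f returns false.
func each(f func(int) bool) {
	for i := 0; ; i++ {
		produced++ // the value has already been taken from the source here
		if !f(i) {
			return
		}
	}
}

func main() {
	// The caller wants nothing, but can only say so from inside f.
	each(func(int) bool { return false })
	fmt.Println(produced) // prints 1
}
```

For a slice this is harmless; for a queue or a sequence of network calls, that one value is gone.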
@deanveloper Ah I see now. I always found yield hard to wrap my head around.
@wsc0 Yield in general (the generator pattern) is actually a very nice pattern for iteration. It allows you to write iterators in exactly the same way as if you were appending to a slice, as seen in https://github.com/golang/go/issues/43557#issuecomment-895211452. It was just that this particular implementation of it was poor, so there were quite a few problems.
@wsc0
What I always found confusing about yield in Python was that the usage of yield somewhere in the function automatically caused the function itself to behave completely differently from any other function, but there was no difference of any kind in the function signature. I think that a lot of that confusion can be avoided if it's more obvious that a yielding function isn't a normal function, assuming language-level support for it. If the support for it isn't language-level then I think that that'll solve itself. Good documentation helps, too.
I think I may open a proposal which is a reconsideration of https://github.com/golang/go/issues/19702, perhaps with some more details and a more persuasive case. It seems to have been closed for a few reasons:
We already have generator-based iterators in the form of channels, but we cannot use them because we would need to exhaust the channel in order to stop the goroutine. In my eyes, the only blocker that I'm not really sure about is defer and whether or not defers/finalizers should be run when the goroutine is GC'd. I'm thinking not, but I'll think more about this either when I write the proposal or a discussion is opened.
runtime.Deadlocked() <-chan struct{}
used like this?
func numbers() <-chan int {
	ch := make(chan int)
	go func() {
		for i := 0; ; i++ {
			select {
			case ch <- i:
			case <-runtime.Deadlocked():
				return
			}
		}
	}()
	return ch
}
ISTM the trouble with the issue as opened before was A) it's expensive to find the dead channels B) there was ambiguity about how to handle the exit (runtime.Goexit vs. defers, etc). Having an explicit select statement resolves those, no?
Even better, with generics, you could write:
func yielder[T any](ch chan T) func(t T) {
return func(t T) {
select {
case ch <- t:
case <-runtime.Deadlocked():
return
}
}
}
func numbers() <-chan int {
ch := make(chan int)
yield := yielder(ch)
go func() {
for i := 0; ; i++ {
yield(i)
}
}()
return ch
}
😆
Okay, that's a sort of pointless way to get out of writing a select statement, but it's interesting in principle as part of thinking about iterator functions. ISTM, we already have a really good general purpose iteration mechanism (channels), but we don't use it because A) it's too slow/expensive and B) there's no good way to handle closing. Can we solve problems A and B and thus get out of adding a whole new mechanism for iteration?
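Here is a pretty good wrapper function that could be in the standard library, provided that runtime.Deadlocked exists sometime in the future:
package chans
import "runtime" // runtime.Deadlocked is the hypothetical API discussed above
func Generator[T any](generator func(yield func(T) bool)) <-chan T {
	ch := make(chan T)
	go func() {
		// Close exactly once, when the generator returns for any reason.
		// (Closing in the deadlock case as well would panic on the second close.)
		defer close(ch)
		generator(func(incoming T) bool {
			select {
			case ch <- incoming:
				return true
			case <-runtime.Deadlocked():
				// No receiver is left; tell the generator to stop.
				return false
			}
		})
	}()
	return ch
}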
which would be used like:
package main
func Fibonacci() <-chan int {
return chans.Generator(func(yield func(int) bool) {
var a, b = 0, 1
for {
ok := yield(a)
if !ok {
return
}
c := a + b
a = b
b = c
}
})
}
The ok return mainly exists for infinite generators to prevent infinite looping, but in the common case, if it is ignored it shouldn't break programs (unlike the original yield/emit solution). Alternatively we could do some sort of panic, but I think that may cause some issues.
go2go link seems to time out sometimes, but it works other times: https://play.golang.org/p/28dKug56JdQ
However this seems to be an issue with go2go because the non-generic solution works fine in the regular playground: https://play.golang.org/p/28dKug56JdQ
That can be simplified to
func Fibonacci() <-chan int {
	// yield must return a bool for it to be usable as the loop condition.
	return chans.Generator(func(yield func(int) bool) {
		var a, b = 0, 1
		for yield(a) {
			c := a + b
			a = b
			b = c
		}
	})
}
Having an explicit deadlock test also makes it easier to deal with the "how to put the queue item back" problem, since there's an explicit way to check for the loop being exhausted and then run arbitrary code.
While I'm not sure if runtime.Deadlocked() is exactly the right approach, I do think that it's a good idea. Cooperative cancellation, as with context.Context, feels a lot more Go-like. Maybe, at least to start, it might make sense to make it runtime.deadlocked() instead and then only create an iter.Generator() wrapper around it with an unsafe hook into the unexported runtime API. If you export it, it's going to encourage people to rely on other people checking for that deadlock and you're right back to the same situation as the ignored boolean, but now for potentially every single channel usage that crosses an API boundary.
In whichever case, though, I also worry a little about potential performance issues stemming from having to multi-thread this. I'm not entirely sure that just directly using a channel is the correct approach for a generic iterator system, either. I think that this should just be one possible way to create an iterator, be it iter.Iter[T] or whatever else. Otherwise middleware functionality like map and filter is going to be kind of weird.
@deanveloper
go2go link seems to time out sometimes, but it works other times
The go2go playground's been having that problem a lot recently. I had a ton of trouble with it the other day.
Also, several of the examples above forgot the bool return for yield in the function signature.
Also, several of the examples above forgot the bool return for yield in the function signature.
My bad - I had added the bool in later and forgot to re-copy the examples into my comment. I've fixed that now.
In whichever case, though, I also worry a little about potential performance issues stemming from having to multi-thread this.
Correct me if I am wrong, but pretty much every Generator uses some form of coroutine in order to implement the pattern. I've checked Kotlin and Rust at the very least. However I also believe that they control the coroutines a bit better so that it still operates single-threaded, and the value-passing also works differently from using a channel. The goroutine/channel overhead is quite the problem at large scale, in this case I have a binary tree of 10,000 elements and show how slow iteration is on each one: https://play.golang.org/p/1asKVzsdLNn. You cannot get results in the go playground, as time is fixed. Feel free to try on your own machine.
Results on my machine are ~1ms for purely-function-call based iteration, and ~480ms for the channel-based iteration. I expected the performance to be worse, but not that bad. Perhaps some optimizations could be done to chans.Generator specifically, but I'm not very hopeful that we can get close enough.
EDIT - As a test, I implemented it in Javascript, and it runs for ~9ms. There definitely has to be a more optimal way to do this without goroutine overhead. https://gist.github.com/deanveloper/32d6c0c9b915f464cb917d5f76c34dd9
EDIT 2 - Did some more testing: https://play.golang.org/p/87HxaEmecno (again, execute locally for correct timings). Even with only a single goroutine created, the channel overhead is quite high. To iterate over a 10k element slice, it takes 20ms with chans, and 0.2ms with functions. That's 2 orders of magnitude... I'm not sure if we can get optimizations to help that much.
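For readers who cannot open the playground links, the following is a rough sketch of the kind of comparison described above; it is not the linked benchmark. The helpers sumFunc and sumChan are invented for this illustration. Timings depend heavily on the machine and Go version, so no numbers are claimed here.

```go
package main

import (
	"fmt"
	"time"
)

// sumFunc iterates over s by passing each element to a callback.
func sumFunc(s []int) int {
	total := 0
	each := func(f func(int)) {
		for _, v := range s {
			f(v)
		}
	}
	each(func(v int) { total += v })
	return total
}

// sumChan iterates over s through an unbuffered channel fed by a goroutine.
func sumChan(s []int) int {
	ch := make(chan int)
	go func() {
		defer close(ch)
		for _, v := range s {
			ch <- v
		}
	}()
	total := 0
	for v := range ch {
		total += v
	}
	return total
}

func main() {
	s := make([]int, 10000)
	for i := range s {
		s[i] = i
	}

	start := time.Now()
	a := sumFunc(s)
	funcTime := time.Since(start)

	start = time.Now()
	b := sumChan(s)
	chanTime := time.Since(start)

	fmt.Println("same result:", a == b)
	fmt.Println("function calls:", funcTime, "channel:", chanTime)
}
```

Run it locally rather than in the playground, where time is fixed.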
I think without language-level support, there's going to be some pretty bad performance issues that aren't really easily resolvable. It might be possible to optimize the channel for this specific usage and cut out some of the overhead, as I'm guessing that quite a bit of that comes from locking that isn't useful in this particular situation.
Setting GOMAXPROCS to 1 cuts the tree time by 20% and the slice time by 50%, but clearly there are still orders of magnitude of overhead to work out. I do think it is theoretically doable though, since many other languages have a coroutine mechanism. There "just" (ha!) needs to be some way to detect that the routines will always yield to each other and set them to a simpler scheduler once that's detected.
I think the best way to handle functions that need to have some sort of cleanup or close is to return a second func() value:
gen, close := chans.Generator(func(yield func(int) bool) {...})
defer close()
No runtime support is necessary, and the pattern is familiar to Go programmers from context.WithCancel.
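As a rough sketch of what that two-value form could look like with no runtime support: the Generator below is an assumption written for this illustration, not an existing package. It returns the channel together with a stop function, and the generator sees false from yield once stop has been called.

```go
package main

import "fmt"

// Generator is a hypothetical helper: it runs gen in a goroutine and returns
// the channel of values plus a stop function. stop must be called exactly
// once; calling it twice would close the done channel twice and panic.
func Generator[T any](gen func(yield func(T) bool)) (<-chan T, func()) {
	ch := make(chan T)
	done := make(chan struct{})
	go func() {
		defer close(ch)
		gen(func(v T) bool {
			select {
			case ch <- v:
				return true
			case <-done:
				return false // consumer has stopped; generator should return
			}
		})
	}()
	return ch, func() { close(done) }
}

func main() {
	nums, stop := Generator(func(yield func(int) bool) {
		for i := 0; yield(i); i++ {
		}
	})
	defer stop()

	for v := range nums {
		if v == 3 {
			break // early exit; the deferred stop releases the goroutine
		}
		fmt.Println(v) // prints 0, 1, 2 on separate lines
	}
}
```

Wrapping the close in a sync.Once would make stop safe to call more than once, at the cost of a little extra code.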
Unfortunately that removes the ability to have "inline" generators, like my earlier examples (ie for i := range Fibonacci()). That is, unless we add a language change, such that range on a function call that returns (<-chan T, func()) will call the 2nd return value when the for loop exits. That seems like a strange language feature outside of iterators and it would only work on function calls.
It may be quite bothersome to manually call an extra close() when iterating, and it makes passing around iterators a lot more difficult as well. In IO, pipelining is a lot easier, since it is all handled in a single variable (which may implement io.Closer). But needing to pass around 2 variables for a single iterator is a bit cumbersome.
@deanveloper, I don't think those are big problems.
Unfortunately that removes the ability to have "inline" generators, like my earlier examples (ie for i := range Fibonacci()).
gen, close := Fibonacci()
defer close()
for i := range gen ...
But needing to pass around 2 variables for a single iterator is a bit cumbersome.
You normally wouldn't do that. Think about how you use *os.File: the code that opens the file is responsible for closing it. For iterators it would be:
gen, close := Fibonacci()
defer close()
process(gen)
I'm sure there are situations where the close needs to happen somewhere else, but I'm guessing they're rare.
As for this all being cumbersome, that's true, but I don't think the language or runtime features you're hoping for are likely to happen. You could also do something like
type Iter[T any] interface {
Next() (T, bool)
Close()
}
and then it would be more like a file, but it turns out that forgetting to call Close is a real problem, so I think the clumsiness is actually helpful.
I've actually been working for a bit on speeding up this conversion from the "yield" style iterator to a one-at-time iterator. I think the first step is to remove the explicit channel:
type Iter func() (int, bool)
You get the next element by calling the function; a second return value of false tells you you're done. (@bcmills has designed a nice iterator package with generics using a similar definition.)
Now that the channel is hidden, you can play tricks like using a chan []T to buffer the collected elements. The fastest I've been able to come up with so far takes a user-supplied buffer to avoid allocation, splits it in two, and swaps the halves between the producer and consumer in a kind of double-buffering:
func ToIter(b []int, ranger func(f func(int) bool)) (Iter, func()) {
c := make(chan []int)
done := make(chan struct{})
n := len(b) / 2
if n == 0 {
panic("buffer too small")
}
go func() {
b1 := b[:0]
b2 := b[n:n]
// producer
ranger(func(v int) bool {
b1 = append(b1, v)
if len(b1) >= n {
select {
case c <- b1:
case <-done:
return false
}
// Since channel is unbuffered, at this point the consumer
// has taken b1, meaning it no longer needs b2.
b1, b2 = b2[:0], b1
}
return true
})
// consumer
if len(b1) > 0 {
select {
case c <- b1:
case <-done:
}
}
close(c)
}()
return iterFromSliceChannel(c), func() { close(done) }
}
func iterFromSliceChannel(c chan []int) Iter {
var (
s []int
i int
ok bool
)
return func() (int, bool) {
if i >= len(s) {
s, ok = <-c
if !ok {
return 0, false
}
i = 0
}
i++
return s[i-1], true
}
}
For enumerating the values of a simple binary tree, this code is about 40% slower than calling the yield iterator directly, for large enough trees (with a 16K buffer, around a million nodes). It's much worse for smaller trees, because of the initial overhead.
For small trees it's probably better to just use the yield iterator to populate a slice, and iterate over the slice.
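A minimal sketch of that fallback, assuming the same ranger shape used by ToIter above. The helper name collect is made up here. It drains the yield iterator into a slice so the caller can use an ordinary range loop.

```go
package main

import "fmt"

// collect runs a yield-style iterator to completion and returns everything
// it produced. It always returns true from the callback, so it must only be
// used with finite iterators.
func collect(ranger func(f func(int) bool)) []int {
	var out []int
	ranger(func(v int) bool {
		out = append(out, v)
		return true
	})
	return out
}

func main() {
	// upTo5 is a stand-in for a tree walk or any other yield-style iterator.
	upTo5 := func(f func(int) bool) {
		for i := 0; i < 5; i++ {
			if !f(i) {
				return
			}
		}
	}
	for _, v := range collect(upTo5) {
		fmt.Println(v)
	}
}
```

The cost is one slice holding every element, which is why this only makes sense for small collections.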
Ahaha, and we're back at where we started. Earlier, I was thinking about if we could possibly get yield-style iterators to work in the more classical iterator fashion, which would be the most ideal way to do it. My idea was similar to yours, before realizing that we can simply use channels as iterators if we got the needed performance improvements.
To organize some of my thoughts, here is what I think of the main proposed solutions, in descending order of preference:
runtime.deadlocked() <-chan struct{}
cancel function
yield-style iterators to "real" iterators
range-able, but likely for the better.
yield-style/"Do" iterators
range-able
Also I had a new idea: would it be possible to use a runtime finalizer to free the deadlocked goroutine? I'm unsure, but it could possibly work. I'll start testing something out. (From a few tries at using finalizers, this doesn't seem possible. I'm not bothered by it since it's a pretty hacky solution anyway.)
from a few tries at using finalizers, this doesn't seem possible
@ianlancetaylor used a finalizer to clean up a deadlocked goroutine in the generics proposal.
Relevant code:
// Ranger provides a convenient way to exit a goroutine sending values
// when the receiver stops reading them.
//
// Ranger returns a Sender and a Receiver. The Receiver provides a
// Next method to retrieve values. The Sender provides a Send method
// to send values and a Close method to stop sending values. The Next
// method indicates when the Sender has been closed, and the Send
// method indicates when the Receiver has been freed.
func Ranger[T any]() (*Sender[T], *Receiver[T]) {
c := make(chan T)
d := make(chan bool)
s := &Sender[T]{values: c, done: d}
r := &Receiver[T]{values: c, done: d}
// The finalizer on the receiver will tell the sender
// if the receiver stops listening.
runtime.SetFinalizer(r, r.finalize)
return s, r
}
// A Sender is used to send values to a Receiver.
type Sender[T any] struct {
values chan<- T
done <-chan bool
}
// Send sends a value to the receiver. It reports whether any more
// values may be sent; if it returns false the value was not sent.
func (s *Sender[T]) Send(v T) bool {
select {
case s.values <- v:
return true
case <-s.done:
// The receiver has stopped listening.
return false
}
}
// Close tells the receiver that no more values will arrive.
// After Close is called, the Sender may no longer be used.
func (s *Sender[T]) Close() {
close(s.values)
}
// A Receiver receives values from a Sender.
type Receiver[T any] struct {
values <-chan T
done chan<- bool
}
// Next returns the next value from the channel. The bool result
// reports whether the value is valid. If the value is not valid, the
// Sender has been closed and no more values will be received.
func (r *Receiver[T]) Next() (T, bool) {
v, ok := <-r.values
return v, ok
}
// finalize is a finalizer for the receiver.
// It tells the sender that the receiver has stopped listening.
func (r *Receiver[T]) finalize() {
close(r.done)
}
Something to note: This method of doing it makes it unsafe to give the raw channel to the user, which in turn means that the user can't directly use it with a select or with range.
Edit: Also, if the documentation for runtime.SetFinalizer() is correct, the code above has a bug, as the function passed to it above is a func(), but it expects a func(*Receiver). Hmmm... Makes me wonder if it could be changed to SetFinalizer[T constraints.Pointer[E], E any](obj T, finalizer func(T)). That would break the allowance for any number of ignored return values for finalizer, though.
Something that's neat about that Sender/Receiver pattern is that it satisfies func() (T, bool), so it works with the original proposed iterator. We could move Generator from package chans to package iterator, and redefine it like so:
package iterator
func Generator[T any](generator func(yield func(T) bool)) func() (T, bool) {
	// T cannot be inferred from a call with no arguments, so instantiate explicitly.
	send, recv := chans.Ranger[T]()
	go func() {
		generator(send.Send)
		send.Close()
	}()
	return recv.Next
}
We could also add in the buffering that @jba mentioned to get a significant speedup, assuming that we don't get any goroutine/channel performance improvements that make it unnecessary. Even with the performance improvements, it will likely be beneficial to iterate over slices instead of using a buffered channel.
For a proper performance improvement, it could also be reimplemented at some point later the way that a bunch of other languages do it, thus getting rid of the channel and background thread, and the API could be left as is.
I'm still not thrilled about the need for yield to return a value to indicate that there's nothing receiving anymore, though. It would be much preferable for the generator function to somehow just return on its own, running deferred functions on the way. Would it make sense for the yield implementation that gets passed to just call runtime.Goexit() instead of returning something?
Would it make sense for the yield implementation that gets passed to just call runtime.Goexit() instead of returning something?
Unfortunately I can't seem to get it to work. I'm not sure what's preventing recv from being GC'd, feel free to take a crack at it yourself. It may be that the original implementation of Ranger never called the finalizer either, I never tested it. https://go2goplay.golang.org/p/pPFUHTQyqI9
(Or, the Go 1 version which I was testing on my local machine: https://play.golang.org/p/6SaBW8TmNsy)
@deanveloper
I fixed it. It seems like the problem was that the use of a method value for the finalizer function resulted in an extra reference being kept around, so it never got garbage collected. I changed it to a method expression and it worked.
[gri, Jan 20] 1. I'm not convinced that the suggested syntactic sugar (that's what it is, after all) buys that much. That said, if we chose this translation:
for {
	a1, a2, ... an, ok := f()
	if !ok {
		break
	}
	...
}
we would get new iteration variables for each iteration which would address the variable capture problem without the need for a new range syntax.
The proposal should not attempt to solve the variable capture problem (in which "for x := range seq" creates a single variable x for the whole loop, which is sometimes undesirable). It would be surprising and inconsistent for the compiler to create one variable per loop if seq is a slice, but one variable per iteration if seq is a function. Also, if the variable escapes, it would turn a single allocation into a linear number of allocations.
I'm not a fan of this proposal. Although the range syntax is neat, it is confusing at first sight, unlike most Go constructs; and as @jba points out, it creates for the first time a function call where there is no call syntax, and doesn't address the great variety of iteration constructs that come up in practice.
@tv42 wrote:
You can replace any closure by defining a type that captures the free variables, returning a method value on that type. That often leads to more debuggable & readable code, too.
I tend to disagree with the second sentence there. In my view a closure is significantly less debuggable than an interface value because there's no way to tell what's going on inside it: we can't find out what type of iterator it is, and there's no way to define other methods on it that could be used to help debugging.
I also think that having an explicit type is clearer. When I see a func() (T, bool) it's not immediately clear to me that it's an iterator.
@rogpeppe I think you misunderstood what I was trying to say. I also think closures are harder to debug than explicit types.
In case it meaningfully develops the discussion, I'm actually a vote in favor of closures.
My introduction to Go was in the context of an API controller, backed by SQL. The standard database/sql Rows type had an iterator that carried a "Next() bool" method and a "Scan(...interface{}) error" method. It was convenient, straightforward, and if you pre-allocated appropriately for the return size, was blisteringly fast.
I adopted it as an iterator in a two-closure fashion such as below. It's entered generically, but typically I would have redefined it per declared type. I get that closures aren't considered ergonomic, but I dissent. Excepting generics, the following is a pattern that is in use, in production code.
type IterRead[T any] func() *T
type IterWrite[T any] func(*T) bool
type IterNext[T any] func() bool
type Data[T any] []T
// iterSeek advances the index by one if another element exists.
func iterSeek[T any](s *[]T, i int) (int, bool) {
	switch {
	case s == nil:
		return i, false
	case len(*s) > i+1:
		return i + 1, true
	default:
		return i, false
	}
}
// iterScan advances the index and copies the element at the new index into v.
func iterScan[T any](s *[]T, v *T, i int) (int, bool) {
	i, ok := iterSeek(s, i)
	if ok {
		*v = (*s)[i] // could just pointer+SizeOf(T)...I guess I haven't played with unsafe+generics since the preview on 1.17
	}
	return i, ok
}
func Map[T any](dP *Data[T], f func(*T) T, z func() T) (IterRead[T], IterNext[T]) {
	var v T
	// ok starts true so that the first call to next advances to index 0.
	i, ok := -1, true
	read := func() *T {
		r := f(&v) // a call result is not addressable, so store it first
		return &r
	}
	next := func() bool {
		if ok {
			i, ok = iterScan((*[]T)(dP), &v, i)
		}
		if !ok {
			v = z()
		}
		return ok
	}
	return read, next
}
It might feel heavy, but it's fairly easy to bundle stuff up out of the way. When used with concrete types, performance is quite good, and a read/write version is just func(p []T, z func()T) (r func()*T, w func(*T) bool, n func()bool). Of course, that all extends to a struct type with methods, but the pattern with closures just feels like less work to manage.
For instance, supposing the above, to transform a slice of any inferable type:
f, n := Map(data, mapFunc, zeroFunc)
for n() {
fmt.Printf("%T: %v", f(), f())
}
And considering that there's not actually a dependency on generics here, as I use the pattern with concretes, it's an example of something that absolutely works. It could as easily be used with generics as with interfaces, meaningful or empty interfaces, as everything is defined by the caller, and type-assertions can be performed as appropriate by the caller.
I don't know what coroutines would look like as implemented in Go, but this pattern serves me as a poor man's thread-local coroutine. As for performance, this gets better if a pattern following this model could be guaranteed to be inlined. In terms of ergonomics, I am stoked for #21498 to be finalized. Even in "today Go", I think this pattern has potential. If you then want to associate those independent closures with a struct, sure. Instantiate an interface-literal, and attach them? Sure.
One thing I'd consider to be missing would be a way to inform the iterator that the caller is "finished" with the iterator, i.e. if the range loop were exited early for some reason.
That could be solved with something like this:
type RangeIterator[T any] func() (value T, ok bool)
type ResourceIterator[T any] interface {
Begin() RangeIterator[T]
Close()
}
iter := tree.Iter()
for i := range iter.Begin() {
// Do stuff with i
}
iter.Close()
Note that the range aspect of the proposal would be unaltered. The cleanup is just built into the outer type according to its particular needs.
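Since the range form above is only proposed syntax, here is a sketch of the same shape written as an ordinary loop. The sliceIter type and firstOver function are invented for this illustration. The deferred Close runs even when the loop exits early.

```go
package main

import "fmt"

type RangeIterator[T any] func() (value T, ok bool)

type ResourceIterator[T any] interface {
	Begin() RangeIterator[T]
	Close()
}

// sliceIter is a hypothetical implementation that records whether Close ran.
type sliceIter struct {
	data   []int
	closed bool
}

func (s *sliceIter) Begin() RangeIterator[int] {
	i := 0
	return func() (int, bool) {
		if i >= len(s.data) {
			return 0, false
		}
		i++
		return s.data[i-1], true
	}
}

func (s *sliceIter) Close() { s.closed = true }

// firstOver returns the first value greater than limit. The deferred Close
// releases the iterator whether the loop finishes or returns early.
func firstOver(it ResourceIterator[int], limit int) (int, bool) {
	defer it.Close()
	next := it.Begin()
	for v, ok := next(); ok; v, ok = next() {
		if v > limit {
			return v, true
		}
	}
	return 0, false
}

func main() {
	it := &sliceIter{data: []int{1, 5, 9}}
	v, ok := firstOver(it, 3)
	fmt.Println(v, ok, it.closed) // prints "5 true true"
}
```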
I'm in favor of this proposal. Iterating over user collections or generators is something that I've always felt is a bit clunky in Go. I'm a huge fan, but nothing is perfect.
Today I wrote yet another generator that did not feel right and was thinking about writing a proposal myself. Researching the open issues revealed that multiple proposals already exist. I'll leave my comments here instead in the hope that they may influence the decision to do something about the situation.
I landed in two solutions myself of which one is identical to the "range over closure" proposal presented here.
The second solution that I've always had in mind is to just introduce "proper while loops" as they exist in C. This suggestion was shut down in https://github.com/golang/go/issues/21855 but I think the risk of errors would be eliminated by using a "while" keyword instead of overloading another behavior with "for". Such a while would accept the same forms as "if".
Example: Procedural approach
iter := tree.Iter()
while i, ok := iter.Next(); ok {
// Do stuff with i
}
Not to side track this proposal with another one. I just wanted to mention this alternative to get it off my chest.
As already listed, I'm aware of three common ways to iterate over things.
I don't think channel based approaches are reasonable for performance reasons.
Iterating over collections is something that is done fairly often. Having support for expressing this naturally would make it easier to read code (and IMHO make it slightly more elegant).
Folks following this proposal will likely be interested in discussion: standard iterator interface
Please see the discussion at #56413, which extends this idea.
Closing this proposal in favor of the discussion at #56413, which includes much of what is discussed here, and more.
Proposal: Function values as iterators
Motivation
A number of proposals have come up over the years of supporting some first-class notion of iterator such that loops that use the range keyword may be used to iterate over some custom data structure. The Go community in general has also wondered about the "right" way to describe an iterator, evidenced by this blog post from 2013 that describes many of the ways Go programmers abstract away iteration, and much of this is true today as well.
Overall, however, iteration over a non-builtin data structure may be somewhat error-prone and clunky. To refresh on the various ways that iteration may be abstracted away in Go, let's work through some examples.
Firstly, we may have some tree type with some dedicated iterator type:
One way such an "iterator" could be used is,
or even,
The former usage works fine, but suffers from readability issues. The meaning of the code is obscured somewhat as the reader first sees an apparently infinite for-loop and must look for the "break" statement to understand what's going. Luckily this condition is usually present at the top of the loop, but it requires a more careful look. Furthermore, the iteration condition needs to be written explicitly. Writing it once may not be a problem, but writing it 100 times might be.
The latter usage also works fine and the intent is more clear, but it has a similar problem with the iteration condition. There's also an element of repetition which on the surface is fine, but it does harm the readability of the loop. Especially with variable names like "i" it becomes easy to get lost in punctuation.
Another way to abstract iteration away is to pass a function value to a function that iterates on behalf of the caller. For example:
This method works well in many scenarios, but is decidedly less flexible as it separates the loop body from the surrounding code. Capturing local variables in that function value helps, but potentially at the cost of some efficiency, depending on the complexity of the iteration. One advantage of this method, though, is that defer may be used to perform clean up on each loop iteration, without allocating a defer (thanks to open-coded defers).
Prior art
A previous proposal (#40605) suggested allowing types with a Next method to have that method repeatedly called when used in a range loop:
This works fine, but from my perspective, doesn't feel very Go-like. Having a language construct be aware of a type's methods for the sake of syntactic sugar is not a pattern found anywhere else in the language (yet). In the parlance of the generic design, the existing range keyword matches on the underlying type used, not the type's methods.
Furthermore, it usually requires defining a new type which is a bit more work for the writer of the code as well as the reader. Overall a bit clunky, but not bad. It lines up well with how other languages work. Rust, for instance, uses a trait (analogous to an interface) to determine whether a type is an iterator.
Another previous proposal (#21855) suggested supporting a two-clause for-loop to make iterating less error-prone, such as:
Unfortunately, a two-clause loop form is itself error-prone, as the placement of a single semicolon has a significant effect on semantics (specifically, the second semicolon which distinguishes between the 2-clause and 3-clause forms). This proposal was rejected because that semicolon was considered too dangerous.
Other proposals (#24282) have been made to fundamentally change how for loops work, indicating at least some degree of friction.
Proposal
Rolling with the idea of closures, and with the observation that the range form matches on types largely structurally, I would like to propose that range loops repeatedly apply function values of a particular form.
More specifically, I propose allowing the for-range statement to accept values of type func() (T, bool) or func() (T, S, bool) (where T and S are placeholder types; any other type may be substituted for them) and will repeatedly call these values until their bool result is false.
Iterators may then be written in the following way:
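The proposal's own code listing is not preserved in this copy. As a stand-in, here is a minimal sketch under the stated rule: a function value of the form func() (T, bool) is called repeatedly until its bool result is false. The name countTo is made up for this illustration, and the range form appears only in a comment because it is proposed syntax, not valid Go as written.

```go
package main

import "fmt"

// countTo returns an iterator of the proposed shape func() (T, bool).
// It yields 0, 1, ..., n-1 and then reports false.
func countTo(n int) func() (int, bool) {
	i := 0
	return func() (int, bool) {
		if i >= n {
			return 0, false
		}
		i++
		return i - 1, true
	}
}

func main() {
	// Proposed form (not valid Go here):
	//     for v := range countTo(3) { fmt.Println(v) }
	//
	// Hand-written equivalent: call the function value until ok is false.
	tmp := countTo(3)
	for v, ok := tmp(); ok; v, ok = tmp() {
		fmt.Println(v) // prints 0, 1, 2 on separate lines
	}
}
```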
More precisely, the last for statement "de-sugars" into the following code, where tmp is a temporary variable not visible to the program:
The limitation to variables of type func() (T, bool) or func() (T, S, bool) (instead of allowing an arbitrary number of return values) is to keep range loops looking familiar and to avoid a misuse of the syntax.
Discussion and observations
Pros:
Iter and perform a direct function call for each iteration (theoretically that could be inlined further, but it depends).
Cons:
range could mean, and each call could be arbitrarily expensive (e.g. a series of HTTP requests).
range keyword doesn't always make sense to the reader (it does for custom data structures and in some other cases, though).
func() (T, bool) may not always mean "iterator." A Next method is more explicit.
This idea was inspired by the way Python generators are used as iterators. I recently had the realization that Python generators can be approximated by repeated application of closures. I thought that this would have been proposed already, but I couldn't find anything like it. Interestingly, this proposal also allows for iterating over infinite generators and such. It can also be used to write Python-like loops (for better or for worse):
It might also be worth allowing the iterator's final return value to be an error instead of a bool, though I'm not totally sure how to surface that to the user.
I'm not even totally convinced myself that this is worth doing, since the benefit seems minor at best, but I figured I should at least put the idea out there.