Closed: mknyszek closed this issue 5 months ago
CC @rsc
Just to get this right: as I'm often working on system-related Go code dealing with Linux kernel namespaces that needs OS-level thread locking, the caller creating the iterator and asking to iterate needs to lock the thread, as the iterator itself cannot, correct?
Is there a way for the iterator to ensure its caller has locked the thread?
Specifically, we would restrict iterators from being able to call runtime.LockOSThread by panicking if called from an iterator goroutine
I worry this is too restrictive, as some libraries call LockOSThread
temporarily as part of normal operation. Being unable to use these libraries seems rather strange.
A few examples I have found:
- github.com/golang/glog uses LockOSThread inside of Fatal calls to signal the current thread: https://github.com/golang/glog/blob/b256bbe5d3f846c1d1ce73030c10973378e04cf0/glog_file_linux.go#L34
- github.com/libgit2/git2go/v34 uses LockOSThread around every cgo call: https://github.com/libgit2/git2go/blob/v34.0.0/blame.go#L22 (MakeGitError apparently uses thread-local storage)

Admittedly I haven't found a ton of cases like this, but from a user perspective it seems like a very orthogonal limitation to for-loop iterators.
Just to get this right: as I'm often working on system-related Go code dealing with Linux kernel namespaces that needs OS-level thread locking, the caller creating the iterator and asking to iterate needs to lock the thread, as the iterator itself cannot, correct?
Correct. The caller creating the iterator and any caller of next or stop is allowed to be locked to the thread. Just the iter.Seq passed to iter.Pull would not be allowed.
Is there a way for the iterator to ensure its caller has locked the thread?
I'm not sure I understand. The iter.Seq implementation is executed on a separate goroutine when passed to iter.Pull. Because it's a different goroutine altogether, there's already no way to ensure anything about the calling goroutine's state. The iter.Seq implementation would have to ensure it locked to an OS thread itself, if it needed to (and if it was allowed, which the proposed fix does not allow).
I worry this is too restrictive, as some libraries call LockOSThread temporarily as part of normal operation. Being unable to use these libraries seems rather strange.
That's a good point. You might not be in control of, or be aware of, what the library is doing. I think the glog example is especially interesting, though I'm not sure the git2go example makes a lot of sense to me (calling into C already temporarily binds the goroutine to the thread, and that will continue to work).
We could also punt the question, pick a more restrictive implementation to begin with, and see what happens in practice. We can always backwards-compatibly lift the restriction in the future.
Thinking out loud: could we have a slow implementation, and fall back to it only when the iterator goroutine locks itself to a thread? That might be OK? However, we then have a bit of a "worst of both worlds" situation. Not only do you have a performance cliff (it's just in a different place now), you still have the pitfall wherein the iterator goroutine (not locked) may see thread state the code author didn't expect it to.
Slow would be better than panic.
I'm not sure the git2go example makes a lot of sense to me (calling into C already temporarily binds the goroutine to the thread, and that will continue to work).
I was confused as well initially. The problem here is that MakeGitError calls C.git_last_error(), which presumably uses something thread-local, similar to errno.
ecode := C.git_blame_options_init(&opts, C.GIT_BLAME_OPTIONS_VERSION)
if ecode < 0 {
return BlameOptions{}, MakeGitError(ecode)
}
IMO, it's better to start with something more restrictive (such as panicking for now), and then lift some of those restrictions over time as issues arise and solutions become clearer.
Very interesting. I believe pull iterators may well alleviate the need for https://github.com/golang/go/issues/64777 (and https://github.com/golang/go/issues/64755), but I ran into issues with runtime.LockOSThread and Cgo callbacks. I documented my findings in https://github.com/golang/go/issues/65946 in case they're not a duplicate of this issue.
@aclements, @cherrymui, @dr2chase, and I discussed this offline a little bit more, and if hypothetically panics are off the table, I think we have an idea of what the implementation would have to look like.
If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine. If next moves to a different goroutine on a different thread, the iterator might run there, and that's fine. If the iterator really needs to read thread-local state, it must call runtime.LockOSThread. This case is still fast.
If the iterator goroutine calls runtime.LockOSThread, it immediately sets a flag in the coro implementation to fall back on a channel-based implementation. It then wakes the calling goroutine, which must check whether that flag is set. (This can be checked non-atomically, because the calling goroutine must be blocked when this happens.) The iterator goroutine then calls into the scheduler. Upon executing later, it locks itself to whatever thread it ends up running on. This ensures that the calling goroutine and the iterator goroutine, if both locked, are guaranteed to be locked to different threads.
Even if the calling goroutine is not locked to a thread, for simplicity, we'll always fall back to the channel-based implementation.
In short: the iterator goroutine can share threads with thread-locked goroutines, but the calling goroutine cannot.
The reason for this asymmetry is that you can pass the iterator goroutine's context to another thread, and it would be difficult to support a calling goroutine being able to share a thread with the iterator goroutine when the iterator goroutine is locked specifically. (The calling goroutine would have to be able to force itself to switch threads to the iterator goroutine's thread. This might require a substantial amount of work in the scheduler.)
We suspect the iterator wanting to lock to a thread is less common in general than the calling goroutine being locked to a thread. There's still a performance pitfall here, but it hopefully shouldn't show up most of the time. That is, if a thread-locked goroutine is iterating over, say, some straightforward all-in-Go data structures with iter.Pull (say, iterating over a pair of trees or something), that will still be fast.
What happens when the iterator goroutine is switched? Will this program successfully hand over the OS main thread to a different goroutine?
package main
import (
"fmt"
"iter"
"runtime"
)
func init() {
// Lock main thread to main goroutine.
runtime.LockOSThread()
}
func main() {
next, _ := iter.Pull(switchThread)
// Pass thread to iterator, and get back a different
// thread.
next()
// Continue on a different, non-main thread.
fmt.Println("successfully passed thread")
}
func switchThread(yield func(struct{}) bool) {
// Unblock the waiting pull iterator with different
// thread.
go yield(struct{}{})
// Perform main-thread work.
fmt.Println("received thread")
}
If so, https://github.com/golang/go/issues/64777 and https://github.com/golang/go/issues/64755 can be closed.
My understanding is that this is what would happen
My understanding is that this is what would happen
I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states:
That iterator goroutine effectively gets locked to the same thread while it's executing
But to your argument, the proposal also states:

and we know that it will eventually switch back to the calling goroutine.

which is not necessarily true with the go yield() hack.
I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states
I was talking about @mknyszek's comment that immediately precedes yours.
And of course, as long as the iterator goroutine runs on the locked main thread, I can force the runtime to let me keep it:
package main
import (
"fmt"
"iter"
"runtime"
)
/*
// switchBack is implemented in Go below and exported via cgo.
extern void switchBack(void);

// Stand-in body for a hypothetical C API that must stay on the
// main thread and never returns.
static void call_main_thread_api_that_never_returns(void) { for (;;) {} }

static void main_thread_api(void) {
	switchBack();
	call_main_thread_api_that_never_returns();
}
*/
import "C"
func init() {
runtime.LockOSThread()
}
func main() {
next, _ := iter.Pull(switchThread)
// Runs on the main thread.
next()
// Continues on some other thread.
fmt.Println("successfully passed thread")
}
//export switchBack
func switchBack() {
// Runs in a Cgo callback which effectively locks the main
// thread so the runtime can't give it back through the call
// to yield.
go yieldMain(struct{}{})
}
var yieldMain func(struct{}) bool
func switchThread(yield func(struct{}) bool) {
yieldMain = yield
fmt.Println("received main thread, running main thread API")
C.main_thread_api()
}
I don't understand how main can get the main thread back. switchThread is called on the locked main thread and doesn't give it up. The original proposal states
I was talking about @mknyszek's comment that immediately precedes yours.
Which states:
If the calling goroutine is locked but the iterator goroutine isn't, that iterator goroutine can share the thread with the calling goroutine.
which to me seems to support my use-case: my calling goroutine is locked to the main thread, the iterator goroutine isn't, so they can (will?) share threads.
My understanding is that the iterator goroutine can run on the locked thread but it isn't locked to it. The scheduler can still migrate it to a different thread if it wants to. So you don't have any guarantees as to which thread will execute any of the instructions in switchThread.

It would be weird if it worked the way you are proposing; it would mean that in general people writing iterators shouldn't pass yield to a different goroutine, because doing so could result in removing a thread lock.
To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of switchThread calls something that blocks the thread (some blocking cgo code or a syscall)? Couldn't that potentially lead to a deadlock? (In theory the main thread should be able to run because yield was called, but it cannot because its locked thread is unavailable forever.)
My understanding is that the iterator goroutine can run on the locked thread but it isn't locked to it. The scheduler can still migrate it to a different thread if it wants to. So you don't have any guarantees as to which thread will execute any of the instructions in switchThread.
I see. Your analysis makes sense, but then I'm surprised that the iterator does not run on the same thread as the caller, regardless of iter.Pull. In particular:
package main
import "runtime"
func main() {
runtime.LockOSThread()
it := func(yield func() bool) {
// Surprisingly, this body of code may run on
// a completely different thread.
}
for range it {
}
}
I see. Your analysis makes sense, but then I'm surprised that the iterator does not run on the same thread as the caller, regardless of iter.Pull.

No, iter.Pull would be what makes the difference, IIUC. If you are using for range f, everything will run on the same goroutine (and therefore the same thread, if LockOSThread is used).
As @aarzilli says, this does not affect for range f. You'd have to explicitly pass the function next returned by iter.Pull to a different goroutine.
To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of switchThread calls something that blocks the thread (some blocking cgo code or a syscall)? Couldn't that potentially lead to a deadlock (in theory the main thread should be able to run because yield was called, but it cannot because its locked thread is unavailable forever).
That example is breaking my brain. I need time to digest it.
It may well be that having the iterator call into cgo or a syscall means we need to treat it the same as calling LockOSThread, meaning it'd be kicked to a different thread immediately.
As @aarzilli says, this does not affect for range f. You'd have to explicitly pass the function next returned by iter.Pull to a different goroutine.
Are you saying that iter.Pull behaves differently than for range f? And only some of the time, in a nondeterministic way? In particular:
package main
import "runtime"
import "iter"
func main() {
runtime.LockOSThread()
it := func(yield func(struct{}) bool) {
// Always on the main thread if called in a
// `for range f`, (most of?) the time on the
// main thread if called by `iter.Pull`.
}
for range it {
}
next, _ := iter.Pull(it)
for {
next()
}
// Bonus question: what happens when
// yield is called from another goroutine?
it2 := func(yield func(struct{}) bool) {
go yield(struct{}{})
}
for range it2 {
}
}
To be honest, I'm now a bit worried about this thread sharing, @mknyszek: what happens if the body of switchThread calls something that blocks the thread (some blocking cgo code or a syscall)? Couldn't that potentially lead to a deadlock (in theory the main thread should be able to run because yield was called, but it cannot because its locked thread is unavailable forever).
That example is breaking my brain. I need to time to digest that.
It may well be that having the iterator call into cgo or a syscall means we need to treat it the same as calling LockOSThread, meaning it'd be kicked to a different thread immediately.
Yeah, considering cgo/syscalls as implicitly calling LockOSThread for the duration of their calls seems consistent.
Are you saying that iter.Pull behaves differently than for range f? And only some of the time, in an indeterministic way?
iter.Pull is defined as creating a new goroutine to run the iterator, so I'm not surprised it behaves differently. for range f, meanwhile, is defined as iterating on the same goroutine: it's just a bunch of function calls. iter.Pull has the same semantics as passing values back and forth between goroutines across a channel (approximately; it's really more of an unbuffered channel used to synchronize a shared memory location, but close enough). It's just very tightly optimized, meaning that the goroutines pass control between each other directly instead of going through actual channels.
Are you saying that iter.Pull behaves differently than for range f? And only some of the time, in an indeterministic way?
iter.Pull is defined as creating a new goroutine to run the iterator
FWIW, the documentation doesn't mention goroutines:
% GOEXPERIMENT=rangefunc go doc iter.Pull
package iter // import "iter"
func Pull[V any](seq Seq[V]) (next func() (V, bool), stop func())
Pull converts the “push-style” iterator sequence seq into a “pull-style”
iterator accessed by the two functions next and stop.
Next returns the next value in the sequence and a boolean indicating whether
the value is valid. When the sequence is over, next returns the zero V and
false. It is valid to call next after reaching the end of the sequence
or after calling stop. These calls will continue to return the zero V and
false.
Stop ends the iteration. It must be called when the caller is no longer
interested in next values and next has not yet signaled that the sequence is
over (with a false boolean return). It is valid to call stop multiple times
and when next has already returned false.
It is an error to call next or stop from multiple goroutines simultaneously.
In any case it seems better to have consistent behaviour between for range func and iter.Pull. Here's one way:

- Guarantee that the iterator goroutine runs on the same thread as the calling goroutine, just like for range func.
- If the thread is locked, yield called from another goroutine will panic (similar to how calling yield after iteration has stopped also panics).

This has several advantages:

- Consistency with for range func.
- iter.Pull: calling LockOSThread from the iterator goroutine behaves the same as if the calling goroutine had called it.

Here's another reason yield should panic if called from another goroutine than the locked one:
package main

import "runtime"

func main() {
rng := func(yield func(struct{}) bool) {
go yield(struct{}{})
select{}
}
runtime.LockOSThread()
for range rng {
// This body is *not* running on the locked thread!
}
}
As far as I can tell from experiments, the body of the for range func loop is not running on the locked thread, despite LockOSThread.
FWIW, the documentation doesn't mention goroutines:
Hm... OK. That's a good point.
Guarantee that the iterator goroutine runs on the same thread as the calling goroutine
I don't think we can guarantee this in general. I assume you mean just in the LockOSThread case.
In that case, that's kind of interesting. Is the idea that if one of these two goroutines locks to the thread, they're always both locked (and if there are more, nested, calls to iter.Pull, effectively all goroutines in the chain are locked)? That may simplify things a little bit, but I am worried about the next point (which seems to be necessary to make this work without a lot of complexity). I think this is something similar to what @cherrymui suggested, at some point.
There may be something I'm missing here to implementing this though. Offline, we've discussed a few different ideas and it's all getting pretty jumbled for me.
If the thread is locked, yield called from another goroutine will panic (similar to how calling yield after iteration has stopped also panics).
Calling yield on another goroutine is generally allowed, and I feel like this may introduce a surprising divergence. (Some iterators will simply be unusable with goroutines locked to OS threads, though I suppose it'll probably be rare.)
On the one hand, your example is compelling that this behavior would be surprising. Simultaneously, it's already true that the loop body could execute on another goroutine, and it's overall more consistent to say that the new goroutine isn't locked to the thread.
I do not know what the right answer is here, and I don't think I alone could decide. I think we need to take a step back, write up all the different possible semantics, and get a big picture view.
Another dimension to all of this is: what happens if the iterator and/or yield function calls runtime.Goexit? Normally, that's supposed to destroy the thread if a goroutine is locked to it. Can we actually do that? And how does that align with the semantics of for range f?
Guarantee that the iterator goroutine runs on the same thread as the calling goroutine
I don't think we can guarantee this in general. I assume you mean just in the LockOSThread case.
Yes. The interesting cases I can think of are LockOSThread, Cgo callbacks, and (Windows) syscall callbacks.
(FWIW, I tend to think of "same thread" as equivalent to "any thread" if no thread is locked, because the runtime is free to move an unlocked goroutine to any existing or new thread.)
In that case, that's kind of interesting. Is the idea that if one of these two goroutines locks to the thread, they're always both locked (and if there are more, nested, calls to iter.Pull, effectively all goroutines in the chain are locked)?
Exactly. Multiple goroutines locked to the same thread is ok, because the coroutine semantics and yield panic ensure that at most one goroutine may run on the locked thread.
In that case, that's kind of interesting. Is the idea that if one of these two goroutines locks to the thread, they're always both locked (and if there are more, nested, calls to iter.Pull, effectively all goroutines in the chain are locked)? That may simplify things a little bit, but I am worried about the next point (which seems to be necessary to make this work without a lot of complexity). I think this is something similar to what @cherrymui suggested, at some point.
I had some thoughts along these lines. I'm unclear about whether the impetus towards a guaranteed-different-thread locking solution is from a performance perspective, or from correctness perspective (and I'm also sure I'm unclear about other things).
I think I understand how degenerate scheduling patterns could arise from a usually-same-thread solution and passing yields around. I do wonder that, to some degree, there's always been some risk of this with the runtime tools.
I'm less clear about correctness problems. I agree there is a possibility of confusion where a configuration with caller and iterator locking to different threads can't be eliminated. What else is there?
If the thread is locked, yield called from another goroutine will panic (similar to how calling yield after iteration has stopped also panics).
Calling yield on another goroutine is generally allowed, and I feel like this may introduce a surprising divergence. (Some iterators will simply be unusable with goroutines locked to OS threads, though I suppose it'll probably be rare.)
Good point.
On the one hand, your example is compelling that this behavior would be surprising. Simultaneously, it's already true that the loop body could execute on another goroutine, and it's overall more consistent to say that the new goroutine isn't locked to the thread.
I do not know what the right answer is here, and I don't think I alone could decide. I think we need to take a step back, write up all the different possible semantics, and get a big picture view.
Given your point above, I sketched the three semantics I see when the calling goroutine is locked to a thread. One option behaves as if iter.Pull were implemented without runtime magic, and as such the iterator runs on a different OS thread. As far as I see it, this option is the worst of the options: hard to get optimal performance from, and it doesn't match the thread behaviour of a range-over-func.

This leaves the only issue with option 3: it's possible for the iterator goroutine to "steal" a locked thread by calling yield from a different goroutine. E.g.:
func stealThread() {
runtime.LockOSThread()
it := func(yield func(struct{}) bool) {
go yield(struct{}{})
// The locked OS thread continues execution from here.
		// ...
}
next, _ := iter.Pull(it)
next()
// From here on, another OS thread continues execution.
	// ...
}
My question is then: what's the big deal? Note that a limited form of thread stealing is already allowed by a range-over-func, as demonstrated by a previous comment:
func stealThread2() {
rng := func(yield func(struct{}) bool) {
go yield(struct{}{})
// Locked thread continues here.
}
runtime.LockOSThread()
for range rng {
// This body is *not* running on the locked thread!
}
// Locked thread continues here.
}
The difference is that iter.Pull allows you to open-code the loop, and as such the amount of code that would run without the thread locked seems larger than the range-over-func body. But that hardly matters: the range-over-func body can be arbitrarily large because it includes function and method calls, recursively.
So: assuming the thread behaviour of range-over-func is fixed, I propose option 3 for iter.Pull to get optimal performance and semantics consistent with range-over-func, including thread stealing.
I forgot to mention that in practice, the stop function will limit the scope of accidental thread stealing to the function body creating the iter.Pull:
func f() {
runtime.LockOSThread()
stealThread()
// The locked thread continues here.
}
func stealThread() {
next, stop := iter.Pull(it)
defer stop() // stop blocks until the iterator goroutine returns, along with the locked thread.
// Here, the locked thread may or may not execute.
}
Here are two more surprising cases caused by an iter.Pull that is not consistent with range-over-func:
Just like reflect.MapIter, some users would like to obtain range-over-func behaviour from reflection code without the explicit for {} loop. I imagine ssa/interp is one such user. This is trivially achieved with a consistent iter.Pull, but very difficult otherwise. In other words, I expect a call for a reflect.Iter if iter.Pull isn't it.
Some situations with a locked thread will not be able to use an inconsistent iter.Pull. Consider an iterator that expects to run on the locked thread. This works:
func f() {
runtime.LockOSThread()
lockedIter := func(yield func(Element) bool) {
// This body *must* run on the locked thread.
}
for elem := range lockedIter {
// Process elem, also on the locked thread.
}
}
Now, assume that the requirements change so that a simple for loop is no longer sufficient, prompting the conversion to iter.Pull:
func f() {
runtime.LockOSThread()
lockedIter := func(yield func(Element) bool) {
// This no longer runs on the locked thread!
}
next, stop := iter.Pull(lockedIter)
defer stop()
// Use next and process elements.
}
However, the converted code is (subtly) broken: lockedIter no longer runs on the locked thread.
Consistency is important in Go's design process, so I must have missed the reason(s) or discussions why it's ok for iter.Pull to be inconsistent with range-over-func. It may be obvious to some but not to me.
Thank you for bearing with me.
Just want to point out that the right way to think about this is that Pull can be implemented with goroutines and channels, and anything else is an optimization, that must only be used when its behavior does not differ from what you would get from goroutines and channels. And if we can't figure out how to detect the differing uses, then the implementation is goroutines and channels. (One approach is to revert to goroutines and channels any time a LockOSThread goroutine touches the Pull value handoff.)
Just want to point out that the right way to think about this is that Pull can be implemented with goroutines and channels, and anything else is an optimization, that must only be used when its behavior does not differ from what you would get from goroutines and channels.
Can you please qualify this assertion that iter.Pull should deliberately behave differently than range-over-func? I offered two examples where an inconsistent iter.Pull would make its use difficult or impossible, and asked whether avoiding "thread stealing" is worth sacrificing consistency for. I also pointed out above that "thread stealing" is also possible with range-over-func, albeit limited to the loop body.
I don't think I asserted that. We can implement iter.Pull with goroutines and channels, and that was the first proposed version of it. "Coroutines" are an optimization. This works:
func GoPull[T any](seq Seq[T]) (iter func() (T, bool), stop func()) {
next := make(chan struct{})
yield := make(chan T)
go func() {
defer close(yield)
_, ok := <-next
if !ok {
return
}
seq(func(v T) bool {
yield <- v
_, ok := <-next
return ok
})
}()
return func() (v T, ok bool) {
select {
case <-yield:
return v, false
case next <- struct{}{}:
v, ok := <-yield
return v, ok
}
}, sync.OnceFunc(func() {
close(next)
<-yield
})
}
"coroutines" are an optimization
With respect to LockOSThread, "coroutines" behave differently than "goroutines + channels". My examples are cases where this difference matters.
Range-over-func is implemented as "coroutines". Are you suggesting range-over-func should change to behave like "goroutines + channels" too? If not, why the inconsistency?
In range-func iterators, there are neither goroutines nor coroutines. Those only show up to implement "Pull", when two range-over-function iterators need to be paired up.
"Coroutines" don't have a dependable semantics yet, and can't be relied on to "define" any other behavior;. They are an optimization for a particular goroutine-channel idiom, and it looks like LockOSThread might not be compatible with that optimization. In this context, "behave differently" is actually "behave broken".
PS/edit -- in general, a programmer doesn't get to know, or complain, about the specific binding between goroutine and thread. Certain choices are faster, but faster is preferred, not guaranteed. LockOSThread is an exception to this rule, forced by various host platforms attaching wonky/interesting semantics to thread identity. "Are pull-er and pull-ee goroutines bound to the same thread?" is a question you may ask, but the Go implementation is not required to answer it in any particular way, especially if requiring that answer creates weird deadlocks.
In range-func iterators, there are neither goroutines nor coroutines. Those only show up to implement "Pull", when two range-over-function iterators need to be paired up.
Are you saying that the LockOSThread
problem doesn't exist for range-over-func? This is not so, as demonstrated by https://github.com/golang/go/issues/65889#issuecomment-1979405291.
In other words, both range-over-func and iter.Pull
must have some behaviour with respect to threads locked to goroutines, and so the "coroutine vs goroutine" question is relevant for both.
"Coroutines" don't have dependable semantics yet, and can't be relied on to "define" any other behavior. They are an optimization for a particular goroutine-channel idiom, and it looks like LockOSThread might not be compatible with that optimization. In this context, "behave differently" is actually "behave broken".
PS/edit -- in general, a programmer doesn't get to know, or complain, about the specific binding between goroutine and thread. Certain choices are faster, but faster is preferred, not guaranteed. LockOSThread is an exception to this rule, forced by various host platforms attaching wonky/interesting semantics to thread identity. "Are pull-er and pull-ee goroutines bound to the same thread?" is a question you may ask, but the Go implementation is not required to answer it in any particular way, especially if requiring that answer creates weird deadlocks.
What deadlocks are you referring to? In particular, are there deadlocks particular to iter.Pull that don't show up in a range-over-func?
Regardless of the particular behaviour with respect to locked threads, can we at least agree that range-over-func and iter.Pull should behave the same? That is, iter.Pull being the open-coded version of a range-over-func. If not, why not?
Rangefunc iterators are not required to use an additional goroutine, and if they do, they may cause interesting problems with LockOSThread. Programmers might not want to do that in general-purpose iterators that might be used where LockOSThread is active. For Pull, there is no choice -- there must be a second goroutine. Pull forces the issue, whereas for rangefunc, "maybe you don't want to do that" (the root cause is LockOSThread, but it already exists).
Deadlocks occur if we start requiring that Pull must use coroutines, provisionally defined as "two goroutines sharing a single OS thread". Combining that with LockOSThread seems to create the possibility of two goroutines both demanding to use the same OS thread, which is where things start to go wrong.
I don't understand "iter.Pull being the open-coded version of a range-over-func", nor am I sure I understand what you mean by "behave the same". An iterator should deliver its values in the same order, whether it is rangefunc or passed to Pull, that is the same, but the plumbing that delivers those values is quite a bit different. And the problem is in the plumbing.
Rangefunc iterators are not required to use an additional goroutine, and if they do, they may cause interesting problems with LockOSThread. Programmers might not want to do that in general-purpose iterators that might be used where LockOSThread is active.
Surely it's useful to specify the behaviour when the caller or the iterator calls LockOSThread? Or are you suggesting that iterators cannot be used in combination with LockOSThread? If so, you're also excluding code called from a Cgo callback as well as library code that has no chance to know its calling context.
Deadlocks occur if we start requiring that Pull must use coroutines, provisionally defined as "two goroutines sharing a single OS thread". Combining that with LockOSThread seems to create the possibility of two goroutines both demanding to use the same OS thread, which is where things start to go wrong.
Please provide an example of a deadlock. I claim that two goroutines can share a specific thread because the calling goroutine and iterator goroutine can be cooperatively scheduled: the running (and thread-locked) goroutine explicitly hands over control (and therefore the locked thread) when calling yield or next, goes to sleep, and allows the other goroutine to continue (on the locked thread).
I don't understand "iter.Pull being the open-coded version of a range-over-func", nor am I sure I understand what you mean by "behave the same". An iterator should deliver its values in the same order, whether it is rangefunc or passed to Pull, that is the same, but the plumbing that delivers those values is quite a bit different. And the problem is in the plumbing.
By "behave the same" I mean that a rangefunc must be interchangeable with an equivalent iter.Pull and vice versa, without any change in behaviour visible to the programmer (except performance). That includes the behaviour of threads locked by the caller or iterator. For example, a Go interpreter should be able to faithfully implement rangefunc with iter.Pull.

Note that I'm not even saying what the locked-thread behaviour should be, just that the behaviour should be equal.
Suppose goroutine A creates a pull function P, and passes it to goroutine B. B calls P, P calls LockOSThread and returns a value to B. B calls LockOSThread (just to be really unambiguous that B and P now both have a lock on the same OS thread), and acks A in a channel, and then blocks in a cgo call waiting for some almost-never-happen event, normally B would still be waiting there on process exit.
(Alternate scenario for gluing B and P to the same thread: B is already LockOSThreaded, and P then calls LockOSThread when B calls it. Normally in this situation we would increment a counter.)
A calls LockOSThread. A calls P. What happens? I don't think we can get the OS Thread loose from the cgo call.
It's not formal deadlock, but it's not great.
Another interesting case occurs if A is locked to an OS thread and creates two pull functions P and Q (and, for the sake of non-ambiguity, pulls a value out of each one, and they each also call LockOSThread), then hands P and Q to separate goroutines B and C, who attempt to call P and Q concurrently. We can time-slice those, of course, so no deadlock, but that's new and interesting for the Go scheduler.
I've been trying to figure out what should/shouldn't work and where something must change. I have a default bias that LockOSThread is the pig at the party here, but maybe that's wrong.
I think that these things should by-default work, with possible exceptions in the case of LockOSThread "somewhere":

- `type Seq[V any] func(yield func(V) bool)` is a function, and a function is a value; there are no restrictions on how it is used.
- `iter.Pull[V any](seq Seq[V])` returns a pair of functions, and functions are values; there are no restrictions on how they are used, other than "be sure to call `stop()` after you are done calling `next()`".
- A goroutine that has called `LockOSThread` should be able to use rangefunc iterators and Pull functions. In general there's no way to know where iterators might appear.

The last case can create problems, depending on why it called LockOSThread. Did the goroutine need it for reasons unrelated to the iteration (context), or was it because of iteration (iterator), or was it because the loop body needed to run on a particular OS thread (body)?
If it's just context, it doesn't matter what the range function does or where the loop body runs, they are incidental. And if Pull is called, that is not a problem either, whether it uses goroutine or coroutine because the iterator is insensitive to goroutine/thread.
If it's the loop body, then Pull is not an issue because Pull works on iterators. In the default case of rangefunc, that does not explicitly do interesting things with goroutines (range functions can do things with goroutines, but it's not the default, and it looks like perhaps running yield (the loop body) in a different goroutine should be discouraged), this will also "just work", the range function runs in the same goroutine as the body and as its callers, so it all works there, too.
The interesting case is the iterator. I think this breaks down into three subcases, which differ in how they deal with undoing LockOSThread.
a `dispose` function to be called after the iterator is done, to undo the LockOSThread. There are two subcases; in both, there is a clue for the programmer that this is not a standard range function:

a. Lock/UnlockOSThread occur in the caller's context
b. the range function or its generator creates a separate goroutine, which is locked, and communicates with it via channels.

In the first case, it's obvious that LockOSThread is needed, and thus the special rule applies: "don't pass the range function to other goroutines". This case is also sensitive to where the Pull function executes, so a coroutine implementation is preferred. 2a is the same.
For 2b and 3, because the locked goroutine is hidden within the range function, it doesn't matter where the range function itself runs.
I think the only tricky case is for an iterator that requires locking to an OS thread, but these are either obvious when they are created, or else insulate iterator value generation inside another goroutine.
It seems like a range function that runs "yield" (the loop body) in a different thread than the outer caller is officially a bad idea unless this is documented (i.e., it has a purpose).
I think it is also a bad idea if a range function calls LockOSThread (in the caller's goroutine, so, 2a) without unlocking before it returns to the caller. That implies a need for a balancing UnlockOSThread that is not necessarily provided. GoodStyle™ would be to lock the OS thread when the range function is created, not when it is called.
This is the best I've got on this much time. I'll think about it some more, but maybe other people will have better thoughts, improve it, or shoot it down.
Thank you for the detailed example. I believe I've identified where our "coroutine" semantics differ. Hopefully, the following explanation is clearer.
First, assumptions:

- rangefunc and iter.Pull should have the same behaviour, except for performance.

Then my proposal: let the locked thread (if any) and its lock count follow goroutine switches between caller and iterator, for both rangefunc and iter.Pull.
The consequences are:

- If `LockOSThread` is called by either caller or iterator, the execution trace has that thread locked from then on.
- An iterator may `yield` from another goroutine. This is perhaps surprising, but notably already possible with the current implementation of rangefunc:

```go
func main() {
	// This iterator runs on the locked thread.
	rng := func(yield func(struct{}) bool) {
		go yield(struct{}{}) // Run the loop body with a different thread.
		select {}            // Or a blocking Cgo function.
	}
	runtime.LockOSThread()
	for range rng {
		// The loop body is *not* running on the locked thread!
	}
}
```
Suppose goroutine A creates a pull function P, and passes it to goroutine B. B calls P, P calls LockOSThread and returns a value to B. B calls LockOSThread (just to be really unambiguous that B and P now both have a lock on the same OS thread), and acks A in a channel, and then blocks in a cgo call waiting for some almost-never-happen event, normally B would still be waiting there on process exit.
With my proposal, the following happens: the locked thread and its lock count follow the goroutine switch back to B when P calls yield.

(Alternate scenario for gluing B and P to the same thread: B is already LockOSThreaded, and P then calls LockOSThread when B calls it. Normally in this situation we would increment a counter.)
A calls LockOSThread. A calls P. What happens? I don't think we can get the OS Thread loose from the cgo call.
With my proposal, the locked thread follows the goroutine switch when A calls next.

It's not formal deadlock, but it's not great.
By passing locked threads between caller and iterator, LockOSThread deadlocks are no longer possible.
Another interesting case occurs if A is locked to an OS thread and creates two pull functions P and Q (and, for the sake of non-ambiguity, pulls a value out of each one, and they each also call LockOSThread), then hands P and Q to separate goroutines B and C, who attempt to call P and Q concurrently. We can time-slice those, of course, so no deadlock, but that's new and interesting for the Go scheduler.
The same happens for Q, leaving A locked to T.
Now, B and C have no threads locked so P and Q will run on some arbitrary thread when called by them.
This seems problematic:
Since P was previously locked to T1, and might have made that call expecting that subsequent calls would also be locked to T1, running on T2 seems wrong. However, in my "how is this used anyway?" scenario, the iterator inside the pull function "should not" have exported (left unreversed) a call to LockOSThread; that should have occurred in the creating context, perhaps, and this is even something we can check for in the Pull implementation, and panic.
Locking the OS thread when the iterator is created in A/T2 creates the opposite problem of "what happens in B/T1 when B calls P", and how do we distinguish that "bad" case from the "not bad" case where the iterator does not care that it is thread-locked, and just happens to be embedded within code that A called, and A was locked to T2 for other reasons?
I am inclined to assert that if there is an iterator that requires thread-locking to work properly, that it must be coded to hide that from its callers by creating its own goroutine, locking that to an OS thread, and using it to generate values, which are then passed to the yield function in the caller's context (that way, the calling function can also require thread-locking for the body, perhaps even to a different thread). Probably such an iterator would be paired with a dispose/stop function and might also have a finalizer, to recover the goroutine and locked OS thread.
So if I were to try to get closer to a proposal: Pull functions use coroutines, and if the locked count changes between entry and exit to the yield function (i.e., an unpaired lock/unlock), panic; and if an iterator (a range function) wishes to run on a locked OS thread, it must create a separate goroutine for that purpose; the calling context is not guaranteed.
I am inclined to assert that if there is an iterator that requires thread-locking to work properly, that it must be coded to hide that from its callers by creating its own goroutine, locking that to an OS thread, and using it to generate values, which are then passed to the yield function in the caller's context (that way, the calling function can also require thread-locking for the body, perhaps even to a different thread). Probably such an iterator would be paired with a dispose/stop function and might also have a finalizer, to recover the goroutine and locked OS thread.
This only applies to iterators that require a locked thread separate from their caller, correct? That is, iterators that must be locked to their caller's locked thread will just work like the equivalent rangefunc does.
So if I were to try to get closer to a proposal: Pull functions use coroutines, and if the locked count changes between entry and exit to the yield function (i.e., an unpaired lock/unlock), panic,
Would rangefunc iterators also panic? If not, why not?
and if an iterator (a range function) wishes to run on a locked OS thread, it must create a separate goroutine for that purpose; the calling context is not guaranteed.
By "calling context is not guaranteed" do you mean "the calling context is what's calling next"? That is, the context is well-defined but under control of the caller and thus may change during each call to yield.
This only applies to iterators that require a locked thread separate from their caller, correct? That is, iterators that must be locked to their caller's locked thread will just work like the equivalent rangefunc does.
I think so, including "if you pass the range function to a different goroutine, it will see a different context" (don't do that if you want it to run in a constant context -- in this situation the programmer is/should-be aware that thread locking is in play).
Would rangefunc iterators also panic? If not, why not?
Probably not, and the reason is cost. We're one optimization and some boosted inlining away from range function iterators running really well, and all the checking code that gets inserted into the generated yield function constant-propagates away. Checking the locked-thread-state won't go away like that. We might include a more-checking mode for this, maybe we would always check it anyway.
By "calling context is not guaranteed" do you mean "the calling context is what's calling next"? That is, the context is well-defined but under control of the caller and thus may change during each call to yield.
Right.
The underlying problem here is that LockOSThread is probably the wrong primitive; it would have been better expressed as `func WithLockedOSThread(runsInLockedContext func())`, but, oops, mistakes were made. This would have made some of the problematic cases unsayable. We had the go-func and defer-func examples right there to copy, and we blew it, oh well.
One performance-related question -- if we declare that it is GoodStyle™ for range functions that need locking to create a separate locked goroutine and access that through channels, that's going to be even more expensive when passed to Pull. A problem with range function iterators is that they're opaque -- they don't provide the option of offering a cheaper implementation of Pull (e.g., a SliceIterator has a much cheaper implementation of Pull, but that's not available). It seems like in this case if you know that the iterator will require a thread locked goroutine, we might be better off inverting the relationship, have Pull be the default, and the range function derived from that. (I am not at all sure about this, but I don't like the pile of expensive layers.)
The other reason that rangefunc iterators likely won't panic is that the range function is written by a human and really is "just a function". The compiler rewrite of

```
for x := range somerangefunction {
	body
}
```
is

```
<declare and initialize some state variables>
yield := func(x T) bool {
	<fiddle with state variables>
	body // with tweaked rewrites for break/continue/goto/return
	return true
}
somerangefunction(yield)
```
The most we could check is that thread locking state doesn't change between initialization and each iteration of the loop body, but the bare range function is just a function.
And strictly speaking, nobody's required to use iter.Pull; a somewhat slower non-coroutine version would also work. I think the interesting question here is whether we want to sign up for the different, extra, and potentially useful semantics of passing OS-thread-locked state across a coroutine switch. Originally, this had just been a performance optimization.
Would rangefunc iterators also panic? If not, why not?
Probably not, and the reason is cost. We're one optimization and some boosted inlining away from range function iterators running really well, and all the checking code that gets inserted into the generated yield function constant-propagates away. Checking the locked-thread-state won't go away like that. We might include a more-checking mode for this, maybe we would always check it anyway.
May I suggest not panic'ing for iter.Pull either? It's a subtle panic condition, it seems ok to require LockOSThread users to be careful, and most importantly I think it's important to be able to refactor rangefuncs into iter.Pull equivalents and be certain the behaviour is equal.

On the other hand, starting with the panic and relaxing it later is backwards compatible.
And strictly speaking, nobody's required to use iter.Pull, a somewhat slower non-coroutine version would also work. I think the interesting question here is whether we want to sign up for the different, extra, and potentially useful semantics of passing OS-thread-locked state across a coroutine switch. Originally, this had just been a performance optimization.
iter.Pull is less useful if it doesn't match rangefunc behaviour. Go interpreters come to mind, as well as my reflect example.

In addition, a non-context-passing iter.Pull cannot later be changed to pass context in a backwards compatible way.
I'm not sure I understand the reflect example. Could you explain more about it, maybe with a concrete example?
As I understand it, rangefunc is just a function, so the way to use reflection to run a rangefunc loop is just to Call the rangefunc, passing in the "body" written in terms of reflection.
I'm not sure I understand the reflect example. Could you explain more about it, maybe with a concrete example?
Please ignore the reflect example; I failed to find an instance that is not a case of the general second example in my comment. That is, cases where (1) iter.Pull is required, and (2) the iterator must run on the same locked thread as the caller.
I am very sympathetic to wanting to implement range-over-func using iter.Pull. I spent a couple of weeks in December trying [unsuccessfully] to do this for x/tools/go/ssa and interp. I was eventually convinced that I was trying to pound a square peg through a round hole. x/tools/go/ssa has a proposed implementation based on function rewriting today.
The FAQ already covers that iter.Pull and similar approaches are not equivalent: https://go.dev/wiki/RangefuncExperiment#why-are-the-semantics-not-exactly-like-if-the-iterator-function-ran-in-a-coroutine-or-goroutine . Let me try to give an expanded version. This explanation is mostly outside of the scope of this issue, but I think it addresses what you are asking.
There are two main proposed semantics/implementations for range-over-func statements:
```
for x := range somerangefunction {
	body
}
```
The first implementation is function rewriting. This takes a range-over-function statement and constructs a new synthetic function that communicates across function boundaries via hidden state, calls the iterator function on the synthetic function, and reestablishes control flow after the function call. Adjusting what David had a bit:
```
<declare and initialize some state variables>
yield := func(x T) bool {
	<fiddle with state variables>
	body // with tweaked rewrites for break/continue/goto/return
	return true
}
somerangefunction(yield)
<finish tweaked break/continue/goto/return and panicking>
```
What exactly those tweaks are is a bit complicated, but these should be mostly invisible to regular users. They are not invisible to compilers and interpreters though. The overwhelming advantage of this approach is that it is as fast as users expect. This is mostly accomplished by inlining somerangefunction and yield. See https://go.dev/wiki/RangefuncExperiment#can-range-over-a-function-perform-as-well-as-hand-written-loops
The second implementation is to concurrently execute somerangefunction and to communicate between the concurrent routines. In practice, concurrent routines are implemented via goroutines in go. This leads to a candidate semantics based around iter.Pull or similar.
```
next, stop := iter.Pull(somerangefunction) // execute somerangefunction in another goroutine
defer stop()
for x, ok := next(); ok; x, ok = next() {
	body
}
stop()
```
This is a bit oversimplified, but the key point is that somerangefunction executes concurrently with body in a different goroutine. What is great about iter.Pull is that it can be a library function. It does not need extra compiler help to change body or somerangefunction. This leads to the lowering above being simple.
There are an enormous number of additional possible tweaks to both of the above approaches (runtime.coro for example), but those are the two big categories of proposed semantics for range-over-func statements.
These semantics do differ. Stacks and goroutines are observable in Go. It is easier to see this with an example.
```go
var count int

func somerangefunc(y func(x int) bool) {
	defer func() { if r := recover(); r != nil { count++; panic(r) } }()
	for y(rand.Int()) {}
}

func f() {
	defer func() { _ = recover() }()
	for x := range somerangefunc {
		switch x {
		case 1: panic(x)
		case 2: runtime.Goexit()
		case 3: break
		}
	}
}
```
Let's suppose rand.Int() returns 1 first, so panic(x) happens. What are the stacks at the panic(x) call in the different semantics?

- Function rewrite: one stack, `[panic(x), yield, somerangefunc, f]`.
- iter.Pull: two stacks, `[panic(x), f]` and `[yield_pull, somerangefunc, iter.Pull]` (yield_pull is an internal lambda in iter.Pull).

Because the stacks are different, the panic executes differently in these two cases.
- Function rewrite: the panic unwinds through somerangefunc, whose deferred recover runs; count is incremented.
- iter.Pull: the panic on f's stack never unwinds through yield_pull or somerangefunc. Count is not incremented.

So the value of count is an example of an observable difference in the two semantics. The root cause is that the goroutines and calls are structured differently in the two proposals, and this is observable.
It is also worth considering that a panic within the iterator function ('somerangefunc' in the example) plays out differently: one semantics terminates the program. runtime.Goexit() within the iterator function also plays out differently. Exercises left to the reader. This is not even getting into whether the runtime's stack traces are considered observable. The runtime library is particularly prone to having functions like LockOSThread that make them even more observable.
In the end, range-over-funcs were chosen to be equivalent to the function rewrite semantics. This is mostly due to efficiency (and partially debugging and partially robustness). So users of range-over-func should have the function rewrite semantics in mind when they use it (the simple version above), not iter.Pull. This gives iter.Pull a bit of a sharp edge that needs to be documented.
You can then ask whether anything can be done to narrow the divide. Maybe. But I would recommend against trying to do this on the issue tracker. You need to poke some fundamental stuff like "what is a stack in Go?", or "what if there was a second kind of coroutine in the runtime?" and do this in a backward compatible fashion. I recommend taking a few days [or weeks] and trying to come up with something offline.
I hope this helped.
With background out of the way, on to specific questions.
https://github.com/golang/go/issues/65889#issuecomment-2047144742 :
Just like reflect.MapIter, some users would like to obtain range-over-func behaviour from reflection code without the explicit for {} loop. I imagine ssa/interp is one such user.
ssa/interp does not need this. The ssa package does the function rewrite so the interpreter did not need serious modification. It did need modification to handle defer elegantly. (Alternatives for the defer changes exist but they are really ugly.)
This is trivially achieved with a consistent iter.Pull, but very difficult otherwise.
It is not trivial. And it probably will be difficult. Range-over-function is a big language change enabled by compiler magic. This places a burden on tools sensitive to detailed semantics like compilers and interpreters.
FWIW ssa/builder.go is now ~17% more lines of code.
```go
func f() {
	runtime.LockOSThread()
	lockedIter := func(yield func(Element) bool) {
		// This no longer runs on the locked thread!
	}
	next, stop := iter.Pull(lockedIter)
	defer stop()
	// Use next and process elements.
}
```
lockedIter is called from another goroutine than f. If you inline iter.Pull, the go statement becomes clear and this ceases to be surprising. So I would say this example is WAI. Yes, this is a sharp edge and that is unfortunate.
For example, a Go interpreter should be able to faithfully implement rangefunc with iter.Pull.
I do not know how to do this. Maybe someone more clever than me has already solved this? Based on my own experience updating an interpreter, one can get roughly 90% of an implementation of range-over-func using iter.Pull, but it will not end up 100% equivalent to the function rewrite semantics. And interpreters need to be >= 99% to be useful. The stacks are different and trying to work around this is hard.
range-over-func is a big language change, and I expect interpreters to need to do a big adjustment. I cannot accurately predict what these will be for other interpreters. Interpreters likely need to understand the code transformation to understand the semantics of a range-over-func statement. A good place to start is the comments here https://go.googlesource.com/go/+/refs/heads/master/src/cmd/compile/internal/rangefunc/rewrite.go (and the tests in rangefunc_test.go). Interpreter maintainers can then determine what support they need from the Go standard library to implement range-over-func in their tool. They can file proposals requesting these features.
FWIW I suspect what they need is to attach state equivalent to #next to range-over-func yield functions, to attach meaning to #next based on the program locations, and handle the "<finish tweaked break/continue/goto/return and panicking>" step based on this value. Again, they need to understand the details.
iter.Pull is less useful if it doesn't match rangefunc behaviour.
Fair enough. But hopefully it is still useful enough.
That is, cases where (1) iter.Pull is required, and (2) the iterator must run on the same locked thread as the caller.
Maybe try to rephrase how you think of this requirement as: 'On the statement go g(), the function g must run on the same locked thread as the go g() statement'. I'll leave it to the folks that work on the runtime to address whether/how this is doable.
Problem

The current implementation of iter.Pull in the rangefunc experiment creates a new goroutine and uses an optimized goroutine switch path during iteration.

The problem with this implementation, as it stands, is that it interacts poorly with runtime.LockOSThread. If the goroutine spawned for the iterator (let's call it the iterator goroutine) calls runtime.LockOSThread, first and foremost, that call will not be respected. For example, if the next function is passed to a different goroutine, the runtime will not schedule the iterator goroutine on the thread it was locked to. Bad things might happen. Simultaneously, even in the easy case, if the iterator goroutine always switches with the same calling goroutine on the same thread, then once that iterator goroutine exits, it will take down the current thread with it. This is expected, but it happens before the calling goroutine gets placed back into the scheduler's ownership. The calling goroutine instead stays blocked forever with no hope of executing again.

One possible solution

The most straightforward fix to this is to fall back to a slower kind of switching of goroutines if either the iterator goroutine or the calling goroutine locks itself to a thread. This slower kind of switching would probably be done with channels, and would have semantics equivalent to the existing implementation of iter.Pull. Because at least one of the goroutines is locked to a thread, the overhead of OS context switching also comes into play, so this implementation would likely be dramatically slower.

This fix, then, has a really unfortunate performance cliff. One could imagine that a UI library (for example) locks a goroutine to a thread because it needs to interact with C libraries that require staying on the main thread. Using iter.Pull on such a goroutine would be a very, very bad idea if this workaround were adopted. The resulting performance regression would be very surprising.

The proposed solution

Discussing with @aclements, @dr2chase, and @cherrymui, we have an alternative fix that avoids the performance cliff. Specifically, we would restrict iterators from being able to call runtime.LockOSThread by panicking if it is called from an iterator goroutine (the runtime handles the iterator goroutine's creation, and thus can enforce this). Meanwhile, the calling goroutine (the caller of next, that is) is fully free to be locked to a thread.

If the iterator is prevented from locking itself to a thread from the iterator goroutine, then it's totally fine for that iterator goroutine to just execute on the locked calling goroutine's thread. That iterator goroutine effectively gets locked to the same thread while it's executing, and we know that it will eventually switch back to the calling goroutine. It's fine for the calling goroutine to pass next to another goroutine as well, even if that other goroutine is locked, because it can only execute once the iterator goroutine has ceded control of the thread (by the semantics and implementation of iter.Pull).

One hazard of this approach is that the iterator, running on the iterator goroutine, may care about thread-local state, and threads locked to goroutines tend to be locked precisely because there is some important thread-local state that needs to be preserved. However, if the iterator goroutine really cared about thread-local state, it should call runtime.LockOSThread, which in this case will panic. That does mean that the iterator is slightly restricted in what it can do, but we feel that it's better for the code author to find that out than to deal with some subtle issue related to runtime.LockOSThread and thread-local state not being what they expect.

Furthermore, we think it's very unlikely that an iterator will want to lock itself to a thread anyway, so the extra restriction is unlikely to matter to the vast majority of Go users. Simultaneously, making this choice lets us keep iter.Pull fast for many more (more realistic) use cases, lets us avoid the pain of maintaining two implementations of iter.Pull, and prevents surprising performance cliffs for users of iter.Pull.