golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.08k stars 17.55k forks source link

cmd/vet: check for failure to use result of a 'yield' call #65795

Open adonovan opened 7 months ago

adonovan commented 7 months ago

Go 1.23 introduces "push" iterators, also known as "range over func", in which the compiler transforms the loop body (consumer) into a stackless coroutine yield func(T) bool for some element type T, and then passes it to the push iterator (producer). The producer calls yield for each element, and must stop if any call to yield returns false, indicating that the desired continuation of the loop body is not continue (e.g. break, outer continue, return, panic, goto).

It is a programmer error to fail to honor the result of any call to yield. The runtime reports a dynamic error in this case, but it would be nice to catch it earlier with a static check.

We should add a new analyzer (or perhaps augment the existing unusedresult analyzer) to report when the result of a dynamic call to a function named yield (with type func(T) bool for some T) is ignored.

(Update: for a second yield mistake that vet could check for, see https://github.com/golang/go/issues/65795#issuecomment-2319078089.)

robpike commented 7 months ago

There are three conditions for vet checks, as listed in its README. We don't know yet whether this is a large enough problem in the wild to satisfy the frequency criterion.

adonovan commented 7 months ago

There are three conditions for vet checks, as listed in its README. We don't know yet whether this is a large enough problem in the wild to satisfy the frequency criterion.

It is certainly not a large problem in the wild because no-one has merged code with a push iterator... yet; but I've already made this mistake several times in my experiments. Nonetheless your point is taken.

timothy-king commented 7 months ago

We should add a new analyzer (or perhaps augment the existing unusedresult analyzer) to report when the result of a dynamic call to a function named yield (with type func(T) bool for some T) is ignored.

I am not sure this will be sufficiently precise. For GoVersion<=1.21, one could already have written such a function and ignore the return value of the function named 'yield'. Thatcould never have been used in a "range over func" statement, and would have been fine.

I think we will be able to get sufficiently good precision by switching this to looking at instances of for range f where f is a function type, f(yield) is a static call, and f ignores a call to its yield function. A big disadvantage of this approach though is where to report the diagnostic. Within the body of f is more actionable, but this will not always be doable if we are looking at the package containing the for range f statement.

adonovan commented 7 months ago

I am not sure this will be sufficiently precise. For GoVersion<=1.21, one could already have written such a function and ignore the return value of the function named 'yield'. Thatcould never have been used in a "range over func" statement, and would have been fine.

You're absolutely right, it's possible it is just a coincidence, but I suspect the combination of calling a function whose operand is a local variable named 'yield', with the appropriate type, is pretty rare today, and we could (with a new convention) practically carve out the name as an informally reserved word for the new loop semantics.

timothy-king commented 7 months ago

I suspect the combination of calling a function whose operand is a local variable named 'yield', with the appropriate type, is pretty rare today

Fair enough.

we could (with a new convention) practically carve out the name as an informally reserved word for the new loop semantics.

It makes me a bit nervous to only check when this is called 'yield'. This is new and conventions are somewhat weak. What is special about 'yield'? Why not 'y' or 'f'? How about 'yield' and 'yield2' if there is more than one? I am personally very prone to typing 'yeild' incorrectly.

I am somewhat more comfortable about treating 'yield' specially if we have a backup plan for warning the other cases too. This falls under "a check that misses too many of the cases it's looking for will give a false sense of security" IMO. Perhaps report 'yield' instances at the missed check regardless of being called, report intra-package calls at the missed check, and report at the call site for inter-package instances (https://github.com/golang/go/issues/65795#issuecomment-1955171715). This is a bit complicated, but still doable.

Deleplace commented 3 weeks ago

I 100% agree that ignoring the result of yield() is too easy to do accidentally, in good faith. I've been bitten myself.

Part of the problem is that the user never writes the body of yield, as yield is automagically provided as an argument. In many use cases the termination cause we have in mind is when the iter.Seq or iter.Seq2 decides to stop producing values, and we're not thinking about the other cause "the consumer doesn't want any more values". It was a mild surprise for me to discover that the keyword break in the consuming loop would cause yield() to return false (as admittedly stated in the spec).

Ideally the vet check would trigger only for iterator functions that are actually ranged over in at least 1 place in the codebase, but that may be too costly to implement.

adonovan commented 3 weeks ago

Another yield-related mistake is to write:

for x := range seq {
   yield(x)
}

when seq(yield) is more efficient, since it avoids back-to-back dynamic calls and the complex range desugaring. Perhaps vet could check for this too.

Deleplace commented 3 weeks ago
for x := range seq {
   yield(x)
}

The worst problem here is that it is incorrect, right? And indeed, not very efficient.

adonovan commented 3 weeks ago

The worst problem here is that it is incorrect, right? And indeed, not very efficient.

This is mainly an efficiency problem. But if this code appears within an iterator of a recursive data structure (e.g. a tree) and this is the inductive case (seq is a subtree), then it could turn a linear-time iteration into an O(n log n) operation because every element would make a chain of calls proportional to its depth (log n).

Deleplace commented 3 weeks ago

I meant that it would lead to a panic, because of ignoring the result of yield(x)

adonovan commented 3 weeks ago

I meant that it would lead to a panic, because of ignoring the result of yield(x)

Sorry, I completely failed to notice that my for x := range seq { yield(x) } example committed two mistakes, the first being the title of this issue! Which perhaps shows how easy a mistake it is to make.

I meant to write:

for x := range seq {
  if !yield(x) { break }
}
gopherbot commented 3 weeks ago

Change https://go.dev/cl/609617 mentions this issue: gopls/internal/analysis/yield: analyzer for iterator (yield) mistakes