Open coxley opened 1 month ago
Related Issues and Documentation
(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)
I had hoped "cmd/compile: fix wrong esacpe analysis for rangefunc" would fix it, but it appears the composed version has the same cost in both go 1.23.1
and master
.
> go version
go version go1.23.1 darwin/arm64
> go build -gcflags='-m=3' main.go 2>&1 | grep 'inline Collection\[go.shape.int\].Values'
./main.go:59:6: can inline Collection[go.shape.int].Values with cost 17 as: method(Collection[go.shape.int]) func(*[8]uintptr) iter.Seq[go.shape.int] { return func literal }
./main.go:60:9: cannot inline Collection[go.shape.int].Values.func1: function too complex: cost 189 exceeds budget 80
./main.go:61:3: cannot inline Collection[go.shape.int].Values.func1-range1: function too complex: cost 142 exceeds budget 80
./main.go:46:9: can inline Collection[go.shape.int].Values.func1.Slice[go.shape.int].All.1 with cost 72 as: func(func(int, go.shape.int) bool) { for loop }
> /Users/coxley/projects/go/bin/go version
go version devel go1.24-165bf24 2024-09-19 00:40:50 +0000 darwin/arm64
> /Users/coxley/projects/go/bin/go build -gcflags='-m=3' main.go 2>&1 | grep 'inline Collection\[go.shape.int\].Values'
./main.go:59:6: can inline Collection[go.shape.int].Values with cost 17 as: method(Collection[go.shape.int]) func(*[8]uintptr) iter.Seq[go.shape.int] { return func literal }
./main.go:60:9: cannot inline Collection[go.shape.int].Values.func1: function too complex: cost 189 exceeds budget 80
./main.go:61:3: cannot inline Collection[go.shape.int].Values.func1-range1: function too complex: cost 142 exceeds budget 80
./main.go:46:9: can inline Collection[go.shape.int].Values.func1.Slice[go.shape.int].All.1 with cost 72 as: func(func(int, go.shape.int) bool) { for loop }
Moving the underlying iterator out of the closure seems to remove the allocations but it is still a lot slower.
func (s Collection[T]) Values() iter.Seq[T] {
underlyingIter := s.underlying.All()
return func(yield func(T) bool) {
for _, v := range underlyingIter {
if !yield(v) {
return
}
}
}
}
$ go run main.go
Direct: 5 200.0 ns/op 0 B/op 0 allocs/op
Nested: 5 5033 ns/op 0 B/op 0 allocs/op
I had hoped "cmd/compile: fix wrong esacpe analysis for rangefunc" would fix it, but it appears the composed version has the same cost in both go 1.23.1 and master.
The fix is about wrong analysis of escape analysis, so it won't change the inlining cost.
Not entirely sure what's gone wrong here -- when compiled "-m", it helpfully reports that it can inline all the things with recent closure-inline-enhancing CLs applied, but it is not faster.
Correction -- it is NOT inlining all the things.
Here's another benchmark -- if nothing else, assigning to "_" in the loop body gets optimized away. I'd also rather write these as go-test benchmarks, which reduces the need to check to see if something non-standard is going on.
package foo_test
import (
"iter"
"testing"
)
var u int
func BenchmarkDirect(b *testing.B) {
s := Slice[int](make([]int, 64))
for range b.N {
for _, v := range s.All() {
u = v
}
}
}
func BenchmarkNestedA(b *testing.B) {
collection := Collection[int]{Slice[int](make([]int, 64))}
for range b.N {
for v := range collection.ValuesA() {
u = v
}
}
}
func BenchmarkNestedB(b *testing.B) {
collection := Collection[int]{Slice[int](make([]int, 64))}
for range b.N {
for v := range collection.ValuesB() {
u = v
}
}
}
func BenchmarkNestedACall(b *testing.B) {
collection := Collection[int]{Slice[int](make([]int, 64))}
yield := func(i int) bool {
u = i
return true
}
for range b.N {
collection.ValuesA()(yield)
}
}
func BenchmarkNestedBCall(b *testing.B) {
collection := Collection[int]{Slice[int](make([]int, 64))}
yield := func(i int) bool {
u = i
return true
}
for range b.N {
collection.ValuesB()(yield)
}
}
type Slice[T any] []T
func (s Slice[T]) All() iter.Seq2[int, T] {
return func(yield func(int, T) bool) {
for i, v := range s {
if !yield(i, v) {
return
}
}
}
}
type Collection[T any] struct {
underlying Slice[T]
}
func (s Collection[T]) ValuesA() iter.Seq[T] {
return func(yield func(T) bool) {
for _, v := range s.underlying.All() {
if !yield(v) {
return
}
}
}
}
func (s Collection[T]) ValuesB() iter.Seq[T] {
underlyingIter := s.underlying.All()
return func(yield func(T) bool) {
for _, v := range underlyingIter {
if !yield(v) {
return
}
}
}
}
Change https://go.dev/cl/622415 mentions this issue: cmd/compile: spell "go.runtime" correctly for inline "cheap" test
@dr2chase Sorry for the non-standard benchmark — only did that so it was viewable in go.dev/play
.
This is a dupe of #69411 -- same bug, same root cause. The inliner needs to get better at closures.
Go version
go 1.23 / go1.24-fc968d4 / go1.24-165bf24
Output of
go env
in your module/workspace:What did you do?
Runnable benchmark: https://go.dev/play/p/1J9H8AL2aGM
Created two types:
This simulates an actual use-case I have: an AVL tree being embedded in a graph data structure. Iterating from a higher-order type should be as performant as iterating from the foundational one.
When moving away from callback-style iteration (
avl.Each(func(key int64, val *Edge) { /* ... */ })
) to idiomatic iterators, I noticed a non-trivial performance regression. Granted, some of this iteration is in a tight-loop so maybe it isn't affecting every program, but after digging I noticed that higher-order iterators' func literals escape to the heap while the lowest-order one's do not.The benchmark link above demonstrates. I'm not sure if this is #66469, #69015, or simply related. But the first issue didn't get any traction which was before the 1.23 release.
What did you see happen?
iter.Seq
style iterator performs 0 allocations.iter.Seq
style iterator on that dispatches to the other incurs allocations — 6 in the go.dev/play example.What did you expect to see?
Equivalent performance of both.
The only current option is to duplicate iteration logic instead of composing iterators.