cmd/compile: better append of unmodified slices

sylr commented 4 years ago

What version of Go are you using (`go version`)?

$ go version
go version go1.14 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (`go env`)?

go env Output

$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/s.rabot/go/bin"
GOCACHE="/Users/s.rabot/Library/Caches/go-build"
GOENV="/Users/s.rabot/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/s.rabot/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/z0/rcywz4nn3jg6g96h67c_4xzdw31lvl/T/go-build757609694=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you do?

package main

import (
    "testing"
)

func BenchmarkNoCap(b *testing.B) {
    for i := 0; i < b.N; i++ {
        var slice []string

        slice = append(slice, []string{"a", "q", "w"}...)
        slice = append(slice, []string{"z", "s", "x"}...)
    }
}

func BenchmarkCap(b *testing.B) {
    for i := 0; i < b.N; i++ {
        slice := make([]string, 0, 6)

        slice = append(slice, []string{"a", "q", "w"}...)
        slice = append(slice, []string{"z", "s", "x"}...)
    }
}

func BenchmarkCapNoMagicNumber(b *testing.B) {
    for i := 0; i < b.N; i++ {
        base := []string{"a", "q", "w"}
        fixed := []string{"z", "s", "x"}

        slice := make([]string, 0, len(base)+len(fixed))

        slice = append(slice, base...)
        slice = append(slice, fixed...)
    }
}

func BenchmarkCapNoMagicNumberFixedSlices(b *testing.B) {
    for i := 0; i < b.N; i++ {
        base := [...]string{"a", "q", "w"}
        fixed := [...]string{"z", "s", "x"}

        slice := make([]string, 0, len(base)+len(fixed))

        slice = append(slice, base[:]...)
        slice = append(slice, fixed[:]...)
    }
}

What did you expect to see?

I would have expected BenchmarkCapNoMagicNumber and BenchmarkCapNoMagicNumberFixedSlices to have the same throughput.

What did you see instead?

$ go test -bench . -count=3
goos: darwin
goarch: amd64
BenchmarkNoCap-12                            7771320           142 ns/op
BenchmarkNoCap-12                            7877163           142 ns/op
BenchmarkNoCap-12                            7857156           143 ns/op

BenchmarkCap-12                             59972187            20.2 ns/op
BenchmarkCap-12                             58082768            23.4 ns/op
BenchmarkCap-12                             44653009            25.6 ns/op

BenchmarkCapNoMagicNumber-12                15412473            74.8 ns/op
BenchmarkCapNoMagicNumber-12                14540749            75.1 ns/op
BenchmarkCapNoMagicNumber-12                15240964            74.7 ns/op

BenchmarkCapNoMagicNumberFixedSlices-12     64831053            21.0 ns/op
BenchmarkCapNoMagicNumberFixedSlices-12     64266398            21.4 ns/op
BenchmarkCapNoMagicNumberFixedSlices-12     60569865            22.7 ns/op

bcmills commented 4 years ago

@randall77, I think you've got the title backwards. It looks like the sliced array is outperforming the slice-literal version for some reason.

randall77 commented 4 years ago

Sorry, yes.

renthraysk commented 4 years ago

Slow routine allocates because len(base)+len(fixed) of slices isn't determined at compile time, whereas with arrays the expression is constant.

josharian commented 4 years ago

Or to be more precise, during the escape analysis phase, len(base)+len(fixed) isn't known to be a constant. (Later on, SSA figures it out.)

sylr commented 4 years ago

So both are optimized to the extent of what they can be or is there room for improvement in the slice case ?

josharian commented 4 years ago

It’s definitely possible for them to be improved. The question—which I can’t answer offhand—is whether it is possible to make a small, local improvement that would cover this case, or whether it would require an overhaul. (There are good independent reasons for an overhaul, but what is needed—eliminating one of the compiler’s intermediate representations—is a huge undertaking.)

bcmills commented 4 years ago

Is this a duplicate of #20533?

golang / go