golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
121.02k stars 17.36k forks source link

cmd/compile: double zeroing and unnecessary copying/stack use #67957

Open dominikh opened 2 weeks ago

dominikh commented 2 weeks ago

Go version

go version devel go1.23-722d59436b Thu May 16 02:00:26 2024 +0000 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/dominikh/.cache/go-build'
GOENV='/home/dominikh/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/dominikh/prj/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/home/dominikh/prj'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/home/dominikh/prj/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/home/dominikh/prj/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='devel go1.23-722d59436b Thu May 16 02:00:26 2024 +0000'
GODEBUG=''
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/dominikh/prj/src/example.com/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build1152336647=/tmp/go-build -gno-record-gcc-switches'

What did you do?

go build -gcflags=-S the code at https://play.golang.org/p/tt0T6oOsQvr.go and look at the assembly generated for main

What did you see happen?

main.main STEXT nosplit size=195 args=0x0 locals=0x50 funcid=0x0 align=0x0
        0x0000 00000 (foo.go:47)     TEXT    main.main(SB), NOSPLIT|ABIInternal, $80-0
        0x0000 00000 (foo.go:47)     PUSHQ   BP
        0x0001 00001 (foo.go:47)     MOVQ    SP, BP
        0x0004 00004 (foo.go:47)     SUBQ    $72, SP
        0x0008 00008 (foo.go:47)     FUNCDATA        $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        0x0008 00008 (foo.go:47)     FUNCDATA        $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
        0x0008 00008 (foo.go:48)     MOVUPS  X15, main.~r0(SP)
        0x000d 00013 (foo.go:48)     MOVUPS  X15, main.~r0+8(SP)
        0x0013 00019 (foo.go:48)     MOVUPS  X15, main.~r0+24(SP)
        0x0019 00025 (foo.go:48)     MOVUPS  X15, main.~r0+40(SP)
        0x001f 00031 (foo.go:48)     MOVUPS  X15, main.~r0+56(SP)
        0x0025 00037 (<unknown line number>)    NOP
        0x0025 00037 (foo.go:48)     MOVUPS  X15, main.~r0(SP)
        0x002a 00042 (foo.go:48)     MOVUPS  X15, main.~r0+8(SP)
        0x0030 00048 (foo.go:48)     MOVUPS  X15, main.~r0+24(SP)
        0x0036 00054 (foo.go:48)     MOVUPS  X15, main.~r0+40(SP)
        0x003c 00060 (foo.go:48)     MOVUPS  X15, main.~r0+56(SP)
        0x0042 00066 (foo.go:37)     MOVQ    $1, main.~r0(SP)
        0x004a 00074 (foo.go:37)     MOVSD   $f64.4000000000000000(SB), X0
        0x0052 00082 (foo.go:37)     MOVSD   X0, main.~r0+8(SP)
        0x0058 00088 (foo.go:37)     MOVSD   $f64.4008000000000000(SB), X0
        0x0060 00096 (foo.go:37)     MOVSD   X0, main.~r0+16(SP)
        0x0066 00102 (foo.go:37)     MOVSD   $f64.4010000000000000(SB), X0
        0x006e 00110 (foo.go:37)     MOVSD   X0, main.~r0+24(SP)
        0x0074 00116 (foo.go:37)     MOVSD   $f64.4014000000000000(SB), X0
        0x007c 00124 (foo.go:37)     MOVSD   X0, main.~r0+32(SP)
        0x0082 00130 (foo.go:48)     MOVQ    main.~r0(SP), AX
        0x0086 00134 (foo.go:48)     MOVQ    AX, main.Sink(SB)
        0x008d 00141 (foo.go:48)     MOVUPS  main.~r0+8(SP), X0
        0x0092 00146 (foo.go:48)     MOVUPS  X0, main.Sink+8(SB)
        0x0099 00153 (foo.go:48)     MOVUPS  main.~r0+24(SP), X0
        0x009e 00158 (foo.go:48)     MOVUPS  X0, main.Sink+24(SB)
        0x00a5 00165 (foo.go:48)     MOVUPS  main.~r0+40(SP), X0
        0x00aa 00170 (foo.go:48)     MOVUPS  X0, main.Sink+40(SB)
        0x00b1 00177 (foo.go:48)     MOVUPS  main.~r0+56(SP), X0
        0x00b6 00182 (foo.go:48)     MOVUPS  X0, main.Sink+56(SB)
        0x00bd 00189 (foo.go:49)     ADDQ    $72, SP
        0x00c1 00193 (foo.go:49)     POPQ    BP
        0x00c2 00194 (foo.go:49)     RET

What did you expect to see?

At a minimum, we shouldn't zero the same memory twice. Ideally, we'd zero Sink and store constants to it, not using any stack.

/cc @golang/compiler

gabyhelp commented 2 weeks ago

Similar Issues

(Emoji vote if this was helpful or unhelpful; more detailed feedback welcome in this discussion.)

randall77 commented 2 weeks ago

Hmm, I do not get the same generated code as you do. At 722d59436b, right? In many ways, the generated code I'm seeing is worse. The extra zeroing is not there, so that's good. But the type switch in main has not been constant-propagated away, which yours apparently has. As a result my main is quite a bit longer than the result you showed (522 bytes, vs your code which is only 195 bytes).

Could you double-check the code you pasted and the Go stdlib git commit hash?

dominikh commented 2 weeks ago

I've double-checked (and rebuild). I'm definitely on 722d59436bc5881914619d2b95c9d01a46036428, with a clean working directory. And I can reproduce it by downloading the code I shared.

I've also tried with a newer master, ca5ba146da7a9d4e2a8cbe1715a78be42b45a745, and am getting the same output. GODEBUG, GOEXPERIMENT, and GOAMD64 are all unset.

chulak ~ ^$ mkdir meow2
chulak ~ ^$ cd meow2
chulak ~/meow2 ^$ go version
go version devel go1.23-ca5ba146da Wed Jun 12 20:40:04 2024 +0000 linux/amd64
chulak ~/meow2 ^$ wget -q https://play.golang.org/p/tt0T6oOsQvr.go  
chulak ~/meow2 ^$ go build -gcflags=-S tt0T6oOsQvr.go |& head -40
# command-line-arguments
main.main STEXT nosplit size=195 args=0x0 locals=0x50 funcid=0x0 align=0x0
    0x0000 00000 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)   TEXT    main.main(SB), NOSPLIT|ABIInternal, $80-0
    0x0000 00000 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)   PUSHQ   BP
    0x0001 00001 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)   MOVQ    SP, BP
    0x0004 00004 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)   SUBQ    $72, SP
    0x0008 00008 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)   FUNCDATA    $0, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
    0x0008 00008 (/home/dominikh/meow2/tt0T6oOsQvr.go:47)   FUNCDATA    $1, gclocals·g2BeySu+wFnoycgXfElmcg==(SB)
    0x0008 00008 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0(SP)
    0x000d 00013 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+8(SP)
    0x0013 00019 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+24(SP)
    0x0019 00025 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+40(SP)
    0x001f 00031 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+56(SP)
    0x0025 00037 (<unknown line number>)    NOP
    0x0025 00037 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0(SP)
    0x002a 00042 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+8(SP)
    0x0030 00048 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+24(SP)
    0x0036 00054 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+40(SP)
    0x003c 00060 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X15, main.~r0+56(SP)
    0x0042 00066 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVQ    $1, main.~r0(SP)
    0x004a 00074 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   $f64.4000000000000000(SB), X0
    0x0052 00082 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   X0, main.~r0+8(SP)
    0x0058 00088 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   $f64.4008000000000000(SB), X0
    0x0060 00096 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   X0, main.~r0+16(SP)
    0x0066 00102 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   $f64.4010000000000000(SB), X0
    0x006e 00110 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   X0, main.~r0+24(SP)
    0x0074 00116 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   $f64.4014000000000000(SB), X0
    0x007c 00124 (/home/dominikh/meow2/tt0T6oOsQvr.go:37)   MOVSD   X0, main.~r0+32(SP)
    0x0082 00130 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVQ    main.~r0(SP), AX
    0x0086 00134 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVQ    AX, main.Sink(SB)
    0x008d 00141 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  main.~r0+8(SP), X0
    0x0092 00146 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X0, main.Sink+8(SB)
    0x0099 00153 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  main.~r0+24(SP), X0
    0x009e 00158 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X0, main.Sink+24(SB)
    0x00a5 00165 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  main.~r0+40(SP), X0
    0x00aa 00170 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X0, main.Sink+40(SB)
    0x00b1 00177 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  main.~r0+56(SP), X0
    0x00b6 00182 (/home/dominikh/meow2/tt0T6oOsQvr.go:48)   MOVUPS  X0, main.Sink+56(SB)
    0x00bd 00189 (/home/dominikh/meow2/tt0T6oOsQvr.go:49)   ADDQ    $72, SP
    0x00c1 00193 (/home/dominikh/meow2/tt0T6oOsQvr.go:49)   POPQ    BP
randall77 commented 2 weeks ago

Ah, for some reason go build -gcflags=-S and go tool compile -S are resulting in different assembly output. I will look further later.

dominikh commented 2 weeks ago

Passing -p main (or any other value for -p) to go tool compile seems to produce output identical to go build.

randall77 commented 2 weeks ago

Ok, a bunch of things here.

  1. -p main matters because somehow that makes the dictionary entry constant folding happen, whereas without that the compiler doesn't realize the type switch is deadcodeable. Annoying, but might not be worth fixing as it only applies to raw tool compile use.
  2. The double zeroing is just the dead store elim pass being confused by an inline mark. Will fix.
  3. The storing of float constants could be better on amd64, also have a small CL for that.
  4. The reason why we materialize on the stack and then copy is somewhat subtle. Consider this code
type T struct{ a, b, c, d, e int }

func f(t *T, z int) {
    *t = T{1, z, z, z, z}
}

This code makes a temporary T on the stack, initializes it, then copies it to the destination. Replace the 1 with a 0, and it writes everything to the destination directly.

The reason this happens is that we have rules (starting at cmd/compile/internal/ssa/_gen/generic.rules, line 2397) that break a Move down into individual stores if the source involved 4 or fewer stores. When the first field is a 0, its store gets eliminated because it is storing to a freshly zeroed object. So there are only 4 stores remaining and the rule kicks in. When the first field is a 1, there are 5 stores and the Move persists. When the Move is broken down into individual stores then further optimizations (store->load forwarding, I think) eventually eliminate the temporary altogether.

I don't see anything obvious to fix for number 4. We could change the constant, sure, but the rule description says, and I mostly agree, that we don't want to do this for arbitrary sized objects because we'd end up having to spill/restore all its individual components anyway.

gopherbot commented 2 weeks ago

Change https://go.dev/cl/592596 mentions this issue: cmd/compile: store constant floats using integer constants

gopherbot commented 2 weeks ago

Change https://go.dev/cl/592615 mentions this issue: cmd/compile: don't treat an InlMark as a read during deadstore

dominikh commented 2 weeks ago

Regarding 4 and "we'd end up having to spill/restore all its individual components anyway", maybe the count of stores shouldn't include constants being stored?

randall77 commented 2 weeks ago

@dominikh Sure, that would be an improvement. Maybe difficult to do though given how the rules are currently structured.