golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.12k stars 17.68k forks source link

cmd/compile: needlessly early return value loads in runtime.scanobject #19195

Open josharian opened 7 years ago

josharian commented 7 years ago

runtime.scanobject contains this bit of code:

// Mark the object.
if obj, hbits, span, objIndex := heapBitsForObject(obj, b, i); obj != 0 {
    greyobject(obj, b, i, hbits, span, gcw, objIndex)
}

This compiles to (excerpt):

    0x024b 00587 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    R11, (SP)
    0x024f 00591 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    BX, 8(SP)
    0x0254 00596 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    R8, 16(SP)
    0x0259 00601 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   PCDATA  $0, $3
    0x0259 00601 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   CALL    "".heapBitsForObject(SB)
    0x025e 00606 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVL    40(SP), AX
    0x0262 00610 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    24(SP), CX
    0x0267 00615 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    48(SP), DX
    0x026c 00620 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    56(SP), BX
    0x0271 00625 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   MOVQ    32(SP), SI
    0x0276 00630 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   TESTQ   CX, CX
    0x0279 00633 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   JNE $0, 683
    0x027b 00635 (/Users/josh/go/tip/src/runtime/mgcmark.go:1178)   MOVQ    "".arena_used+96(SP), AX
    0x0280 00640 (/Users/josh/go/tip/src/runtime/mgcmark.go:1160)   MOVL    "".h.shift+68(SP), CX
    0x0284 00644 (/Users/josh/go/tip/src/runtime/mgcmark.go:1178)   MOVQ    ""..autotmp_2642+112(SP), DX
    0x0289 00649 (/Users/josh/go/tip/src/runtime/mgcmark.go:1174)   MOVQ    "".b+168(FP), BX
    0x0291 00657 (/Users/josh/go/tip/src/runtime/mgcmark.go:1160)   MOVQ    "".h.bitp+128(SP), SI
    0x0299 00665 (/Users/josh/go/tip/src/runtime/mgcmark.go:1153)   MOVQ    "".n+80(SP), DI
    0x029e 00670 (/Users/josh/go/tip/src/runtime/mgcmark.go:1153)   MOVQ    "".i+88(SP), R8
    0x02a3 00675 (/Users/josh/go/tip/src/runtime/mgcmark.go:1160)   MOVL    CX, R9
    0x02a6 00678 (/Users/josh/go/tip/src/runtime/mgcmark.go:1178)   JMP 548
    0x02ab 00683 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    CX, (SP)
    0x02af 00687 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    "".b+168(FP), CX
    0x02b7 00695 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    CX, 8(SP)
    0x02bc 00700 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    "".i+88(SP), DI
    0x02c1 00705 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    DI, 16(SP)
    0x02c6 00710 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    SI, 24(SP)
    0x02cb 00715 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVL    AX, 32(SP)
    0x02cf 00719 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    DX, 40(SP)
    0x02d4 00724 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    "".gcw+176(FP), AX
    0x02dc 00732 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    AX, 48(SP)
    0x02e1 00737 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   MOVQ    BX, 56(SP)
    0x02e6 00742 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   PCDATA  $0, $3
    0x02e6 00742 (/Users/josh/go/tip/src/runtime/mgcmark.go:1181)   CALL    "".greyobject(SB)
    0x02eb 00747 (/Users/josh/go/tip/src/runtime/mgcmark.go:1180)   JMP 635

The loads at instructions 0x025e, 0x0267, 0x026c, and 0x0271 are too early. They're loading return values from heapBitsForObject, but if the call to greyobject isn't necessary (if the jump at 0x0279 isn't taken), their values are unnecessary and will be overwritten. This is easy enough to see in the original code as well.

The loads are scheduled where they are due to memory ordering. But maybe there's something we can do to improve the situation, perhaps using the knowledge that stack slots and function return values are always disjoint in memory?

cc @randall77 @dr2chase @cherrymui

gopherbot commented 7 years ago

CL https://golang.org/cl/37300 mentions this issue.

philhofer commented 7 years ago

I've been doing some work on this front. Branch is here: https://github.com/philhofer/go/tree/store-forward

On that branch, that particular bit of code compiles to:

b54 29841 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1183)   JCS $1, 29828
v235    29842 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1158)   MOVQ    R8, "".i-64(SP)
v224    29843 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1165)   MOVL    CX, ""..autotmp_2885-84(SP)
v365    29844 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1165)   MOVQ    SI, "".h.bitp-24(SP)
v270    29845 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1158)   MOVQ    DI, "".n-72(SP)
v286    29846 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    R11, (SP)
v288    29847 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    BX, 8(SP)
v291    29848 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    R8, 16(SP)
v292    29849 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   CALL    "".heapBitsForObject(SB)
v294    29850 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    24(SP), AX
v146    29851 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   TESTQ   AX, AX
b52 29852 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   JNE $0, 29862
v335    29853 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1183)   MOVQ    "".arena_start-48(SP), AX
v326    29854 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1165)   MOVL    ""..autotmp_2885-84(SP), CX
v395    29855 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1183)   MOVQ    "".arena_used-56(SP), DX
v390    29856 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1179)   MOVQ    "".b(SP), BX
v328    29857 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1165)   MOVQ    "".h.bitp-24(SP), SI
v55 29858 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1158)   MOVQ    "".n-72(SP), DI
v208    29859 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1158)   MOVQ    "".i-64(SP), R8
v222    29860 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1165)   MOVL    CX, R9
b58 29861 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1183)   JMP 29828
v304    29862 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    AX, (SP)
v408    29863 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    "".b(SP), AX
v306    29864 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    AX, 8(SP)
v424    29865 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    "".i-64(SP), CX
v308    29866 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    CX, 16(SP)
v321    29867 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    32(SP), DX
v205    29868 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    DX, 24(SP)
v392    29869 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVL    40(SP), DX
v311    29870 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVL    DX, 32(SP)
v298    29871 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    48(SP), DX
v314    29872 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    DX, 40(SP)
v237    29873 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    "".gcw+8(SP), DX
v317    29874 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    DX, 48(SP)
v300    29875 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   MOVQ    56(SP), BX
v319    29876 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   MOVQ    BX, 56(SP)
v320    29877 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1186)   CALL    "".greyobject(SB)
b57 29878 (/Users/philiphofer/go-tip/src/runtime/mgcmark.go:1185)   JMP 29853

Alias analysis plus tighten plus a load-sinking pass is clever enough to push the loads down until they absolutely need to be loaded. In this particular example the shuffling within the destination basic block doesn't help much, but it appears to help most go programs. Geomean improvement in go1bench is about 6% right now.

gopherbot commented 7 years ago

CL https://golang.org/cl/38448 mentions this issue.