Closed btracey closed 8 years ago
I wonder if this is related to #14920, try to run your benchmark against d37d3bdcfc429168adac5bf046172fd9c07bfdc2 (the last good commit for me).
The Idamax cases are a different issue, there is still significant regression even at d37d3bd
I'm pretty sure this has nothing to do with #14920. At least, the fix for that issue certainly doesn't apply here. It looks like for ddot, anyway, the additional overhead (compared to c03ed49, about 20 days ago and just after the #14511 fix) looks like an unnecessary nil check and the associated load:
before:
0x0014 00020 (ddot.go:5) LEAQ 8(DX), BP
0x0018 00024 (ddot.go:5) INCQ BX
0x001b 00027 (ddot.go:5) MOVQ (DX), DX
0x001e 00030 (ddot.go:6) ADDQ DX, CX
0x0021 00033 (ddot.go:5) MOVQ BP, DX
0x0024 00036 (ddot.go:5) CMPQ BX, AX
0x0027 00039 (ddot.go:5) JLT $0, 20
after:
0x0014 00020 (ddot.go:5) MOVQ "".a+8(FP), BP
0x0019 00025 (ddot.go:5) TESTB AL, (BP)
0x001c 00028 (ddot.go:5) MOVQ (AX), SI
0x001f 00031 (ddot.go:5) ADDQ $8, AX
0x0023 00035 (ddot.go:5) INCQ BX
0x0026 00038 (ddot.go:6) ADDQ DX, SI
0x0029 00041 (ddot.go:6) MOVQ SI, DX
0x002c 00044 (ddot.go:5) CMPQ BX, CX
0x002f 00047 (ddot.go:5) JLT $0, 20
The nil check in the "before" case can be removed because we have a nil check of &a[i] followed by a load from that pointer. The nil check is effectively merged into the load so you don't see it in the generated assembly.
The nil check in the "after" case isn't being removed because it is checking the original slice ptr &a[0] each time. The load is still from &a[i] so the merging doesn't happen and the nil check must be explicit.
In either case, it is kind of dumb to do nil checks on the pointer loaded from a slice. If the bounds check passes, the pointer is guaranteed to be non-nil. I'll see if I can whip up a fix. It might also be instructive to figure out what changed in the last 20 days about nil checks, I'm not sure what caused it.
I don't see any difference in the inner loop between d37d3bd and tip, so I'm not sure why @btracey you saw performance improve at d37d3bd.
Looks like the culprit is https://go-review.googlesource.com/20307 @brtzsnr It rewrites the nil check to be on &a[0] instead of &a[i]. It doesn't seem to be helping any in this case. (Lifting the nil check out of the loop altogether is an admirable goal, and this rewrite is one step on the way. But without lifting it out of the loop, the rewrite is just hurting us.)
I'll send a CL to remove that rewrite and I'll figure it out how to remove the NilChecks completely. Now that we have use counts should be easier.
CL https://golang.org/cl/21040 mentions this issue.
That change may have fixed Ddot, but it does not fix the Idamax problem. Comparing go1.6 with go version devel +7e88826 Mon Mar 28 14:10:21 2016 +0000 darwin/amd64
IdamaxSmallUnitaryInc-8 32.4ns ± 7% 39.1ns ± 6% +20.80% (p=0.008 n=5+5)
IdamaxSmallPosInc-8 28.7ns ±11% 41.1ns ±11% +43.24% (p=0.008 n=5+5)
IdamaxMediumUnitaryInc-8 1.58µs ± 2% 2.03µs ± 2% +27.83% (p=0.008 n=5+5)
IdamaxMediumPosInc-8 1.86µs ± 2% 2.38µs ±11% +27.93% (p=0.008 n=5+5)
IdamaxLargeUnitaryInc-8 150µs ± 2% 195µs ± 1% +30.23% (p=0.008 n=5+5)
IdamaxLargePosInc-8 202µs ± 1% 241µs ± 2% +19.10% (p=0.008 n=5+5)
IdamaxHugeUnitaryInc-8 15.1ms ± 1% 21.0ms ± 2% +39.67% (p=0.008 n=5+5)
IdamaxHugePosInc-8 27.9ms ± 3% 30.0ms ± 2% +7.38% (p=0.008 n=5+5)
Should I open a different issue?
Sorry about closing this prematurely. But yes, please open a different issue. There are two different underlying problems so it would be better to have them in different issues.
No problem. Do you have a preferred title?
cmd/compile: BLAS Idamax regression
Idamax and Ddot small are showing regressions between 1.6 and gotip.
Reproducing:
While the Ddot regressions look small, there had been improvements as of the resolution of #14511 , so it's a significant regression since then. Ddot shows improvement between 1.6 and tip for the benchmarks with larger vector sizes (slices of size 100+ instead of 10).
/cc @randall77 @tzneal @dr2chase @josharian @brtzsnr