btracey closed this issue 8 years ago
Can you provide assembly dumps from tip and from ssa for some of the biggest slowdowns and/or simplest functions, as well as the corresponding function? That'd make digging into this a bit easier.
/cc @randall77 @tzneal @dr2chase @brtzsnr
Note that while the timeout was listed at 60m, the benchmarks really only take slightly more than 10m.
A smaller reproducer is the Ddot benchmark.
Code: https://github.com/gonum/blas/blob/master/native/level1double_ddot.go
Benchmark(s): https://github.com/gonum/blas/blob/master/native/level1doubleBench_auto_test.go#L38
brendan:~/Documents/mygo$ benchstat blasonesixddot.txt ssatipddot.txt
name old time/op new time/op delta
DdotSmallBothUnitary 17.3ns ± 2% 26.6ns ± 6% +53.46% (p=0.008 n=5+5)
DdotSmallIncUni 21.4ns ± 1% 31.8ns ± 8% +48.55% (p=0.008 n=5+5)
DdotSmallUniInc 23.2ns ±10% 30.4ns ± 8% +31.15% (p=0.008 n=5+5)
DdotSmallBothInc 22.7ns ±12% 29.0ns ± 3% +27.53% (p=0.008 n=5+5)
DdotMediumBothUnitary 1.01µs ± 3% 2.68µs ± 8% +165.29% (p=0.008 n=5+5)
DdotMediumIncUni 1.38µs ± 2% 2.68µs ± 3% +93.62% (p=0.008 n=5+5)
DdotMediumUniInc 1.18µs ± 3% 2.56µs ± 3% +116.45% (p=0.008 n=5+5)
DdotMediumBothInc 1.32µs ± 8% 2.66µs ± 3% +101.93% (p=0.008 n=5+5)
DdotLargeBothUnitary 88.6µs ± 8% 262.6µs ± 2% +196.36% (p=0.008 n=5+5)
DdotLargeIncUni 169µs ± 1% 278µs ± 2% +64.51% (p=0.008 n=5+5)
DdotLargeUniInc 121µs ± 1% 252µs ± 1% +108.23% (p=0.008 n=5+5)
DdotLargeBothInc 237µs ± 0% 304µs ± 9% +28.25% (p=0.008 n=5+5)
DdotHugeBothUnitary 10.6ms ± 0% 27.6ms ± 3% +161.88% (p=0.016 n=4+5)
DdotHugeIncUni 25.8ms ± 2% 31.7ms ± 5% +22.78% (p=0.008 n=5+5)
DdotHugeUniInc 17.6ms ± 1% 28.7ms ± 4% +63.27% (p=0.008 n=5+5)
DdotHugeBothInc 32.9ms ± 0% 35.8ms ± 7% +8.83% (p=0.008 n=5+5)
I'm not very experienced with assembler, but I think these are the most relevant outputs. It may be that the real dump needs to be done in the actual blas package, rather than just the inner loop.
brendan:~/Documents/mygo/src/github.com/gonum/internal/asm$ go version
go version go1.6 darwin/amd64
brendan:~/Documents/mygo/src/github.com/gonum/internal/asm$ go build -gcflags=-S -tags noasm ddot.go
# command-line-arguments
"".DdotUnitary t=1 size=128 value=0 args=0x38 locals=0x0
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) TEXT "".DdotUnitary(SB), $0-56
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVQ (TLS), CX
0x0009 00009 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) CMPQ SP, 16(CX)
0x000d 00013 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) JLS 111
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) NOP
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) NOP
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVQ "".y+32(FP), R9
0x0014 00020 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVQ "".y+40(FP), DI
0x0019 00025 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) FUNCDATA $0, gclocals·71f75e7e2fe2878e818867fe3428bd87(SB)
0x0019 00025 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0019 00025 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) XORPS X3, X3
0x001c 00028 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVSD X3, "".sum+56(FP)
0x0022 00034 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) NOP
0x0022 00034 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ "".x+8(FP), CX
0x0027 00039 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ "".x+16(FP), SI
0x002c 00044 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ "".x+24(FP), BX
0x0031 00049 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ $0, AX
0x0033 00051 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) CMPQ AX, SI
0x0036 00054 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) JGE $0, 103
0x0038 00056 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) NOP
0x0038 00056 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVSD (CX), X2
0x003c 00060 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) CMPQ AX, DI
0x003f 00063 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) JCC $1, 104
0x0041 00065 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) LEAQ (R9)(AX*8), BX
0x0045 00069 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVSD (BX), X0
0x0049 00073 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MULSD X2, X0
0x004d 00077 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) ADDSD X3, X0
0x0051 00081 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVAPD X0, X3
0x0055 00085 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVSD X0, "".sum+56(FP)
0x005b 00091 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) NOP
0x005b 00091 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) ADDQ $8, CX
0x005f 00095 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) INCQ AX
0x0062 00098 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) CMPQ AX, SI
0x0065 00101 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) JLT $0, 56
0x0067 00103 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) NOP
0x0067 00103 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:13) RET
0x0068 00104 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) PCDATA $0, $0
0x0068 00104 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) CALL runtime.panicindex(SB)
0x006d 00109 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) UNDEF
0x006f 00111 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) NOP
0x006f 00111 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) CALL runtime.morestack_noctxt(SB)
0x0074 00116 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) JMP 0
0x0000 65 48 8b 0c 25 00 00 00 00 48 3b 61 10 76 60 4c eH..%....H;a.v`L
0x0010 8b 4c 24 20 48 8b 7c 24 28 0f 57 db f2 0f 11 5c .L$ H.|$(.W....\
0x0020 24 38 48 8b 4c 24 08 48 8b 74 24 10 48 8b 5c 24 $8H.L$.H.t$.H.\$
0x0030 18 31 c0 48 39 f0 7d 2f f2 0f 10 11 48 39 f8 73 .1.H9.}/....H9.s
0x0040 27 49 8d 1c c1 f2 0f 10 03 f2 0f 59 c2 f2 0f 58 'I.........Y...X
0x0050 c3 66 0f 28 d8 f2 0f 11 44 24 38 48 83 c1 08 48 .f.(....D$8H...H
0x0060 ff c0 48 39 f0 7c d1 c3 e8 00 00 00 00 0f 0b e8 ..H9.|..........
0x0070 00 00 00 00 eb 8a cc cc cc cc cc cc cc cc cc cc ................
rel 5+4 t=14 +0
rel 105+4 t=6 runtime.panicindex+0
rel 112+4 t=6 runtime.morestack_noctxt+0
"".DdotInc t=1 size=176 value=0 args=0x60 locals=0x0
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) TEXT "".DdotInc(SB), $0-96
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ (TLS), CX
0x0009 00009 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) CMPQ SP, 16(CX)
0x000d 00013 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) JLS 157
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) NOP
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) NOP
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".n+56(FP), R13
0x0018 00024 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".y+32(FP), R12
0x001d 00029 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".y+40(FP), R11
0x0022 00034 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".x+8(FP), R10
0x0027 00039 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".x+16(FP), R9
0x002c 00044 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".incX+64(FP), DI
0x0031 00049 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".incY+72(FP), SI
0x0036 00054 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".iy+88(FP), DX
0x003b 00059 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".ix+80(FP), CX
0x0040 00064 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) FUNCDATA $0, gclocals·7f14b12e2041f9b568f9bbe12353a4a8(SB)
0x0040 00064 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0040 00064 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) XORPS X1, X1
0x0043 00067 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVSD X1, "".sum+96(FP)
0x0049 00073 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) MOVQ $0, AX
0x004b 00075 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) CMPQ R13, AX
0x004e 00078 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) JLE $0, 142
0x0050 00080 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVAPD X1, X2
0x0054 00084 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CMPQ DX, R11
0x0057 00087 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) JCC $1, 150
0x0059 00089 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) LEAQ (R12)(DX*8), BX
0x005d 00093 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVSD (BX), X0
0x0061 00097 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CMPQ CX, R9
0x0064 00100 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) JCC $1, 143
0x0066 00102 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) LEAQ (R10)(CX*8), BX
0x006a 00106 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVSD (BX), X1
0x006e 00110 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MULSD X1, X0
0x0072 00114 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) ADDSD X2, X0
0x0076 00118 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVAPD X0, X1
0x007a 00122 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVSD X0, "".sum+96(FP)
0x0080 00128 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) NOP
0x0080 00128 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:19) ADDQ DI, CX
0x0083 00131 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:19) NOP
0x0083 00131 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:20) ADDQ SI, DX
0x0086 00134 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:20) NOP
0x0086 00134 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) INCQ AX
0x0089 00137 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) NOP
0x0089 00137 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) CMPQ R13, AX
0x008c 00140 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) JGT $0, 80
0x008e 00142 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:22) RET
0x008f 00143 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) PCDATA $0, $0
0x008f 00143 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CALL runtime.panicindex(SB)
0x0094 00148 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) UNDEF
0x0096 00150 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) PCDATA $0, $0
0x0096 00150 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CALL runtime.panicindex(SB)
0x009b 00155 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) UNDEF
0x009d 00157 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) NOP
0x009d 00157 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) CALL runtime.morestack_noctxt(SB)
0x00a2 00162 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) JMP 0
0x0000 65 48 8b 0c 25 00 00 00 00 48 3b 61 10 0f 86 8a eH..%....H;a....
0x0010 00 00 00 4c 8b 6c 24 38 4c 8b 64 24 20 4c 8b 5c ...L.l$8L.d$ L.\
0x0020 24 28 4c 8b 54 24 08 4c 8b 4c 24 10 48 8b 7c 24 $(L.T$.L.L$.H.|$
0x0030 40 48 8b 74 24 48 48 8b 54 24 58 48 8b 4c 24 50 @H.t$HH.T$XH.L$P
0x0040 0f 57 c9 f2 0f 11 4c 24 60 31 c0 49 39 c5 7e 3e .W....L$`1.I9.~>
0x0050 66 0f 28 d1 4c 39 da 73 3d 49 8d 1c d4 f2 0f 10 f.(.L9.s=I......
0x0060 03 4c 39 c9 73 29 49 8d 1c ca f2 0f 10 0b f2 0f .L9.s)I.........
0x0070 59 c1 f2 0f 58 c2 66 0f 28 c8 f2 0f 11 44 24 60 Y...X.f.(....D$`
0x0080 48 01 f9 48 01 f2 48 ff c0 49 39 c5 7f c2 c3 e8 H..H..H..I9.....
0x0090 00 00 00 00 0f 0b e8 00 00 00 00 0f 0b e8 00 00 ................
0x00a0 00 00 e9 59 ff ff ff cc cc cc cc cc cc cc cc cc ...Y............
rel 5+4 t=14 +0
rel 144+4 t=6 runtime.panicindex+0
rel 151+4 t=6 runtime.panicindex+0
rel 158+4 t=6 runtime.morestack_noctxt+0
gclocals·33cdeccccebe80329f1fdbee7f5874cb t=8 dupok size=8 value=0
0x0000 01 00 00 00 00 00 00 00 ........
gclocals·71f75e7e2fe2878e818867fe3428bd87 t=8 dupok size=12 value=0
0x0000 01 00 00 00 07 00 00 00 09 00 00 00 ............
gclocals·33cdeccccebe80329f1fdbee7f5874cb t=8 dupok size=8 value=0
0x0000 01 00 00 00 00 00 00 00 ........
gclocals·7f14b12e2041f9b568f9bbe12353a4a8 t=8 dupok size=12 value=0
0x0000 01 00 00 00 0c 00 00 00 09 00 00 00 ............
"".DdotUnitary·f t=8 dupok size=8 value=0
0x0000 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 "".DdotUnitary+0
"".DdotInc·f t=8 dupok size=8 value=0
0x0000 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 "".DdotInc+0
runtime.gcbits.01 t=8 dupok size=1 value=0
0x0000 01 .
go.string.hdr."[]float64" t=8 dupok size=16 value=0
0x0000 00 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 ................
rel 0+8 t=1 go.string."[]float64"+0
go.string."[]float64" t=8 dupok size=16 value=0
0x0000 5b 5d 66 6c 6f 61 74 36 34 00 []float64.
type.[]float64 t=8 dupok size=72 value=0
0x0000 18 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 ................
0x0010 30 33 37 9c 00 08 08 17 00 00 00 00 00 00 00 00 037.............
0x0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0040 00 00 00 00 00 00 00 00 ........
rel 24+8 t=1 runtime.algarray+272
rel 32+8 t=1 runtime.gcbits.01+0
rel 40+8 t=1 go.string.hdr."[]float64"+0
rel 56+8 t=1 go.weak.type.*[]float64+0
rel 64+8 t=1 type.float64+0
go.typelink.[]float64 []float64 t=8 dupok size=8 value=0
0x0000 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 type.[]float64+0
brendan:~/Documents/mygo/src/github.com/gonum/internal/asm$ go version
go version devel +fb54e03 Thu Feb 25 07:10:07 2016 +0000 darwin/amd64
brendan:~/Documents/mygo/src/github.com/gonum/internal/asm$ go build -gcflags=-S -tags noasm ddot.go
# command-line-arguments
"".DdotUnitary t=1 size=128 value=0 args=0x38 locals=0x0
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) TEXT "".DdotUnitary(SB), $0-56
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVQ (TLS), CX
0x0009 00009 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) CMPQ SP, 16(CX)
0x000d 00013 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) JLS 113
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) NOP
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) NOP
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) FUNCDATA $0, gclocals·71f75e7e2fe2878e818867fe3428bd87(SB)
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x000f 00015 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) XORPS X0, X0
0x0012 00018 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVSD X0, "".sum+56(FP)
0x0018 00024 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVQ "".x+8(FP), AX
0x001d 00029 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ $0, CX
0x001f 00031 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ "".x+16(FP), DX
0x0024 00036 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) CMPQ CX, DX
0x0027 00039 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) JGE $0, 105
0x0029 00041 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) TESTB AL, (AX)
0x002b 00043 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVSD (AX), X0
0x002f 00047 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) MOVSD "".sum+56(FP), X1
0x0035 00053 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVQ "".y+40(FP), BX
0x003a 00058 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) CMPQ CX, BX
0x003d 00061 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) JCC $0, 106
0x003f 00063 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVQ "".y+32(FP), BP
0x0044 00068 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVSD (BP)(CX*8), X2
0x004a 00074 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MULSD X2, X0
0x004e 00078 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) ADDSD X1, X0
0x0052 00082 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) MOVSD X0, "".sum+56(FP)
0x0058 00088 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) ADDQ $8, AX
0x005c 00092 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) INCQ CX
0x005f 00095 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) MOVQ "".x+16(FP), DX
0x0064 00100 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) CMPQ CX, DX
0x0067 00103 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:10) JLT $0, 41
0x0069 00105 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:13) RET
0x006a 00106 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) PCDATA $0, $0
0x006a 00106 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) CALL runtime.panicindex(SB)
0x006f 00111 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) UNDEF
0x0071 00113 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:11) NOP
0x0071 00113 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) CALL runtime.morestack_noctxt(SB)
0x0076 00118 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:9) JMP 0
0x0000 65 48 8b 0c 25 00 00 00 00 48 3b 61 10 76 62 0f eH..%....H;a.vb.
0x0010 57 c0 f2 0f 11 44 24 38 48 8b 44 24 08 31 c9 48 W....D$8H.D$.1.H
0x0020 8b 54 24 10 48 39 d1 7d 40 84 00 f2 0f 10 00 f2 .T$.H9.}@.......
0x0030 0f 10 4c 24 38 48 8b 5c 24 28 48 39 d9 73 2b 48 ..L$8H.\$(H9.s+H
0x0040 8b 6c 24 20 f2 0f 10 54 cd 00 f2 0f 59 c2 f2 0f .l$ ...T....Y...
0x0050 58 c1 f2 0f 11 44 24 38 48 83 c0 08 48 ff c1 48 X....D$8H...H..H
0x0060 8b 54 24 10 48 39 d1 7c c0 c3 e8 00 00 00 00 0f .T$.H9.|........
0x0070 0b e8 00 00 00 00 eb 88 cc cc cc cc cc cc cc cc ................
rel 5+4 t=14 +0
rel 107+4 t=6 runtime.panicindex+0
rel 114+4 t=6 runtime.morestack_noctxt+0
"".DdotInc t=1 size=160 value=0 args=0x60 locals=0x0
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) TEXT "".DdotInc(SB), $0-96
0x0000 00000 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ (TLS), CX
0x0009 00009 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) CMPQ SP, 16(CX)
0x000d 00013 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) JLS 148
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) NOP
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) NOP
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) FUNCDATA $0, gclocals·7f14b12e2041f9b568f9bbe12353a4a8(SB)
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0013 00019 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) XORPS X0, X0
0x0016 00022 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVSD X0, "".sum+96(FP)
0x001c 00028 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".ix+80(FP), AX
0x0021 00033 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVQ "".iy+88(FP), CX
0x0026 00038 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) MOVQ $0, DX
0x0028 00040 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) MOVQ "".n+56(FP), BX
0x002d 00045 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) CMPQ DX, BX
0x0030 00048 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) JGE $0, 140
0x0032 00050 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) MOVSD "".sum+96(FP), X0
0x0038 00056 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVQ "".y+40(FP), BP
0x003d 00061 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CMPQ CX, BP
0x0040 00064 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) JCC $0, 141
0x0042 00066 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVQ "".y+32(FP), SI
0x0047 00071 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVSD (SI)(CX*8), X1
0x004c 00076 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVQ "".x+16(FP), DI
0x0051 00081 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CMPQ AX, DI
0x0054 00084 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) JCC $0, 141
0x0056 00086 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVQ "".x+8(FP), R8
0x005b 00091 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVSD (R8)(AX*8), X2
0x0061 00097 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MULSD X2, X1
0x0065 00101 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) ADDSD X1, X0
0x0069 00105 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) MOVSD X0, "".sum+96(FP)
0x006f 00111 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) INCQ DX
0x0072 00114 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:19) MOVQ "".incX+64(FP), R9
0x0077 00119 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:19) ADDQ R9, AX
0x007a 00122 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:20) MOVQ "".incY+72(FP), R10
0x007f 00127 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:20) ADDQ R10, CX
0x0082 00130 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) MOVQ "".n+56(FP), BX
0x0087 00135 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) CMPQ DX, BX
0x008a 00138 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:17) JLT $0, 50
0x008c 00140 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:22) RET
0x008d 00141 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) PCDATA $0, $0
0x008d 00141 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) CALL runtime.panicindex(SB)
0x0092 00146 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) UNDEF
0x0094 00148 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:18) NOP
0x0094 00148 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) CALL runtime.morestack_noctxt(SB)
0x0099 00153 (/Users/brendan/Documents/mygo/src/github.com/gonum/internal/asm/ddot.go:16) JMP 0
0x0000 65 48 8b 0c 25 00 00 00 00 48 3b 61 10 0f 86 81 eH..%....H;a....
0x0010 00 00 00 0f 57 c0 f2 0f 11 44 24 60 48 8b 44 24 ....W....D$`H.D$
0x0020 50 48 8b 4c 24 58 31 d2 48 8b 5c 24 38 48 39 da PH.L$X1.H.\$8H9.
0x0030 7d 5a f2 0f 10 44 24 60 48 8b 6c 24 28 48 39 e9 }Z...D$`H.l$(H9.
0x0040 73 4b 48 8b 74 24 20 f2 0f 10 0c ce 48 8b 7c 24 sKH.t$ .....H.|$
0x0050 10 48 39 f8 73 37 4c 8b 44 24 08 f2 41 0f 10 14 .H9.s7L.D$..A...
0x0060 c0 f2 0f 59 ca f2 0f 58 c1 f2 0f 11 44 24 60 48 ...Y...X....D$`H
0x0070 ff c2 4c 8b 4c 24 40 4c 01 c8 4c 8b 54 24 48 4c ..L.L$@L..L.T$HL
0x0080 01 d1 48 8b 5c 24 38 48 39 da 7c a6 c3 e8 00 00 ..H.\$8H9.|.....
0x0090 00 00 0f 0b e8 00 00 00 00 e9 62 ff ff ff cc cc ..........b.....
rel 5+4 t=14 +0
rel 142+4 t=6 runtime.panicindex+0
rel 149+4 t=6 runtime.morestack_noctxt+0
gclocals·33cdeccccebe80329f1fdbee7f5874cb t=8 dupok size=8 value=0
0x0000 01 00 00 00 00 00 00 00 ........
gclocals·71f75e7e2fe2878e818867fe3428bd87 t=8 dupok size=12 value=0
0x0000 01 00 00 00 07 00 00 00 09 00 00 00 ............
gclocals·33cdeccccebe80329f1fdbee7f5874cb t=8 dupok size=8 value=0
0x0000 01 00 00 00 00 00 00 00 ........
gclocals·7f14b12e2041f9b568f9bbe12353a4a8 t=8 dupok size=12 value=0
0x0000 01 00 00 00 0c 00 00 00 09 00 00 00 ............
"".DdotUnitary·f t=8 dupok size=8 value=0
0x0000 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 "".DdotUnitary+0
"".DdotInc·f t=8 dupok size=8 value=0
0x0000 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 "".DdotInc+0
runtime.gcbits.01 t=8 dupok size=1 value=0
0x0000 01 .
go.string.hdr."[]float64" t=8 dupok size=16 value=0
0x0000 00 00 00 00 00 00 00 00 09 00 00 00 00 00 00 00 ................
rel 0+8 t=1 go.string."[]float64"+0
go.string."[]float64" t=8 dupok size=10 value=0
0x0000 5b 5d 66 6c 6f 61 74 36 34 00 []float64.
type.[]float64 t=8 dupok size=72 value=0
0x0000 18 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 ................
0x0010 30 33 37 9c 00 08 08 17 00 00 00 00 00 00 00 00 037.............
0x0020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0040 00 00 00 00 00 00 00 00 ........
rel 24+8 t=1 runtime.algarray+0
rel 32+8 t=1 runtime.gcbits.01+0
rel 40+8 t=1 go.string.hdr."[]float64"+0
rel 56+8 t=1 go.weak.type.*[]float64+0
rel 64+8 t=1 type.float64+0
go.typelink.[]float64 []float64 t=8 dupok size=8 value=0
0x0000 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 type.[]float64+0
Looks like this is mostly a problem of not registerizing a few variables before the loop starts. I'll have to think about how to handle this one. There is also a nil check that is not eliminated; I've got a simple fix for that.
I'm surprised the times are that much slower. It is only a few extra loads from the stack frame per iteration.
CL https://golang.org/cl/19923 mentions this issue.
Looks like almost all of the lost performance is caused by not SSA-ifying PARAMOUT (approximately, named return) values. You can test by changing
func DdotUnitary(x, y []float64) (sum float64) {
	for i, v := range x {
		sum += y[i] * v
	}
	return
}
to
func DdotUnitary(x, y []float64) float64 {
	sum := 0.0
	for i, v := range x {
		sum += y[i] * v
	}
	return sum
}
I already have a CL partially coded up to fix this. I'll increase its priority.
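The two forms above are semantically identical, so the rewrite can be checked on its own before any benchmarking. A minimal sketch, assuming the hypothetical names `dotNamed` and `dotLocal` so that both variants can live in one file (these names are illustrative and are not part of gonum):

```go
package main

import "fmt"

// dotNamed accumulates directly into a named result,
// mirroring the original DdotUnitary quoted above.
func dotNamed(x, y []float64) (sum float64) {
	for i, v := range x {
		sum += y[i] * v
	}
	return
}

// dotLocal accumulates into a local variable and returns it explicitly,
// mirroring the suggested rewrite.
func dotLocal(x, y []float64) float64 {
	sum := 0.0
	for i, v := range x {
		sum += y[i] * v
	}
	return sum
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{4, 5, 6}
	// Both calls perform the same operations in the same order,
	// so the printed values should be identical.
	fmt.Println(dotNamed(x, y), dotLocal(x, y))
}
```

Any timing difference between the two is then down to how the compiler treats the named result, not to a change in what is computed.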
For posterity, to reproduce:
go get github.com/gonum/blas
go get github.com/gonum/floats
cd $GOPATH/src/github.com/gonum/blas/native
go test -test.bench=Ddot -tags noasm
CL https://golang.org/cl/19988 mentions this issue.
Definitely much better than it was. There are still minor regressions:
brendan:~$ benchstat goonesix.txt ssatip.txt
name old time/op new time/op delta
DdotSmallBothUnitary 18.7ns ± 1% 23.5ns ± 1% +25.67% (p=0.016 n=4+5)
DdotSmallIncUni 23.4ns ± 1% 29.4ns ± 1% +25.58% (p=0.008 n=5+5)
DdotSmallUniInc 22.7ns ± 1% 28.1ns ± 1% +23.79% (p=0.008 n=5+5)
DdotSmallBothInc 22.5ns ± 0% 27.9ns ± 2% +24.09% (p=0.016 n=4+5)
DdotMediumBothUnitary 924ns ± 1% 933ns ± 1% +1.00% (p=0.024 n=5+5)
DdotMediumIncUni 1.28µs ± 2% 1.77µs ± 1% +37.85% (p=0.008 n=5+5)
DdotMediumUniInc 1.23µs ± 0% 1.53µs ± 0% +24.46% (p=0.008 n=5+5)
DdotMediumBothInc 1.33µs ± 1% 1.83µs ± 1% +37.53% (p=0.008 n=5+5)
DdotLargeBothUnitary 93.7µs ± 1% 92.6µs ± 1% ~ (p=0.095 n=5+5)
DdotLargeIncUni 200µs ± 1% 244µs ± 1% +22.01% (p=0.008 n=5+5)
DdotLargeUniInc 135µs ± 1% 174µs ± 1% +28.88% (p=0.008 n=5+5)
DdotLargeBothInc 275µs ± 1% 302µs ± 1% +9.75% (p=0.008 n=5+5)
DdotHugeBothUnitary 11.7ms ± 1% 11.4ms ± 0% -2.59% (p=0.008 n=5+5)
DdotHugeIncUni 28.6ms ± 1% 31.4ms ± 1% +9.78% (p=0.008 n=5+5)
DdotHugeUniInc 19.8ms ± 1% 23.0ms ± 1% +16.23% (p=0.008 n=5+5)
DdotHugeBothInc 37.7ms ± 1% 37.6ms ± 3% ~ (p=0.421 n=5+5)
CL https://golang.org/cl/20151 mentions this issue.
Comparison with go version devel +c63dbd8 Thu Mar 10 18:35:10 2016 +0000 darwin/amd64
brendan:~/Documents/mygo$ benchstat blasonesixddot.txt ssatipddot.txt
name old time/op new time/op delta
DdotSmallBothUnitary-8 17.6ns ± 1% 15.6ns ± 2% -11.28% (p=0.008 n=5+5)
DdotSmallIncUni-8 21.9ns ± 1% 21.9ns ± 1% ~ (p=0.952 n=5+5)
DdotSmallUniInc-8 21.2ns ± 1% 20.2ns ± 0% -4.54% (p=0.000 n=5+4)
DdotSmallBothInc-8 21.1ns ± 0% 20.8ns ± 1% -1.42% (p=0.016 n=5+5)
DdotMediumBothUnitary-8 851ns ± 1% 843ns ± 1% -1.01% (p=0.032 n=5+5)
DdotMediumIncUni-8 1.17µs ± 1% 0.95µs ± 0% -18.32% (p=0.008 n=5+5)
DdotMediumUniInc-8 1.12µs ± 0% 0.86µs ± 1% -22.91% (p=0.008 n=5+5)
DdotMediumBothInc-8 1.21µs ± 1% 0.99µs ± 2% -18.72% (p=0.008 n=5+5)
DdotLargeBothUnitary-8 85.9µs ± 1% 83.0µs ± 1% -3.33% (p=0.008 n=5+5)
DdotLargeIncUni-8 169µs ± 1% 154µs ± 1% -8.97% (p=0.008 n=5+5)
DdotLargeUniInc-8 121µs ± 1% 106µs ± 1% -11.99% (p=0.008 n=5+5)
DdotLargeBothInc-8 241µs ± 1% 230µs ± 1% -4.26% (p=0.008 n=5+5)
DdotHugeBothUnitary-8 10.6ms ± 1% 10.1ms ± 1% -4.59% (p=0.008 n=5+5)
DdotHugeIncUni-8 25.8ms ± 1% 25.6ms ± 3% ~ (p=0.151 n=5+5)
DdotHugeUniInc-8 17.7ms ± 1% 16.7ms ± 1% -5.81% (p=0.016 n=5+4)
DdotHugeBothInc-8 33.0ms ± 0% 33.2ms ± 1% ~ (p=0.151 n=5+5)
We are seeing some significant performance regressions for BLAS benchmarks vs. 1.6. These benchmarks are numeric, and consist almost entirely of []float64 indexing and assignment. While they may seem hyper-specialized, calls to Dgemm in particular can make up a significant fraction of runtime in codes we write.
Note that Dgemm is coded as a concurrent algorithm, while Dgemv is not.
Code: https://godoc.org/github.com/gonum/blas/native
Dgemv: https://github.com/gonum/blas/blob/master/native/level2double.go#L13
Dgemm: https://github.com/gonum/blas/blob/master/native/dgemm.go
The actual benchmark calls are in the packages, but the benchmark code is in the blas/testblas package.
Call:
The noasm build tag switches the dot product inner loop from the assembly version to the native Go version (https://github.com/gonum/internal/blob/master/asm/ddot.go).
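For readers without the linked file at hand, the strided inner loop can be sketched roughly as follows. The parameter names (`x`, `y`, `n`, `incX`, `incY`, `ix`, `iy`, `sum`) are taken from the symbol names in the assembly listings in this thread. The `int` index type is an assumption, and this is a reconstruction for illustration, not a copy of the gonum source.

```go
package main

import "fmt"

// DdotInc sketches a strided dot product: it multiplies n pairs of elements,
// starting at x[ix] and y[iy] and advancing by incX and incY each step.
// The int parameter type is assumed; the real signature may differ.
func DdotInc(x, y []float64, n, incX, incY, ix, iy int) (sum float64) {
	for i := 0; i < n; i++ {
		sum += y[iy] * x[ix]
		ix += incX
		iy += incY
	}
	return
}

func main() {
	x := []float64{1, 2, 3, 4}
	y := []float64{5, 6, 7, 8}
	// Uses x[0], x[2] (stride 2) and y[0], y[1] (stride 1).
	fmt.Println(DdotInc(x, y, 2, 2, 1, 0, 0))
}
```

Each iteration indexes both slices with a loop-carried index, which is why the listings show two bounds checks per iteration for this function.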
SSA Version: go version devel +fb54e03 Thu Feb 25 07:10:07 2016 +0000 darwin/amd64
Go env output: