Closed RaduBerinde closed 9 months ago
Added two more runs to the data:
StringMap/avgLoad,n=10/swissMap/Get-10 8.35ns ± 1% 8.46ns ± 1% +1.34% (p=0.000 n=15+15)
StringMap/avgLoad,n=83/swissMap/Get-10 8.77ns ± 9% 8.99ns ±11% ~ (p=0.140 n=15+16)
StringMap/avgLoad,n=671/swissMap/Get-10 8.91ns ± 5% 9.06ns ± 3% +1.66% (p=0.024 n=16+15)
StringMap/avgLoad,n=5375/swissMap/Get-10 9.25ns ± 2% 9.41ns ± 1% +1.70% (p=0.000 n=16+16)
StringMap/avgLoad,n=86015/swissMap/Get-10 11.1ns ± 3% 11.1ns ± 3% ~ (p=0.323 n=15+16)
I think making the ctrlGroup
methods have pointer receivers (and using GroupAt
) fixed the difference. I assume we were doing an extra unnecessary copy to a register first.
StringMap/avgLoad,n=10/swissMap/Get-4 8.38ns ± 1% 8.37ns ± 1% ~ (p=0.723 n=18+20)
StringMap/avgLoad,n=83/swissMap/Get-4 8.87ns ±11% 8.81ns ± 9% ~ (p=0.829 n=20+24)
StringMap/avgLoad,n=671/swissMap/Get-4 8.96ns ± 3% 8.93ns ± 3% ~ (p=0.368 n=20+24)
StringMap/avgLoad,n=5375/swissMap/Get-4 9.33ns ± 1% 9.31ns ± 2% ~ (p=0.465 n=20+24)
StringMap/avgLoad,n=86015/swissMap/Get-4 11.2ns ± 3% 11.1ns ± 3% ~ (p=0.823 n=18+20)
Int64Map/avgLoad,n=10/swissMap/Get-4 4.86ns ± 1% 4.86ns ± 1% ~ (p=0.872 n=17+17)
Int64Map/avgLoad,n=83/swissMap/Get-4 5.32ns ±12% 5.36ns ±13% ~ (p=0.542 n=20+20)
Int64Map/avgLoad,n=671/swissMap/Get-4 5.42ns ± 7% 5.41ns ± 7% ~ (p=0.815 n=20+20)
Int64Map/avgLoad,n=5375/swissMap/Get-4 5.85ns ± 1% 5.82ns ± 3% ~ (p=0.189 n=20+20)
Int64Map/avgLoad,n=86015/swissMap/Get-4 7.02ns ± 1% 7.01ns ± 1% ~ (p=0.230 n=19+18)
Hmm, I would not have guessed that a pointer vs value receiver would have made a difference given both are 64-bit. Feels like a compiler deficiency if it is reproducible. The change LGTM.
Yeah no longer sure that was it. The assembly looks almost the same between the two. It might have been measurement error (the last numbers were obtained by switching back and forth 10 times (with -count 5 -benchtime 100000000x
each time).
Assembly for match := g.matchH2(h2(h))
in Get. Pointer receiver:
0x10016bb70 f9400045 MOVD (R2), R5
return h & 0x7f
0x10016bb74 92401806 AND $127, R0, R6
v := uint64(*g) ^ (bitsetLSB * uint64(h))
0x10016bb78 f86468a5 MOVD (R5)(R4), R5
0x10016bb7c b200c3e7 MOVD $72340172838076673, R7
0x10016bb80 9b067ce6 MUL R6, R7, R6
0x10016bb84 ca0600a8 EOR R6, R5, R8
return bitset(((v - bitsetLSB) &^ v) & bitsetMSB)
0x10016bb88 cb070108 SUB R7, R8, R8
0x10016bb8c ca2600a6 EON R6, R5, R6
0x10016bb90 8a060106 AND R6, R8, R6
match := g.matchH2(h2(h))
0x10016bb94 d503201f NOOP
0x10016bb98 d503201f NOOP
return bitset(((v - bitsetLSB) &^ v) & bitsetMSB)
0x10016bb9c 9201c0c6 AND $-9187201950435737472, R6, R6
Value receiver:
0x10016bb50 f9400045 MOVD (R2), R5
match := g.matchH2(h2(h))
0x10016bb54 f86468a5 MOVD (R5)(R4), R5
return h & 0x7f
0x10016bb58 92401806 AND $127, R0, R6
v := uint64(g) ^ (bitsetLSB * uint64(h))
0x10016bb5c b200c3e7 MOVD $72340172838076673, R7
0x10016bb60 9b067ce6 MUL R6, R7, R6
0x10016bb64 ca0600a8 EOR R6, R5, R8
return bitset(((v - bitsetLSB) &^ v) & bitsetMSB)
0x10016bb68 cb070108 SUB R7, R8, R8
0x10016bb6c ca2600a6 EON R6, R5, R6
0x10016bb70 8a060106 AND R6, R8, R6
0x10016bb74 9201c0c6 AND $-9187201950435737472, R6, R6
The ordering of the first 3 instructions differ. With the value receiver we're doing 2 adjacent loads where the second depends on the first. Mildly hard to believe that is the difference. 🤷🏼 The pointer receiver seems just as understandable. I'd just go with that unless you have a desire to understand this further.
I'd just go with that unless you have a desire to understand this further.
Agreed, and thanks for stopping me from nerd-sniping myself :)
This is what I had in mind.
add ctrlGroup and ctrlBytes types
This commit adds
ctrlGroup
which represents an 8-byte group andctrlBytes
which adds more functionality on top of the unsafeSlice.We now retrieve a
ctrlGroup
fromctrlBytes
and the matching functions are methods onctrlGroup
. Previously, these methods were casting from the*ctrl
receiver.