Open srosenberg opened 1 year ago
cc @cockroachdb/test-eng
Graviton2 (but not yet Graviton3) exposes PMU counters for unaligned load/store, namely unaligned_ld_spec and unaligned_st_spec. We run a TPC-C workload to get a count of unaligned accesses relative to all memory accesses, namely mem_access_rd and mem_access_wr.
roachprod create -n1 --clouds aws --aws-machine-type c6g.12xlarge --local-ssd=false stan-test
roachprod stage stan-test release v22.2.13
roachprod stage stan-test workload --arch arm64 --os linux
perf stat -e mem_access_rd,mem_access_wr,unaligned_ld_spec,unaligned_st_spec ./cockroach start-single-node --insecure
Performance counter stats for './cockroach start-single-node --insecure':
1314797387351 mem_access_rd (49.95%)
829265356569 mem_access_wr (49.99%)
23037710432 unaligned_ld_spec (50.06%)
16752809887 unaligned_st_spec (50.01%)
779.429239436 seconds time elapsed
perf stat -e mem_access_rd,mem_access_wr,unaligned_ld_spec,unaligned_st_spec ./workload_before fixtures import tpcc --warehouses=100 'postgres://root@localhost:26257?sslmode=disable'
Performance counter stats for './workload_before fixtures import tpcc --warehouses=100 postgres://root@localhost:26257?sslmode=disable':
219406219 mem_access_rd (56.73%)
138251994 mem_access_wr (50.49%)
1305497 unaligned_ld_spec (50.73%)
809740 unaligned_st_spec (51.35%)
107.547993203 seconds time elapsed
perf stat -e mem_access_rd,mem_access_wr,unaligned_ld_spec,unaligned_st_spec ./workload run tpcc --warehouses=100 --ramp=1m --duration=5m
Performance counter stats for './workload run tpcc --warehouses=100 --ramp=1m --duration=5m':
5535386284 mem_access_rd (49.72%)
3023814042 mem_access_wr (50.35%)
20462041 unaligned_ld_spec (50.41%)
15644317 unaligned_st_spec (49.76%)
360.307078331 seconds time elapsed
For the database, we see ~1.75% of loads and ~2% of stores are unaligned. In the case of the workload, the numbers fall under 1%. (Intuitively, workload doesn't move as much data in memory as does the database.)
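The percentages quoted above follow directly from the counter values. As a minimal sketch (the helper name unalignedPct is hypothetical, and the constants are copied from the perf stat output for ./cockroach), the arithmetic is:

```go
package main

import "fmt"

// unalignedPct returns unaligned accesses as a percentage of all accesses
// of the same kind. It returns 0 when total is 0 to avoid dividing by zero.
func unalignedPct(unaligned, total uint64) float64 {
	if total == 0 {
		return 0
	}
	return 100 * float64(unaligned) / float64(total)
}

func main() {
	// Counter values copied from the perf stat output for ./cockroach.
	const (
		memAccessRd     = 1314797387351
		memAccessWr     = 829265356569
		unalignedLdSpec = 23037710432
		unalignedStSpec = 16752809887
	)
	fmt.Printf("unaligned loads:  %.2f%%\n", unalignedPct(unalignedLdSpec, memAccessRd))
	fmt.Printf("unaligned stores: %.2f%%\n", unalignedPct(unalignedStSpec, memAccessWr))
}
```

Note that perf reports each counter as active for only about 50% of the run, so the raw counts are scaled estimates; the ratios should be read as approximate.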
We also repeated the above workload steps with the workload binary prior to the change in [1]. There was no statistically significant difference in unaligned_ld_spec or unaligned_st_spec.
To illustrate what happens when an atomic store faults on Graviton2, consider the following example using unsafe. The 10th element (index 9) is unaligned: since Go's compiler guarantees the first element to be 64-bit aligned, the element at index i is 64-bit aligned iff i = 0 mod 8. Executing the code below on Graviton2 will fault with SIGBUS.
package main

import (
	"fmt"
	"math/rand"
	"sync/atomic"
	"unsafe"
)

func main() {
	buf := [100000]byte{}
	for i := 1; i < len(buf); i++ {
		// N.B. will cause SIGBUS owing to unaligned atomic store (index 9 is not a multiple of 8)
		atomic.AddInt64((*int64)(unsafe.Pointer(&buf[9])), int64(rand.Intn(100)))
		// N.B. use this line instead of the one above to prevent SIGBUS (index 8 is 64-bit aligned)
		// atomic.AddInt64((*int64)(unsafe.Pointer(&buf[8])), int64(rand.Intn(100)))
	}
	fmt.Println("after:", buf[9])
}
Executing the above via gdb, we can see the faulting instruction.
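The alignment rule used in the example (the element at index i is 64-bit aligned iff i = 0 mod 8, given an aligned base) can also be checked without triggering a fault. The following is a minimal sketch and not part of the original experiment: isAligned64 is a hypothetical helper, and backing the byte buffer with a [16]uint64 array is an assumption made here so that the base address is 8-byte aligned on 64-bit platforms.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned64 reports whether addr falls on an 8-byte boundary.
func isAligned64(addr uintptr) bool {
	return addr%8 == 0
}

func main() {
	// Assumption: on 64-bit platforms a uint64 array is 8-byte aligned,
	// so viewing it as bytes gives a buffer with a 64-bit aligned base.
	var words [16]uint64
	buf := (*[16 * 8]byte)(unsafe.Pointer(&words))

	for _, i := range []int{0, 8, 9, 16} {
		addr := uintptr(unsafe.Pointer(&buf[i]))
		fmt.Printf("buf[%d] 64-bit aligned: %v\n", i, isAligned64(addr))
	}
}
```

On a 64-bit platform one would expect indexes 0, 8, and 16 to be reported as aligned and index 9 as unaligned. A check like this could guard an unsafe cast before an atomic operation, at the cost of one modulo per call.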
Unaligned load/store on arm64 can lead to lower memory bandwidth and higher latency; e.g., see Go's memmove benchmarks before and after [1]. Correctness is likely not an issue unless performing 64-bit atomic operations on values which Go's compiler doesn't guarantee to be 64-bit aligned [2], [3]. The reason is rather subtle; it deserves a more detailed explanation.
Correctness Explanation
Atomic load/store on arm64 must be aligned (otherwise, SIGBUS is raised; see the unsafe example in this issue). E.g., according to [4], the Store-Release instruction, ldaddal, will fault on unaligned access.

Non-atomic load/store can be unaligned. Go's compiler guarantees basic alignment on struct fields and array elements [5]. E.g., if a struct has a field of type int64, then every such field is 64-bit aligned (via padding) on arm64. However, such fields are not guaranteed to be 64-bit aligned on arm32, hence the "bug" note in [3].

In summary, 64-bit atomics on arm32 are fault-free iff they follow the guidelines in [3], e.g., only reference the first field of a struct. Otherwise, you risk getting shot down by an unaligned load/store. On arm64, a fault could only happen due to uses of unsafe, since type-checked accesses are 64-bit aligned.

[1] https://github.com/golang/go/issues/40324
[2] https://github.com/golang/go/issues/23345
[3] https://pkg.go.dev/sync/atomic#pkg-note-BUG
[4] https://developer.arm.com/documentation/102336/0100/Load-Acquire-and-Store-Release-instructions
[5] https://go.dev/ref/spec#Size_and_alignment_guarantees
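The padding behavior described in the correctness explanation can be inspected directly. The sketch below is illustrative only: the counters struct and the hitsOffset and int64Align helpers are hypothetical names, not taken from the CockroachDB code base.

```go
package main

import (
	"fmt"
	"unsafe"
)

// counters is a hypothetical struct: a 1-byte field followed by an int64.
// The compiler inserts padding after ready so that hits meets the
// platform's alignment requirement for int64.
type counters struct {
	ready bool
	hits  int64
}

// hitsOffset returns the byte offset of the hits field within counters.
func hitsOffset() uintptr {
	var c counters
	return unsafe.Offsetof(c.hits)
}

// int64Align returns the alignment the compiler uses for int64 on this platform.
func int64Align() uintptr {
	return unsafe.Alignof(int64(0))
}

func main() {
	fmt.Println("offset of hits:", hitsOffset())
	fmt.Println("alignment of int64:", int64Align())
	fmt.Println("hits at a 64-bit aligned offset:", hitsOffset()%8 == 0)
}
```

On 64-bit targets such as arm64 the offset is expected to be a multiple of 8; on 32-bit targets it may be only a multiple of 4, which is the situation the note in [3] warns about. Since Go 1.19, sync/atomic also provides the atomic.Int64 type, which the package documentation describes as automatically 64-bit aligned in structs and allocated data, even on 32-bit systems.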
Jira issue: CRDB-30784