Each shard begins with 40 bytes of padding, bringing the size of `Shard` to 64 bytes, a common cache line size for many CPUs. However, this is insufficient to prevent false sharing: the base address of the array must also be a multiple of the cache line size. The class `CoreLocalArray<>` does not explicitly align its block of contiguous `T` objects to such an address.
By applying `alignas(64)` to the `Shard` structure, we achieve two things. First, the compiler adds padding at the end, keeping the object size at 64 bytes. Second, the `new` operator honors `alignas` (over-aligned allocation is supported since C++17), performing a 64-byte aligned memory allocation for `T[]`.
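A minimal sketch of the two effects described above, not the actual RocksDB code. The `Shard` here is a hypothetical stand-in with a 24-byte payload, `kCacheLine` and `ShardsAreCacheLineAligned` are names invented for illustration, and the sketch assumes a C++17 compiler on a typical 64-bit target:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical stand-in for a shard: 24 bytes of payload, with alignas
// taking the place of 40 bytes of manual padding.
struct alignas(kCacheLine) Shard {
  std::uint64_t payload[3];
};

// Effect 1: the compiler pads the tail so sizeof is a multiple of 64.
static_assert(alignof(Shard) == kCacheLine, "Shard must be cache line aligned");
static_assert(sizeof(Shard) == kCacheLine, "Shard must fill one cache line");

// Effect 2: returns true if every element of a heap-allocated Shard[n]
// starts on a cache line boundary. This relies on the over-aligned
// operator new[] introduced in C++17.
inline bool ShardsAreCacheLineAligned(std::size_t n) {
  assert(n > 0);
  std::unique_ptr<Shard[]> shards(new Shard[n]());
  for (std::size_t i = 0; i < n; ++i) {
    auto addr = reinterpret_cast<std::uintptr_t>(&shards[i]);
    if (addr % kCacheLine != 0) return false;
  }
  return true;
}
```

Calling `ShardsAreCacheLineAligned(8)` should return `true` under these assumptions. With padding alone and no `alignas`, the size check would still pass, but the address check would depend on whatever alignment the default allocator happens to return.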
Similarly, `ConcurrentArena` currently uses padding fields before and after the essential fields. While this technique helps prevent false sharing with fields adjacent to a `ConcurrentArena` object, it wastes memory and reduces code readability. Using `alignas(64)` on `ConcurrentArena` would ensure cache line alignment of the object (reducing its size from 2472 to 2364 bytes). Additionally, it guarantees cache line alignment for any object containing a `ConcurrentArena` field, including heap-allocated objects.
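The way alignment propagates to enclosing objects can be sketched as follows. `ArenaLike`, `Owner`, and `HeapOwnerArenaIsAligned` are hypothetical names used only for illustration, not RocksDB types, and the heap check again assumes C++17:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Assumed cache line size; 64 bytes is common but not universal.
constexpr std::size_t kCacheLine = 64;

// Hypothetical stand-in for ConcurrentArena: no manual padding members,
// only an alignment requirement on the type itself.
struct alignas(kCacheLine) ArenaLike {
  std::uint64_t allocated_bytes;
  std::uint64_t unused_bytes;
};

// Hypothetical owner that embeds the arena after a small field.
struct Owner {
  char tag;
  ArenaLike arena;
};

// The member's alignment requirement carries over to the enclosing type,
// and the compiler places the member at a suitably aligned offset.
static_assert(sizeof(ArenaLike) % kCacheLine == 0, "size is a multiple of 64");
static_assert(alignof(Owner) == kCacheLine, "owner inherits the alignment");
static_assert(offsetof(Owner, arena) % kCacheLine == 0, "member offset aligned");

// Returns true if the arena inside a heap-allocated Owner starts on a
// cache line boundary (relies on C++17 over-aligned operator new).
inline bool HeapOwnerArenaIsAligned() {
  auto owner = std::make_unique<Owner>();
  auto addr = reinterpret_cast<std::uintptr_t>(&owner->arena);
  return addr % kCacheLine == 0;
}
```

Because the requirement lives on the type, no enclosing struct needs padding of its own to keep the arena on a cache line boundary.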
Verified manually using `static_assert`s (not included in this PR). No functional changes.