High performance persistent shared scratch pads

By popular demand, this is a more general implementation of the non-alloca parts of util/scratch (and it probably makes sense at some point to convert the util/scratch equivalents to use this under the hood).

From the documentation, a spad is a scratch pad that behaves very much like a thread's stack:

Spad allocations are very fast O(1) assembly operations.
Spad allocations are grouped into frames.
Frames are nested.
Pushing and popping frames are also very fast O(1) assembly operations.
All allocations in a frame are automatically freed when the frame is popped.
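The stack-like behavior described above can be sketched as a bump allocator plus a small stack of saved offsets, one per pushed frame. This is a minimal illustration that assumes a single thread and a caller-supplied buffer. The names spad_t, spad_init, spad_push, spad_pop, spad_alloc and SPAD_FRAME_MAX are hypothetical and are not the API introduced by this PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPAD_FRAME_MAX 16 /* hypothetical maximum frame nesting depth */

/* Hypothetical spad state: a byte buffer, a bump offset and one saved
   offset per pushed frame. */
typedef struct {
  uint8_t * mem;                         /* backing memory (could live in a shared region) */
  size_t    mem_max;                     /* size of the backing memory in bytes */
  size_t    used;                        /* offset of the first free byte */
  size_t    frame_cnt;                   /* number of frames currently pushed */
  size_t    frame_off[ SPAD_FRAME_MAX ]; /* value of used when each frame was pushed */
} spad_t;

static void spad_init( spad_t * spad, void * mem, size_t mem_max ) {
  spad->mem       = (uint8_t *)mem;
  spad->mem_max   = mem_max;
  spad->used      = 0;
  spad->frame_cnt = 0;
}

/* O(1): remember where the new frame starts.  Returns -1 if nested too deeply. */
static int spad_push( spad_t * spad ) {
  if( spad->frame_cnt >= SPAD_FRAME_MAX ) return -1;
  spad->frame_off[ spad->frame_cnt++ ] = spad->used;
  return 0;
}

/* O(1): free every allocation made since the matching push by restoring
   the saved offset.  Returns -1 if no frame is pushed. */
static int spad_pop( spad_t * spad ) {
  if( !spad->frame_cnt ) return -1;
  spad->used = spad->frame_off[ --spad->frame_cnt ];
  return 0;
}

/* O(1): allocate sz bytes aligned to align (a power of 2) in the current
   frame.  Returns NULL if no frame is pushed or the request does not fit. */
static void * spad_alloc( spad_t * spad, size_t align, size_t sz ) {
  assert( align && !( align & ( align-1UL ) ) );
  if( !spad->frame_cnt ) return NULL;
  uintptr_t base    = (uintptr_t)spad->mem;
  uintptr_t aligned = ( base + spad->used + align - 1UL ) & ~( (uintptr_t)align - 1UL );
  size_t    off     = (size_t)( aligned - base );
  if( off > spad->mem_max || sz > spad->mem_max - off ) return NULL;
  spad->used = off + sz;
  return (void *)aligned;
}
```

Each operation touches only a few fields and has no loops, which is where the O(1) cost comes from; popping a frame releases everything allocated in it by restoring a single offset rather than freeing allocations one by one.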

Unlike a thread's stack, the most recent allocation can be trimmed, the most recent sequence of allocations can be undone, operations on a spad can be done by more than one thread, threads can operate on multiple spads, and, if the spad is backed by a shared memory region (e.g. a wksp), spad allocations can be shared with different processes. It also flexibly supports tight integration with real-time streaming, custom allocation alignments, programmatic usage queries, validation, and a large dynamic range of allocation sizes and alignments. Further, the API can be switched at compile time to implementations with extra instrumentation for debugging and/or sanitization.
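Trimming and undoing can both be expressed as moving the allocation cursor backwards within the current frame. The sketch below assumes a single thread and omits frame push/pop; the hypothetical frame_lo field just marks where the current frame begins. All names (spad_t, spad_alloc_max, spad_alloc, spad_trim) are illustrative and are not taken from this PR. It also shows one way a usage query can support streaming: reserve the largest block that fits, write a message of not-yet-known length, then trim to what was actually written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-frame spad state (illustrative names only). */
typedef struct {
  uint8_t * mem;      /* backing memory, e.g. carved out of a shared region */
  size_t    mem_max;  /* size of the backing memory in bytes */
  size_t    frame_lo; /* offset where the current frame begins */
  size_t    used;     /* offset of the first free byte */
} spad_t;

/* Offset at which an allocation with alignment align (a power of 2) would start. */
static size_t spad_aligned_off( spad_t const * spad, size_t align ) {
  assert( align && !( align & ( align-1UL ) ) );
  uintptr_t base = (uintptr_t)spad->mem;
  uintptr_t lo   = ( base + spad->used + align - 1UL ) & ~( (uintptr_t)align - 1UL );
  return (size_t)( lo - base );
}

/* Usage query: bytes available for an allocation with alignment align (0 if none). */
static size_t spad_alloc_max( spad_t const * spad, size_t align ) {
  size_t off = spad_aligned_off( spad, align );
  return off <= spad->mem_max ? spad->mem_max - off : 0;
}

/* O(1) bump allocation; returns NULL if the request does not fit. */
static void * spad_alloc( spad_t * spad, size_t align, size_t sz ) {
  size_t off = spad_aligned_off( spad, align );
  if( off > spad->mem_max || sz > spad->mem_max - off ) return NULL;
  spad->used = off + sz;
  return spad->mem + off;
}

/* O(1): move the first free byte back to hi.  Trimming to the end of the
   latest allocation shrinks it; trimming to the start of an earlier
   allocation undoes it and everything allocated after it. */
static void spad_trim( spad_t * spad, void * hi ) {
  size_t off = (size_t)( (uint8_t *)hi - spad->mem );
  assert( spad->frame_lo <= off && off <= spad->used );
  spad->used = off;
}
```

Because trimming only rewinds an offset, undoing a whole run of allocations costs the same as shrinking one, as long as the target stays inside the current frame.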

firedancer-io/firedancer #2261