Open langston-barrett opened 3 months ago
Does Crucible's memory model have the ability to segment a chunk of memory? (As in, distinguish regions that are adjacent to each other and addressable from the same base pointer?) Then you could prepare a stack frame that will be resistant to such attacks... given some assumptions about when you should specialize a pointer to a segmented chunk to a pointer to a segment, which may turn out to be not so easy.
(just speculating, feel free to ignore)
Does Crucible's memory model have the ability to segment a chunk of memory? (As in, distinguish regions that are adjacent to each other and addressable from the same base pointer?)
I don't believe so.
Another idea, not sure how feasible this one is... We could create an ExecutionFeature
that, before entering a new Macaw CFG:
StackAlloc
That would at least prevent inter-frame memory corruption, though it wouldn't do anything to stop code from overwriting adjacent stack variables nor the return address. Perhaps for the return address, the same execution feature could check that it is unmodified at function returns.
An addendum: I believe that RISC-V also initializes the end of its stack with slots for stack-spilled arguments, just as AArch32 does (although you should make sure that you adjust the offset on the stack pointer s0
depending on whether you are using 32-bit or 64-bit RISC-V).
Prior art:
Data.Macaw.Symbolic.Testing
Uses a symbolic, SMT-array-backed stack. As an aside, it's kind of odd to me that this stack setup is spread across so many different lines.
pate
Also seems array-backed
saw-script
Seems like it's just a fresh, symbolic pointer?
Potentially with negative consequences?
ambient-verifier
Also does an SMT array:
In a downstream project, we're using
macaw-symbolic
to symbolically execute binaries.macaw-symbolic
encodes functions in the binary as Crucible CFGs that take and return a struct that represents the values of all the registers. To execute such CFGs, we must set initial values for all registers. Most registers can be set to zeroes or symbolic values and plenty of code will run just fine. However, machine code has a lot of expectations about one particular register: the stack pointer.Each ABI[^1] specifies a stack layout, and our project attempts to model this layout at a low level of abstraction. In particular, we create a large-ish (1MiB) Crucible-LLVM allocation, and:
On AArch32, the very end is initialized with a number of pointer-sized stack slocks, which are reserved for stack-spilled arguments.[^spill]
On PPC, the very end is initialized with (in order):
On x86_64, the very end is initialized with (in order):
The contents of the stack pointer register are then initialized to be a pointer to this allocation, offset to where the above data begin.
I'd wager that since a lot of code uses the stack, most clients of
macaw-symbolic
want to model the stack somehow. Furthermore, sincemacaw-symbolic
fairly directly translates manipulations of the stack pointer register in binaries into the Crucible CFGs, I'd argue that this is likely the only strategy that works "out of the box". Therefore, I think we should upstream this code (or something like it) intomacaw-*-symbolic
.One notable upside of this kind of model of the stack is that we can examine what happens in the case of certain kinds of common stack-based exploits, such as stack buffer overruns that clobber adjacent variables, as in this C program:
One notable downside is that such exploits are not automatically detected by the Crucible-LLVM memory model, and so care would be required by users of such an API if they intend to perform safety-oriented analysis.
[^1]: at a first approximation, each (OS, architecture) pair [^spill]: One difficulty is that it's hard to tell in advance of executing a function how many stack slots should be reserved for spilled arguments. DWARF information can serve as a source of truth here, when it's available.