Open RyanGlScott opened 1 year ago
I agree on changing the equality perspective for overrides.
The dependency on pointer types is deeply, deeply ingrained into UC-Crux-LLVM. UC-Crux attempts to uncover function preconditions by repeatedly executing a function starting from "minimal" symbolic inputs and attempting to "fix" any undefined behavior it encounters (see this post). In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound). Excising the use of pointer types would be a substantial undertaking.
I certainly believe you when you say that this would be a substantial undertaking. That being said, I am curious to know how uc-crux-llvm
's current approach to tracking pointee types works, if only to better my understanding of how much work would be required:
In particular, when allocating a pointer in the input, it uses the type of that pointer to determine the size of the allocation, and later, to deduce reasonable preconditions that might be applicable to the backing memory (e.g., if it's a pointer to an array of integers, some of those integers might have some maximum bound).
At first glance, this approach seems to be compatible with opaque pointers. Even if pointer types are themselves opaque, their allocation sites are always annotated with their type. For example, an alloca <ty>
instruction will return something of type ptr
, but the <ty>
should tell you everything you need to know about size, preconditions, etc.
The part where this gets tricky is when LLVM reinterprets the underlying memory at a different type, be it through bitcast
s, getelementptr
, or some other mechanism. For example, see here for a tricky example where a pointer is allocated with a "pointee type" of [2 x [2 x i8]]
, but a subsequent store
instruction reinterprets the underlying memory at type i32
. How does uc-crux-llvm
deal with this sort of situation?
Even if pointer types are themselves opaque, their allocation sites are always annotated with their type.
Except for heap allocations :slightly_smiling_face: But anyway, the problem is that UC-Crux attempts to execute individual functions from the program, and must allocate memory itself to do so. Consider this function:
int f(int *x) {
return x[5] + 1;
}
UC-Crux would first execute this with an "intial pointer" (see the blog post). It would fail when reading from x
, because there wouldn't be any allocation backing it. On the next attempt, UC-Crux would use the function's signature to deduce that x
is a pointer to an array of integers, and allocate an array of integers with length 6. Executing the function this time results in a signed integer overflow, so UC-Crux would deduce that the integer at index 5 must be signed-less-than INT_MAX
. After this modification, execution would succeed.
These preconditions are stored in a Shape
, as you highlighted above. The Shape
is tied to (and type-indexed by) the LLVM type of the function argument, and the logic for deducing the preconditions depends on the type/shape involved. Shape
s are indexed by Cursor
s, which also use high-level notions like struct and array types, rather than byte offsets.
To handle opaque pointers, this logic would all need to be based on offsets. Additionally, UC-Crux would either have to make shot-in-the-dark guesses about allocation sizes, or would have to instrument load
/store
/getelementptr
instructions to track the types used and relate them back to the corresponding part of the input being loaded from.
How does uc-crux-llvm deal with this sort of situation?
It gives up a lot! It can simply fail to find adequate preconditions.
Thanks, this is helpful to know.
For the time being, I suppose we could simply not support opaque pointers in uc-crux-llvm
, and we could have uc-crux-llvm
always invoke Clang with the necessary options to disable opaque pointers in LLVM bitcode. Unfortunately, this is not a viable long-term strategy, as LLVM 17 will drop support for non-opaque pointers entirely. We will need to have some kind of coping strategy if excising uc-crux-llvm
of pointee types proves too difficult.
Yeah for sure!
We will need to have some kind of coping strategy if excising uc-crux-llvm of pointee types proves too difficult.
We can always replace pointer types with a pointer to an unbounded array of bytes, UC-Crux will just give you bad results (and give up often).
The original UC-symex paper had a lazy memory model that let them avoid this issue around allocation sizes, but again, that's a big project.
I think another, and possibly more sound approach would be to use the type specified at the usage points rather than the function declaration. Using the previous example:
int f(int *x) {
return x[5] + 1;
}
the resulting llvm bitcode (from clang 17 with only opaque pointers) is:
define i32 @f(ptr nocapture noundef readonly %0) ... {
%2 = getelementptr i32, ptr %0, i64 5
...
}
From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 i32
elements and the value of that 5th element may have constraints.
The soundness aspect I referred to earlier is that if the pointer is dereferenced with different types, uc-crux-llvm should be able to identify preconditions for all of the type uses rather than just those for the declared input type.
I realize this is probably a significant extension of the current capabilities of uc-crux-llvm (or perhaps not: I haven't looked at the implementation details), but I think it is probably a reasonable path forward. Let me know if I've oversimplified or overlooked something here, however.
From the above, uc-crux-llvm should still be able to infer that the pointer can be interpreted as an array of at least 5 i32 elements and the value of that 5th element may have constraints.
Note that pointer type information was static, so UC-Crux had an idea of what shape to make arguments before executing the function. While in this simple example one could reasonably propagate the type at the use-site back up to the argument, in general that would require heap-aware data-flow analysis: for example, the pointer coming from the function argument could be stored to the heap, only later to be read out and used in some callee of a callee of the function under analysis.
We could consider dynamic tracking of pointer usages, but this gets weird if e.g., we add a symbolic offset to a pointer before casting it to a particular type.
Regardless, anything based on GEP types would only work until they get rid of GEP.
IMO the correct solution is to view memory as untyped, and to expand allocations when the function under analysis hits a memory error that indicates that an allocation was too small.
After upgrading
llvm-pretty
andllvm-pretty-bc-parser
to support LLVM's opaqueptr
type (see https://github.com/elliottt/llvm-pretty/issues/102 and https://github.com/GaloisInc/llvm-pretty-bc-parser/issues/177, respectively), we will also need to upgradecrucible-llvm
anduc-crux-llvm
to handle them.crucible-llvm
To a large extent,
crucible-llvm
is already designed to handle opaque pointers, as theLLVMPtr
type intentionally does not include a pointee type. That being said, there are still various places incrucible-llvm
that inspect pointee types of LLVM pointers, which will need to be changed to support theptr
type. Here are all the places that I have found:The
PtrType
data constructor forMemType
includes a pointee type:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/MemType.hs#L118
In the spirit of having opaque and non-opaque pointers co-exist, I think we will also need a corresponding
PtrOpaqueType
data constructor that does not have a pointee type.Similarly, the LLVM override function parser machinery includes a non-opaque pointer type, but not an opaque pointer type:
https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/QQ.hs#L48
We will want to make the latter available when the user writes
ptr
in anllvmOvr
.Speaking of overrides, we have quite a few overrides that are written with non-opaque-pointer types, such as the
memcpy
override:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics/Libc.hs#L86
We could write a copy of each of these overrides that replaces non-opaque pointers with opaque ones (e.g.,
memcpy(ptr, ptr, size_t)
, but this would be extremely tedious to maintain. Better yet would be to tweak the override matcher functionality to accept both opaque and non-opaque pointers alike. That is, in this code:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics/Common.hs#L256-L267
Change
(==)
(which usesType
'sEq
instance) toeqTypeModuloOpaquePtrs
.Even with this change, there are still a handful of overrides that will need special, opaque-pointer-aware versions. An example of this is
llvm.memcpy.p0i8.p0i8.i32
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Intrinsics/LLVM.hs#L303
With opaque pointers, the name of this intrinsic becomes
llvm.memcpy.p0.p0.i32
. We could, I suppose, devise some clever string hackery to match bothllvm.memcpy.p0.p0.i32
andllvm.memcpy.p0i8.p0i8.i32
with the same override, but that might be overkill. I don't expect there will be too many intrinsics of this sort, so I think it will be fine to just make copies of them.The translation of several LLVM instructions inspects pointee types. Here are all the ones that I have found:
getelementptr
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Constant.hs#L162-L182
load
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs#L158-L160
https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs#L1541-L1545
store
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs#L1557-L1561
cmpxchg
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs#L1700-L1704
atomicrmw
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs#L217-L219
https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/crucible-llvm/src/Lang/Crucible/LLVM/Translation/Instruction.hs#L1727-L1731
uc-crux-llvm
The largest challenge that I can foresee is figuring out how to deal with
PtrShape
:https://github.com/GaloisInc/crucible/blob/ad4a553487eeb5c6bbb5abf4bde26af905bf0254/uc-crux-llvm/src/UCCrux/LLVM/Shape.hs#L85-L88
If I'm reading this correctly,
Shape
is indexed by the pointee type to some degree, which won't work in an opaque pointer setting. That being said, I'm not quite sure how this code works, so I might be misunderstanding what is going on here.