GaloisInc / macaw

Open source binary analysis tools.
BSD 3-Clause "New" or "Revised" License

`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429

Open RyanGlScott opened 3 weeks ago

RyanGlScott commented 3 weeks ago

macaw-symbolic is built on top of the crucible-llvm memory model, which means that it has the ability to report instances of C-oriented bad behavior (assuming that the underlying machine code adheres to C's memory model conventions). These instances of bad behavior are tracked via the HasLLVMAnn constraint that is threaded throughout macaw-symbolic.

In addition to C memory model checks, macaw-symbolic also adds a variety of additional assertions that are specific to machine code. These include (but are likely not limited to):

Unlike the checks in the crucible-llvm memory model, however, these assertions are all performed via GenericSimError or AssertFailureSimError. As a result, it is not straightforward to catch these assertions and perform subsequent analysis on them after simulation fails.

I propose that macaw-symbolic add a constraint similar to HasLLVMAnn (perhaps HasMacawAnn, for lack of a better name) and use it to track the What4 annotations of each of the terms that give rise to these assertion failures. That way, one can consult the Map of bad macaw-symbolic behaviors afterwards and match the annotations to the corresponding terms. This would require a fair bit of API churn in order to thread the new HasMacawAnn constraint through to the relevant functions, however.
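To make the proposal concrete, here is a minimal, self-contained sketch of what a `HasMacawAnn`-style constraint could look like. It is modeled only loosely on `HasLLVMAnn` and uses toy stand-ins rather than real What4 or Crucible types: `Ann`, `Pred`, `MacawBadBehavior`, `?recordMacawAnnotation`, `assertNonZeroDivisor`, and `explainFailures` are all hypothetical names invented for illustration.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE ImplicitParams #-}

import Data.IORef (modifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for a What4 annotation on a Boolean term.
newtype Ann = Ann Int deriving (Eq, Show)

-- Hypothetical stand-in for an annotated predicate.
data Pred = Pred { predAnn :: Ann, predHolds :: Bool }

-- Hypothetical catalogue of machine-code-specific bad behaviors.
data MacawBadBehavior = DivideByZero | OtherBadBehavior String
  deriving (Eq, Show)

-- Hypothetical constraint in the spirit of HasLLVMAnn: a callback that
-- records which bad behavior an annotated predicate guards against.
type HasMacawAnn =
  (?recordMacawAnnotation :: Ann -> MacawBadBehavior -> IO ())

-- An assertion site records its annotation, then builds the predicate.
assertNonZeroDivisor :: HasMacawAnn => Ann -> Int -> IO Pred
assertNonZeroDivisor ann divisor = do
  ?recordMacawAnnotation ann DivideByZero
  pure (Pred ann (divisor /= 0))

-- After simulation, map each failed predicate back to its bad behavior.
explainFailures :: [(Ann, MacawBadBehavior)] -> [Pred] -> [MacawBadBehavior]
explainFailures table preds =
  [ bb | p <- preds, not (predHolds p), Just bb <- [lookup (predAnn p) table] ]

demo :: IO [MacawBadBehavior]
demo = do
  ref <- newIORef ([] :: [(Ann, MacawBadBehavior)])
  let ?recordMacawAnnotation = \ann bb -> modifyIORef' ref ((ann, bb) :)
  p1 <- assertNonZeroDivisor (Ann 0) 4   -- divisor is non-zero: holds
  p2 <- assertNonZeroDivisor (Ann 1) 0   -- divisor is zero: fails
  table <- readIORef ref
  pure (explainFailures table [p1, p2])

main :: IO ()
main = demo >>= print   -- prints [DivideByZero]
```

The API churn mentioned above shows up even in this toy: every function that can reach an assertion site needs `HasMacawAnn` in its signature.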

langston-barrett commented 3 weeks ago

For reference, here is the definition of HasLLVMAnn:

https://github.com/GaloisInc/crucible/blob/93bfa7f7858eb458f83eb3e0f030bf600ef72365/crucible-llvm/src/Lang/Crucible/LLVM/MemModel/Partial.hs#L121

The parameter ?recordLLVMAnnotation is usually used in the following way:

I would actually suggest we consider supporting an API in Macaw that diverges slightly from this scheme, in order to support a wider range of use cases. I have a particular proposal below, but more generally I think we should consider alternatives.

We could create an enumeration (perhaps MacawError) of all (or at first, most) of the assertions that Macaw makes, and then provide an interface like so:

data MacawError sym where
  DivByZero :: (1 <= w) => SymExpr sym (BaseBVType w) -> MacawError sym
  -- more constructors here ...

-- | Given a safety predicate and a description of the error it represents,
-- return a new predicate (and possibly perform additional side-effects, such as
-- recording information about the predicate).
type MacawProcessAssertion sym
  = (?processAssert :: sym -> Pred sym -> MacawError sym -> IO (Pred sym))

With this scheme, the MacawProcessAssertion sym could still support the existing use-case of recording information about Pred syms to be used in later analysis, but it has more power:

... and potentially other use-cases not yet envisioned here. A default implementation might be:

defaultMacawProcessAssertion _sym p e = pure p

Or, to support the existing use-case (not 100% sure this type-checks):

defaultMacawProcessAssertion mapRef sym p e = do
  (n, p') <- annotateTerm sym p
  -- MapF values are indexed by the key's type parameter, so wrap e in Const
  modifyIORef mapRef (MapF.insert n (Const e))
  pure p'
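The handler-based scheme can be tried out end to end in a toy model that needs no What4 or Crucible dependency. Everything here is a simplified assumption: `Pred` is a plain `Bool` with an optional annotation id, `MacawError` carries a concrete divisor instead of a symbolic term, the `sym` argument is dropped, and `passThrough`, `recording`, and `checkedDiv` are hypothetical names.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE ImplicitParams #-}

import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

-- Toy predicate: a truth value plus an optional annotation id.
data Pred = Pred { predAnn :: Maybe Int, predHolds :: Bool }
  deriving (Eq, Show)

-- Toy error type; a real one would carry symbolic terms.
newtype MacawError = DivByZero Int   -- the offending divisor
  deriving (Eq, Show)

-- Simplified analogue of MacawProcessAssertion (no sym parameter).
type MacawProcessAssertion =
  (?processAssert :: Pred -> MacawError -> IO Pred)

-- Pass-through handler: return the predicate unchanged.
passThrough :: Pred -> MacawError -> IO Pred
passThrough p _e = pure p

-- Recording handler: tag the predicate with a fresh annotation id and
-- remember which error that id stands for.
recording :: IORef [(Int, MacawError)] -> Pred -> MacawError -> IO Pred
recording ref p e = do
  n <- length <$> readIORef ref        -- next unused annotation id
  modifyIORef' ref ((n, e) :)
  pure (p { predAnn = Just n })

-- An assertion site: hand the safety predicate to the handler first.
checkedDiv :: MacawProcessAssertion => Int -> Int -> IO (Pred, Maybe Int)
checkedDiv x y = do
  p <- ?processAssert (Pred Nothing (y /= 0)) (DivByZero y)
  pure (p, if predHolds p then Just (x `div` y) else Nothing)

demo :: IO (Pred, Maybe MacawError)
demo = do
  ref <- newIORef []
  let ?processAssert = recording ref
  (p, _) <- checkedDiv 7 0
  table <- readIORef ref
  -- Match the failed predicate's annotation back to its error.
  pure (p, predAnn p >>= \n -> lookup n table)

main :: IO ()
main = demo >>= print
```

Binding `?processAssert` to `passThrough` instead of `recording ref` leaves the predicate unannotated and the table empty, which corresponds to the first default above.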