`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`)

macaw-symbolic is built on top of the crucible-llvm memory model, which means that it has the ability to report instances of C-oriented bad behavior (assuming that the underlying machine code adheres to C's memory model conventions). These instances of bad behavior are tracked via the HasLLVMAnn constraint that is threaded throughout macaw-symbolic.

In addition to C memory model checks, macaw-symbolic also adds a variety of additional assertions that are specific to machine code. These include (but are likely not limited to):

Checking if reads from and writes to the global address space are within bounds:

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs#L172-L175 https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/symbolic/src/Data/Macaw/Symbolic/Memory/Common.hs#L181-L187
Using a default value when a conditional read's condition does not hold:

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/symbolic/src/Data/Macaw/Symbolic/MemOps.hs#L1095-L1099
A variety of checks when performing memory operations involving pointers:

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/symbolic/src/Data/Macaw/Symbolic/MemOps.hs#L502-L506
The attn signal (in macaw-ppc-symbolic):

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/macaw-ppc-symbolic/src/Data/Macaw/PPC/Symbolic/Functions.hs#L82-L84
Loss of floating-point precision (in macaw-ppc-symbolic):

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/macaw-ppc-symbolic/src/Data/Macaw/PPC/Symbolic/Functions.hs#L235-L246
Dividing by zero in a division-related instruction (in macaw-x86-symbolic):

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/x86_symbolic/src/Data/Macaw/X86/Crucible.hs#L725-L729
A quotient overflowing in a division-related instruction (in macaw-x86-symbolic):

https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/x86_symbolic/src/Data/Macaw/X86/Crucible.hs#L767-L769 https://github.com/GaloisInc/macaw/blob/186c35da7c35e303ea3e06e026cdf8f0b375ecb5/x86_symbolic/src/Data/Macaw/X86/Crucible.hs#L807-L809

Unlike the checks in the crucible-llvm memory model, however, these assertions are all performed via GenericSimError or AssertFailureSimError. As a result, it is not straightforward to catch these assertions and perform subsequent analysis on them after simulation fails.

I propose that macaw-symbolic add a constraint similar to HasLLVMAnn (perhaps HasMacawAnn, for lack of a better name) and use it to track the What4 annotations of each of the terms that give rise to these assertion failures. That way, one can consult the Map of bad macaw-symbolic behaviors afterwards and match the annotations to the corresponding terms. This would require a fair bit of API churn in order to thread the new HasMacawAnn constraint through to the relevant functions, however.

For reference, here is the definition of HasLLVMAnn:

https://github.com/GaloisInc/crucible/blob/93bfa7f7858eb458f83eb3e0f030bf600ef72365/crucible-llvm/src/Lang/Crucible/LLVM/MemModel/Partial.hs#L121

The parameter ?recordLLVMAnnotation is usually used in the following way:

A Pred sym, representing a safety condition, is annotated with What4.Interface.annotateTerm
An instance of BadBehavior is constructed that represents the meaning of the safety condition
?recordLLVMAnnotation is called with these arguments, it generally stashes the (annotation, BadBehavior) pair into a MapF that lives in an IORef, where it can later be retrieved for analysis
The predicate is eventually asserted
During processing of goals, if an assertion fails and its predicate was annotated and that annotation appears in the map, tools can look up the BadBehavior from the map and analyze it.

I would actually suggest we consider supporting an API in Macaw that diverges slightly from this scheme, to potentially support more different use-cases. I have a particular proposal below, but just generally think we should consider alternatives.

We could create an enumeration (perhaps MacawError) of all (or at first, most) of the assertions that Macaw makes, and then provide an interface like so:

data MacawError sym where
  divByZero :: forall w. (1 <= w) => SymExpr (BVType w) -> MacawError sym
  -- more constructors here ...

-- | Given a safety predicate and a description of the error it represents,
-- return a new predicate (and possibly perform additional side-effects, such as
-- recording information about the predicate).
type MacawProcessAssertion sym
  = (?processAssert :: sym -> Pred sym -> MacawError sym -> IO (Pred sym))

With this scheme, the MacawProcessAssertion sym could still support the existing use-case of recording information about Pred syms to be used in later analysis, but it has more power:

It has access to the Pred sym itself, not just an annotation on it, so
- it has the ability to not annotate the predicate if it doesn't want to
- it can observe, e.g., if the Pred is a constant before deciding what to do next
It can change the predicate, e.g., to skip making certain classes of assertions by returning truePred sym

... and potentially other use-cases not yet envisioned here. A default implementation might be:

defaultMacawProcessAssertion _sym p e = pure p

Or, to support the existing use-case (not 100% sure this type-checks):

defaultMacawProcessAssertion mapRef sym p e = do
  (n, p') <- annotateTerm sym p
  modifyIORef mapRef (MapF.insert n e)
  pure p'

GaloisInc / macaw

`macaw-symbolic`: Fine-grained tracking of machine code–specific bad behavior (à la `HasLLVMAnn`) #429