Challenge 10 Interactive

Interactive Verification

Adds support for interacting with the verifier from the debugging interface. The interaction is limited to multiple-choice, and simply works by having the verifier register handles that get executed when the debugging interface navigates to certain trace nodes.

There are effectively two kinds of decisions: blocking and non-blocking.

Blocking. A blocking decision halts verification until the user provides some input. When the interface is in wait mode (i.e. printing live output from the verifier), any blocking decision present in the trace tree will cause the interface to stop waiting and navigate to the corresponding element, prompting the user to provide a decision.
Non-blocking. A non-blocking decision exists in the trace tree alongside other debugging elements, but has no effect until the user navigates to one of the available choices. This will cause an action to be queued that will be processed at the next available point during verification. Currently there are three places that non-blocking decisions can be executed:
- before any node has been selected for processing (a "queue" action)
- after a node has been selected and its equivalence domain assumed (a "node" action)
- after a node has been selected and a specific exit case is being considered (and the path condition for that exit case is assumed) (an "edge" action)

Example Blocking Decisions

When a control flow desynchronization is encountered, either a synchronization point must be provided or the desynchronization must be asserted/assumed to be infeasible.
When verification has finished, the user has several choices for how to proceed: either to modify the proof graph and continue verification, or to finalize the proof and see the overall result.

Example Non-blocking Decisions

Refine a proof node by asserting or assuming a condition (given by asserting/assuming that some set of registers are equal between the original and patched programs), and generating a strengthened equivalence domain under this condition.
Simplify the predicate for a condition using a collection of rewrite rules and the SMT solver
Prioritize a node to be processed as soon as possible
Unsafely drop all entries currently in the worklist

See:

./src/Pate/TraceTree.hs
- chooseLabel, choose, chooseBool - blocking choices
- chooseLazy - non-blocking choices
./tools/pate-repl/Repl.hs
- isBlocked
- asChoice, asChoiceTree
- goto_status

Equivalence Domain Refinement / Conditions

As a non-blocking decision (see above), allows users to refine an equivalence domain by adding an additional condition to a proof node. Currently this is done by presenting the user with a list of registers that are not known to be equal in the domain, and allowing them to select any subset of them. The condition that is generated simply states that those registers are, in fact, equal in the post-state of the relevant node.

The equivalence domain is re-generated while assuming this condition, thus strengthening it and proving more locations (i.e. the provided set of registers) to now be equal between the original and patched programs.

Condition Propagation

A condition is propagated (either by default or by request from the user) from a node to its ancestors by computing and adding a condition to each ancestor that is sufficient to imply the condition of the current node. As this adds more conditions to the graph, all descendant nodes from these ancestors must be invalidated and re-checked after a propagation step.

If the current assumptions for a node are already sufficient to imply the propagated condition, then no propagation occurs.

Assertions vs. Assumptions vs. Equivalence Domains

There are three classes of conditions, which are separately propagated, although they are all assumed to hold when analyzing a node.

Assertions
- These are always propagated. It is considered an error if an assertion is propagated to the entry point (i.e. assertions must be discharged at some point and no longer propagated)
Assumptions
- These are not propagated by default, but users can request that they are either propagated once (to the immediate ancestors of the node) or fully (to the entry point of the program)
- Unlike assertions, assumptions may propagate to the entry point
- These are intended for properties that should be provable, but may require additional analysis
Equivalence Conditions
- These are treated the same as assumptions, but are kept formally distinct
- Unlike assumptions, these are not intended to be provable, but instead they serve as the condition that must be assumed for the binaries to be considered equal
- The final analysis step reports any equivalence conditions that were assumed

See:

./src/Pate/Verification/StrongestPosts.hs
- addRefinementChoice
- addImmediateEqDomRefinementChoice
./src/Pate/Verification/Widening.hs
- refineEquivalenceDomain
- pickRegisters

Control Flow Desynchronization and Resynchronization

Refines the process of defining desynchronization and resynchronization points considerably. Notably:

Synchronization points may be defined as any intermediate instruction, rather than only block entry points. The ./src/Pate/Discovery/ParsedFunctions.hs module adds additional intra-block "cut" points for any intermediate instructions defined as synchronization points, to ensure that (for all subsequent code discovery) the sync points are considered the start of blocks.
All unhandled control flow desynchronizations may be handled by defining a synchronization point, or by asserting/assuming that the desynchronization is infeasible.

See:

./src/Pate/Verification/StrongestPosts.hs
- chooseSyncPoint
./src/Pate/Discovery/ParsedFunctions.hs
- addExtraTarget

Entry Point Selection

Adds support for multiple entry points to be defined, and prompts the user at launch to pick an entry point to start the analysis from.

Stubs

Adds a collection of stubs for libc calls as well as miscellaneous other external library calls. Some of these have partially-correct definitions (i.e. write() emits an observable event based on the contents written), while most have placeholder implementations. These placeholders simply write arbitrary (but equal, between the original and patched programs) values to r0 to represent some unknown value being returned from the stub.

See:

./arch/Pate/AArch32.hs
- stubOverrides
./arch/Pate/PPC32.hs
- stubOverride
./src/Pate/Verification/StrongestPosts.hs
- handleStub

Node scheduling

Adds fine-grained scheduling support for proof nodes. The priority of a node has two components:

the reason that the node was scheduled, determining the "kind" of the priority. This is simply a datatype with constructors corresponding roughly to the high-level reasons that a node may be schedule (e.g. because of an explicit user request, or because its ancestor had its domain widened, etc)
an integer corresponding to the individual priority of node. This value is inherited from the node that was being processed when this node was scheduled.

The schedule reason/kind for a node is informal, and has no effect on the analysis of the node, aside from its scheduling priority. Nodes are selected first by the highest-priority kind in the work list, and then by the highest priority node within that kind.

The overall goal with node scheduling is to reduce redundant analysis steps. For example, when the user asserts or assumes an equivalence condition for a node, its descendants must be invalidated and re-generated. However we should ensure that all relevant conditions have added before handling this invalidation, to avoid performing it multiple times.

See:

./src/Macaw/RevMap.hs
- An implementation of Data.Map that also supports reverse lookups.
- Implements the worklist to easily find the highest-priority node
./src/Pate/Monad.hs
- atPriority, currentPriority, thisPriority
./src/Pate/Monad/Environment.hs
- NodePriority
- *priority functions
./src/Pate/Verification/PairGraph.hs
- queueNode, queueAncestors
./src/Pate/Verification/StrongestPosts.hs
- usages of queueNode and queueAncestors
./src/Pate/Verification/Widening.hs
- usages of queueNode and queueAncestors

Improved ARM Thumb mode support

Uses the calling context to determine the state of the PSTATE.T register following a function call. Specifically, we assume (by convention) that this register is restored when a function returns.

Macaw Hacks

The value of the PSTATE.T register is readily available during the analysis, however it is slightly non-trivial to supply this information to Macaw. Macaw takes an Data.Macaw.Architecture.Info as input during its code analysis phase (see: ./src/Pate/Discovery/ParsedFunctions.hs), which defines various functions for extracting architecture-specific state. This is already used to supply concrete values that are derived from symbolic execution (this incidentally helps resolve the PSTATE.T register in many cases).

Before each code analysis call, a modified Architecture.Info is supplied to macaw with specialized variants of these architecture-specific function, based on the current context of the analysis. We therefore additionally supply the code discovery with a modified extractBlockPrecond function (used by the ARM semantics to use determine the initial value of the PSTATE.T register for a block) that also considers the calling context. This injects the assumption that PSTATE.T is necessarily stable for the duration of an entire function call, when control flow is inside of that function.

Using symbolic execution to classify jumps

In cases where macaw fails to classify a block exit, adds a fallback to use symbolic execution to determine if a block terminates with an unconditional jump to a concrete address (based on the assumptions in scope). This is done on-demand, based on whether or not symbolic execution decides that it's possible for a block to end in a classification failure.

Notably this may incorrectly classify a function call as a jump, in which case the verifier will simply end up effectively in-lining the target function.

This approach overlaps with macaw-refinement considerably, so using this should be investigated before developing this strategy further.

See:

./src/Pate/Discovery/ParsedFunctions.hs
- addExtraEdges
./src/Pate/Discovery/PLT.hs
- extraJumpClassifier

Using provide hints to classify stubs

Adds an extra classifier that uses the provided hints to determine if a block terminates with a call to a stub. If the terminal PC is concretely a known stub, then the result is classified as a function call that returns to the subsequent address.

This could be refined slightly by allowing stub definitions to override how the return target is determined.

See:

./src/Pate/Discovery/PLT.hs
- pltStubClassifier

Stripping unsupported instructions from input

If symbolic execution fails (i.e. due to an undefined/uninterpreted instruction), adds a post-processing step which simply strips any uninterpreted statements from the block before symbolic execution. Additionally, adds an override to the disassembler to discard any unknown instructions that are encountered, and simply fall through to the next instruction.

Both of these cases raise a warning, but allow the analysis to (unsoundly) continue in the presence of unsupported instructions.

Error Handling

Adds an additional error handling mode that doesn't throw variable "rescoping" errors. These happen when propagating an equivalence domain across a node, and the output can't be rephrased (using the existing heuristics).

This is unsound in general, but can be useful for keeping the number of errors thrown to a minimum during experiments.

See:

./src/Pate/Verification/SymbolicExecution.hs
- stripUninterpreted, hasUninterpreted
./src/Pate/Discovery/ParsedFunctions.hs
- addTranslationErrorWrapper

Hints/Metadata

Experimental support for manually loading extra sections from the binary. (see: ./src/Pate/Loader/ELF.hs)

BSI:

Adds candidate_source_match as a key for defining a function symbol. (see: ./src/Pate/Hints/BSI.hs)

Expression Simplification

Adds additional cases to the bitvector simplifier in ./src/What4/ExprHelpers.hs to decompose equality tests across concatenated bitvectors (i.e. a ++ b = c ++ d --> a == b && c == d).

Globally-unique Uninterpreted Functions

Adds ./src/What4/UninterpretedFns.hs, which provides support for defining uninterpreted functions which are globally-unique, keyed on name and signature. This is used to resolve issues that arise from creating arbitrary free values in under-specified stub implementations. Instead, arbitrary values are always defined as some uninterpreted (but consistent) function of some inputs.

Note: This could likely be used to considerably simplify ./src/Pate/Memory/MemTrace.hs by removing the need to carry around a handle for creating uninterpreted functions to represent undefined memory operations.

See:

./src/What4/UninterpretedFns.hs
- mkUninterpretedSymFn
./src/Pate/Arch.hs
- mkObservableOverride
- mkWriteOverride

Submodules

ASL Translator

Bumps the asl-translator submodule to 3d52d9199afa913960940cc76ee7db19e63c9b98 (dm/pate branch). This includes two critical fixes:

Fixes conditional blocks in thumb mode by incrementing the PSTATE.IT register after each instruction
Fixes setSlice semantics, which were simply incorrect and likely fixes many yet-undiscovered issues

Macaw

Bumps macaw submodule to 0818bd5eb76949917b88cb6ea971d475ee7f5e21 (dm/pate branch). This includes two critical fixes:

ARM disassembler now respects the given size limit (needed for fine-grained block slicing)
Adds additional case for return classifier

The other changes in this branch can likely be discarded, so the above fixes should be cherry-picked out.

GaloisInc / pate