GaloisInc / pate

Patches Assured up to Trace Equivalence
Other
15 stars 2 forks source link

Define a special-purpose datatype for communicating expressions about symbolic state #425

Closed danmatichuk closed 2 days ago

danmatichuk commented 1 month ago

Currently symbolic expressions (e.g. equivalence conditions) are sent to the GUI/repl as JSON-serialized/pretty-printed what4 expressions. These are a bit awkward to use for a number of reasons: the what4 representation throws away some structure that we might want (e.g. converting A || B into ~(~(A) && ~(B))); there isn't an obvious connection between formal variables and machine state; lots of explicit type conversions are necessary (i.e. bvToInteger) in order to keep the expressions well-typed, making them verbose. The most significant issue, however, is that we don't have a mechanism to take predicates supplied from the front end (i.e. a path constraint) and interpret them with respect to a particular symbolic state.

To address this we can add a special-purpose expression language that's designed to facilitate back and forth communication of symbolic predicates, that simply makes registers and memory first-class concepts. The tricky part is that, in general, we would like to be able to have expressions that talk about intermediate states within a CFAR. Formally it should be possible to uniquely identify each intermediate state within a CFAR by exactly its PC value, since each instruction will be executed at most once. Unfortunately Macaw is fairly aggressively throwing away information about intermediate register states during symbolic execution, so there will be a bit of work required to put in the needed plumbing in order to recover this.

danmatichuk commented 1 month ago

As a first approximation we can simply define the semantics of expressions to always be with respect to the initial state of some associated CFAR. This simplifies the representation at the cost of some loss of precision. So for the following CFAR:

0xabc0:
    write_mem32(r2, 41)
0xabc4:
    r3 <- read_mem32(0xdeadbeef)

We can write an expression saying read_mem32(0xdeadbeef) >= 42, but there's no guarantee that this actually refers to the final value of r3 (i.e. in the case where r2 == 0xdeadbeef we know that r3 necessarily equals 41 due to the preceding write).

danmatichuk commented 1 month ago

There are two obvious ways to relate expressions back to a corresponding intermediate (symbolic) program state: we tag each individual read/register access with its associated intermediate state (i.e. instruction address), or we define expressions to always be interpreted with respect to a single intermediate state.

For now the latter choice seems more straightforward and should still be expressive enough to represent any feasible intermediate symbolic state, however in full generality it is certainly possible to define machine semantics (in macaw) which cannot be represented this way. For example: writing to memory and then reading from memory in a single instruction. My hypothesis is that in practice this doesn't happen (maybe ARM vector instructions?) so it doesn't make sense to pre-emptively design for it.

danmatichuk commented 2 days ago

The current approach involves sending abstract identifiers to the front end which it sends back in a simple form (i.e. identifier, relation, constant). This is sufficient for now in lieu of a more robust constraint language, so we'll close this issue until we decide that we want to extend this feature.