GaloisInc / pate

Patches Assured up to Trace Equivalence
Other
15 stars 3 forks source link

Provide list of variable names & types to front-end #424

Closed lcasburn closed 2 months ago

danmatichuk commented 3 months ago

As per the discussion today: we're going to slightly generalize this to instead have the verifier construct a list of expressions that are extracted from a particular trace, and use those as the basis for introducing constraints. This just means that the front end is agnostic of how that list gets populated. For example in general the verifier might decide that read_mem32(read_mem32(r3)) is a value that the user might choose to constrain.

For a first cut of this feature, we'll just include: the initial value of any registers that are accessed and any reads from concrete memory addresses (i.e. addresses that are necessarily concrete with respect to the current set of assumptions, including any constraints supplied by the user). So the list of expressions would look like:

[r0, r3, read_mem32(0xdeadbeef)]

As a second cut, we can consider annotating intermediate values in the trace with unique identifiers to facilitate adding constraints to them.

For example, we might have a symbolic trace that looks like:

0xabc0:
    r4 <- read_mem32(0xdeadbeef)
0xabc4:
    r5 <- r3 + r4

Which yields a concrete trace that looks like:

Initial Register Values:
r3 <- 2

Trace:
0xabc0:
    read_mem32(0xdeadbeef) -> 42
0xabc4:
    r5 <- 44

If we want to add a constraint that says r5 == 60, we can't simply assert that about the initial value of r5. Instead we need a constraint on the expression: r3 + read_mem32(0xdeadbeef).

To facilitate this we'll want to add a unique identifier to indicate the value of r5 at a particular address, so that the user can simply say r5@0xabc4 rather than needing to be aware of its internal symbolic representation. The verifier can then send just these identifiers to the front end, and knows how to turn them back into formal predicates.

So given this extension we might see this as a list of candidate expressions:

[r3, read_mem32(0xdeadbeef)@0xabc0, r4@0xabc0, r5@0xabc4]

jim-carciofini commented 3 months ago

Sounds good. But I need type for each expression too. Perhaps a list of expression, type pairs.

thebendavis commented 3 months ago

@danmatichuk IIRC we do some simplification in how we display equivalence conditions to users through some mix of simplification on the verifier side plus some more simplification/rewriting in the Binary Ninja plugin. Does the feature described in this issue relate to the (verifier) simplifier in any way? E.g. if the user wants to constrain something about the value of a 32-bit memory read, might we need to somehow translate this into constraints on the low level 4x single-byte reads we simplified by grouping into the 32-bit read in order to generate the updated concretized trace, or is that memory access grouping happening at a different level to the trace-gen constraint stuff here?

danmatichuk commented 3 months ago

See #425 for some more thoughts on the expression language itself, but in short: yes I think the idea would be that this expression language (i.e. "Pate" expressions) serves as an intermediary for all three purposes: sending the list of "interesting" values to the front end as candidates for constraints, taking the list of provided constraints from the front end and turning them back into formal predicates over the symbolic state, and presenting equivalence conditions/assertions/assumptions to the user.

My thinking is that the "pretty" simplification steps that we're doing on what4 expressions are exactly the transformation steps that we want to do to instead convert a what4 expression into a Pate expression (i.e. wrapping raw array accesses in a "read" function). This would then serve as a common layer to push some of the post-processing simplification steps that the GUI is doing down into the verifier.

danmatichuk commented 3 months ago

See #426 for a simplified approach that avoids the full generality of defining an intermediate expression language.

jim-carciofini commented 3 months ago

Given the short time, it would be best if you can minimize changes to any of JSON sent to the front end. Adding structure should be fine. But other changes take time we don't have to resolve.