Tristan, Dan, and I had a discussion about this. We discussed two broad-stroke plans.
We think that implementing (2) will be simpler, but it's possible that the equivalence check in (1) could sometimes pass even though the side conditions recorded in (2) would not hold. We are going to optimistically hope that (2) succeeds often enough to be useful, and proceed with that.
Given that we are implementing (2), we had three ideas about how to record side conditions.
a. Add a `Pred sym` to the memory model. Build it up as the memory transaction proceeds, producing a larger and larger conjunction of conditions.
b. Add a `Pred sym` to each individual memory operation. When we need to check it, traverse the log of memory operations and conjoin them.
c. Use crucible's built-in condition tracking.
We think (c) may be tricky to get exactly right, because we will want a way to separate out conditions recorded by the memory model from conditions recorded by other parts of our semantics, and whatever mechanism we come up with for that would need to be future-proof. Choices (a) and (b) seem very similar; (b) might give us a little bit more introspection power (e.g. for debugging), while (a) is somewhat more convenient for consumers of the memory model. We are going to proceed with (b).
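As a first cut, here is a minimal What4-style sketch of option (b); the `MemOp` record, its fields, and `memLogCond` are hypothetical names for illustration, not the real memory model's API:

```haskell
import Control.Monad (foldM)
import What4.Interface (IsExprBuilder, Pred, andPred, truePred)

-- Hypothetical: one entry in the memory operation log, carrying the
-- side condition under which the operation's semantics are accurate.
data MemOp sym = MemOp
  { memOpDescr :: String    -- human-readable description, for debugging
  , memOpCond  :: Pred sym  -- side condition recorded for this operation
  }

-- Hypothetical: when a check is needed, traverse the log and conjoin
-- the per-operation conditions into a single obligation.
memLogCond :: IsExprBuilder sym => sym -> [MemOp sym] -> IO (Pred sym)
memLogCond sym = foldM (\acc op -> andPred sym acc (memOpCond op)) (truePred sym)
```

Option (a) would amount to performing this same fold eagerly at each operation, keeping only the accumulated conjunction rather than the per-operation predicates.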
I'm aligned with those choices. I definitely prefer (b) in terms of making it a bit easier to determine the provenance of each `Pred`. Do we need to conjoin them when checking, or can they just be checked independently?
I've been thinking about that question for a little while now and tying myself in knots, so I'm not sure. I think you may be able to check them separately, with a path-condition-like caveat: when checking a later operation's predicate, you want to assume that all of the previous operations' predicates are true.
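To make that caveat concrete, a hypothetical What4-style sketch (where `prove` stands in for whatever solver call discharges a single obligation): for each operation, check that the conjunction of the earlier predicates implies the current one.

```haskell
import Control.Monad (foldM_)
import What4.Interface (IsExprBuilder, Pred, andPred, impliesPred, truePred)

-- Check each operation's predicate on its own, but under the assumption
-- that every earlier operation's predicate holds: for each i, prove
-- (p_0 /\ ... /\ p_{i-1}) ==> p_i.
checkSeparately ::
  IsExprBuilder sym =>
  sym ->
  (Pred sym -> IO ()) ->  -- hypothetical: discharge one obligation via a solver
  [Pred sym] ->           -- per-operation predicates, in log order
  IO ()
checkSeparately sym prove = foldM_ step (truePred sym)
  where
    step assumed p = do
      prove =<< impliesPred sym assumed p  -- check this operation alone
      andPred sym assumed p                -- then assume it for later ones
```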
During the discussion today, we considered again some alternative implementation strategies. One was to return to storing a flat array of bytes and reading/writing as in the old style. After some discussion, we decided that this would be more work than pushing the current approach through to completion.
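For concreteness, a rough sketch of what that old-style scheme looks like with What4's SMT arrays; `Bytes`, `readByte`, and `writeByte` are illustrative names, and real accesses would also need to handle multi-byte widths and endianness:

```haskell
{-# LANGUAGE DataKinds, TypeOperators #-}
import Data.Parameterized.Context (EmptyCtx, (::>), singleton)
import What4.Interface

-- Memory as one flat SMT array from w-bit addresses to bytes.
type Bytes sym w = SymArray sym (EmptyCtx ::> BaseBVType w) (BaseBVType 8)

-- Reads and writes go straight through the array, with no operation log.
readByte :: IsExprBuilder sym => sym -> Bytes sym w -> SymBV sym w -> IO (SymBV sym 8)
readByte sym mem addr = arrayLookup sym mem (singleton addr)

writeByte :: IsExprBuilder sym => sym -> Bytes sym w -> SymBV sym w -> SymBV sym 8 -> IO (Bytes sym w)
writeByte sym mem addr byte = arrayUpdate sym mem (singleton addr) byte
```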
Status 10/5:

Status 10/13:
…`lazyProofEvent`, `forkProofEvent`, and friends and instrumenting those.

A few updates:

- We are using `--try-simple-frames` as a case study for this, which at the moment times out in the solver (i.e. takes >300s for some queries). We discovered that commenting out the `stacksEq` check in `statePredPost` brings the solver times down to <2s for all queries, so the fault likely lies in checking that the final stacks are equivalent between the two programs. We're not sure why yet.
- One suggestion that was made was to try changing the underlying representation of the region to a bitvector (or possibly even a boolean, to indicate simply stack vs. global memory) rather than an integer, to avoid involving both theories.
To expand on that point: mixing the `Integer` and `Bitvector` theories can often lead to very poor performance. The LLVM memory model mostly avoids this because region IDs end up being concrete, since they are stored in the write log. The previous pate memory model also mostly avoided it, because every read pointer was given a concrete 0 region. The working hypothesis we can test is that the mixing of integer and bitvector terms is bad for the solver.
If that is the case, we could just always use bitvectors.
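To illustrate the concern with a sketch (assuming crucible-llvm-style `LLVMPointer`s, whose region is a `SymNat` that What4 encodes with the SMT Integer theory): even a plain pointer equality already mixes the two theories, while bitvector regions would keep the whole term in the bitvector fragment. The bitvector-region shape below is hypothetical.

```haskell
{-# LANGUAGE DataKinds, PatternSynonyms, TypeOperators #-}
import GHC.TypeLits (type (<=))
import Lang.Crucible.LLVM.MemModel (LLVMPtr, pattern LLVMPointer)
import What4.Interface

-- Today: the region is a SymNat (Integer theory in the solver) and the
-- offset is a bitvector, so this simple predicate involves both theories.
ptrEqMixed :: (IsExprBuilder sym, 1 <= w)
           => sym -> LLVMPtr sym w -> LLVMPtr sym w -> IO (Pred sym)
ptrEqMixed sym (LLVMPointer r1 o1) (LLVMPointer r2 o2) = do
  regEq <- natEq sym r1 r2  -- Integer theory
  offEq <- bvEq  sym o1 o2  -- Bitvector theory
  andPred sym regEq offEq

-- The proposal: represent regions as (narrow) bitvectors so the whole
-- predicate stays within the bitvector theory.
bvPtrEq :: (IsExprBuilder sym, 1 <= r, 1 <= w)
        => sym -> SymBV sym r -> SymBV sym w -> SymBV sym r -> SymBV sym w -> IO (Pred sym)
bvPtrEq sym r1 o1 r2 o2 = do
  regEq <- bvEq sym r1 r2   -- Bitvector theory only
  offEq <- bvEq sym o1 o2
  andPred sym regEq offEq
```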
Is the thought that we'd still be using `LLVMPointer`s, but converting regions between integers and bitvectors before/after storage? Won't that still be mixing theories? If the idea is to switch away from `LLVMPointer`s entirely, then I understand a bit better, but that seems like a more involved change for sure.
I was wondering if we could just grab the SMT-LIB from the query we identified as slow and `sed` the Integers into Bitvectors for the test.
Ah, that makes a ton of sense, thanks.
It sounds like `Integer -> BV` did not help.
Options to explore next:
--- Other hypotheses ---
I spent a little time last night on (1). It's not as simple as it appears: CVC4 and Z3 both finish very, very quickly on CP3, but they also don't actually seem to check any interesting conditions -- the verification outputs I get are smaller even than the ones yices gives me for the simple write-then-read test. I spent a little time digging into why, but didn't really make progress.
I'm going to advance to (2).
Yikes, that is alarming. I just played with it a bit, and it definitely looks like the verifier gives up early without any diagnostics when you use z3 or cvc4. I wonder if we are accidentally swallowing some critical solver failures (e.g., parse failures).
I have checked in a starter attempt at (2) under the `wip/memtrace-regions` branch. Currently it fails to verify that `test-just-exit.original.exe` is equivalent to itself, so something is definitely wrong. I spent a few hours trying to work out what, but more is needed.
An audit of places that import the memory model made me squint twice: once at `bindMemory` in `Pate/Verification/Domain.hs:106` (which mentions the array of byte values but not the array of region values), and once at `flatVarBinds` in `Pate/SimState.hs:418`, which binds a variable to the array of byte values but none to the array of region values. I don't have enough insight to see what that would break, but it might be one place to start the next investigation.
@danmatichuk Could you comment on what `bindMemory` and `flatVarBinds` are trying to accomplish? I'd like to understand at a high level so we can figure out how to abstract these interactions so that this code doesn't have to dip into the details of the memory model. Presumably both of those call sites have some sensible names and semantics that we could provide as an interface to the memory model without exposing internals.
I tried to understand `bindMemory`, but its use is a bit mysterious to me. It looks like it is called from a context that allocates a fresh, empty memory model and asserts that it has contents equal to the one the verifier already has on hand.
On the topic of abstraction: there are two other places outside the model that had to dip into implementation details. The first is `memPredPre` in `src/Pate/Equivalence.hs:411`; I think that could be modified to use the conditional write API that already exists. The second is `guessMemoryDomain` in `src/Pate/Verification/Domain.hs:165`; perhaps the right abstraction here is to provide a symbolic equality check between two memory models. Some thought would need to be given to exactly what that means -- e.g. here the trace of operations isn't checked, only the final arrays, but it seems reasonable to think that the variant that does check the traces would be what was wanted in some other situation.
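For the arrays-only meaning, a minimal sketch in What4 terms; `MemState`, `memBytes`, and `memRegions` are hypothetical stand-ins for the model's real state (which, per the discussion above, tracks an array of byte values and an array of region values):

```haskell
{-# LANGUAGE DataKinds, TypeOperators #-}
import Data.Parameterized.Context (EmptyCtx, (::>))
import What4.Interface

-- Hypothetical view of the model's final state: an array of bytes and
-- an array of region values, both indexed by w-bit addresses.
data MemState sym w = MemState
  { memBytes   :: SymArray sym (EmptyCtx ::> BaseBVType w) (BaseBVType 8)
  , memRegions :: SymArray sym (EmptyCtx ::> BaseBVType w) BaseIntegerType
  }

-- Symbolic equality of two memory models, comparing only the final
-- arrays and ignoring the operation traces.
memStatesEq :: IsExprBuilder sym => sym -> MemState sym w -> MemState sym w -> IO (Pred sym)
memStatesEq sym m1 m2 = do
  bytesEq <- arrayEq sym (memBytes m1) (memBytes m2)
  regsEq  <- arrayEq sym (memRegions m1) (memRegions m2)
  andPred sym bytesEq regsEq
```

A trace-checking variant would presumably conjoin this with some pointwise comparison of the two operation logs.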
Thanks - I started a ticket to track these. I added one I was looking at, and think I'll have some more soon.
Currently, our precise memory model is quite slow. One possible reason is that it needs to check quite a lot of sanity conditions on every read and write, and these conditions then become part of downstream terms as well. One idea we've had for addressing this is to implement a second memory model to act as a fast path. We've had two ideas for how to do this: