Concretizing LLVM memory

When an SMT solver yields unsat, it generally provides a model, i.e., an instantiation of all of the variables in the query. Given a model, we can "concretize" Crucible values (RegValues), replacing symbolic terms with concrete values (#1207). This enables us to do things like report concrete variable assignments that lead to violations of safety conditions. Concrete values are significantly easier to read and understand, enabling users to more quickly grasp why a proof/simulation fails.

For the same use-cases, it would be nice to develop a similar capability for LLVM memories. I started down this path only to realize that there is at least one tricky design question that needs to be resolved: The representation of pointers in the concrete memory.

Background

In the Crucible-LLVM memory model (permalink),

Each allocation in the memory model is completely isolated from all other allocations and has a unique block identifier. Pointers into the memory model have two components: the identifier of the block they reference and an offset into that block. Note that both the identifier and the offset can be symbolic.

Said another way, the memory model is two-dimensional: One "axis" of a pointer determines which allocation (possibly allocations in the case of a symbolic block identifier) it points to, and the other "axis" is the offset within that allocation.[^bv] For example, the null pointer is represented by the pair (0, 0x0).

[^bv]: Non-pointer bitvectors (e.g. the 5 in int x = 5;) are represented the same way as pointers, but with a block identifier that’s concretely 0.

A first attempt

To situate the problem, consider the following types I drafted up to represent concretized memory (simplified for presentation):

data MemByte
  = Initialized Word8
  | Invalidated Text
  | Uninitialized

data ConcAllocStorage
  = UnboundedStorage (Seq MemByte)
  | BoundedStorage (MVector RealWorld MemByte)

data ConcAlloc
  = ConcAlloc
    { concAllocType :: AllocType
    , concAllocStorage :: ConcAllocStorage
    -- plus alignment, mutability, etc...
    }

data ConcMem = ConcMem (IntMap ConcAlloc)

I ran into the problem when trying to handle stores (MemStore) of pointers:

boundedMemStore ::
  W4GE.GroundEvalFn t ->
  BV.BV w ->
  MemVal.LLVMVal sym ->
  MemType.StorageType ->
  MVector RealWorld MemByte ->
  IO ()
boundedMemStore gFn off val ty v = do
  case val of
    MemVal.LLVMValInt blk off -> _  -- What to put here?

If the (concretized) block number blk is 0 in the model, then we could just write out the bytes of the (concretized) offset off. But what if it's not? There is no word that contains as much information as the pointer (because the pointer consists of a word-size offset and a block number), so we appear to be stuck.

Some options

So how can we work around this?

(1) Define a representation of pointers as (concrete) bytes: We could associate each block number with an arbitrary base address (likely chosen to be reasonably similar to actual heap addresses on x86_64 Linux). When writing pointers to concrete memory, we would look up the base address and add the offset, and write that word. This has the virtue of simplicity and familiarity - it is actually how pointers work, after all. We would need to be careful when assigning base addresses to avoid seemingly-overlapping allocations, and it's not clear how that would work with Crucible-LLVM's "unbounded" allocations (but perhaps that's an edge case that's not work optimizing for). This has all the disadvantages of real pointers, i.e., some bit-patterns would look like pointers, but might not necessarily have been derived from the pointers they resemble.

(2) Opaque pointer bytes: We could consider changing MemByte to somthing more like this:

data MemVal w
  = Initialized (Vector Word8)
  | Invalidated Integer Text
  | Uninitialized Integer
  | Ptr (ConcLLVMPtr w)

data ConcAllocStorage w = Seq (MemVal w)

This comes at the cost of both performance and normalization. We can no longer use fixed size arrays for storage, nor index into them as arrays of bytes. Furthermore, ConcAllocStorage could contain two adjacent Uninitializeds (or Initializeds), and so have multiple representations of the same data.

(3) Duplicated opaque pointer bytes: We could instead add a constructor to MemByte like so:

data MemByte w
  = Initialized Word8
  -- snip --
  | PtrByte (ConcLLVMPtr w)

When writing pointers to concrete memory, we would write N of these opaque PtrBytes, where N is the size of a pointer. This has the disadvantage of storing (a pointer to, haha) the concrete pointer several times, but has the advantage of retaining some nice performance characteristics (i.e., the ability to treat BoundedStorage as an array of bytes). It also comes at the price of a bit of denormalization: What would it mean to have fewer than N of these in a row in some allocation? Or adjacent PtrBytes with different ConcLLVMPtrs?

(4) Your solution here: Anyone have a better idea? Or opinions about these options?

I'm afraid I don't understand the premise of this issue, so I'm not yet able to offer an informed answer to your questions.

First, a very dumb question: what is the relationship between the functionality you are trying to implement and concMem?

data ConcMem = ConcMem (IntMap ConcAlloc)

What does the range in this IntMap represent?

boundedMemStore ::
  W4GE.GroundEvalFn t ->
  BV.BV w ->
  MemVal.LLVMVal sym ->
  MemType.StorageType ->
  MVector RealWorld MemByte ->
  IO ()
boundedMemStore gFn off val ty v = do
  case val of
    MemVal.LLVMValInt blk off -> _  -- What to put here?
If the (concretized) block number blk is 0 in the model, then we could just write out the bytes of the (concretized) offset off. But what if it's not? There is no word that contains as much information as the pointer (because the pointer consists of a word-size offset and a block number), so we appear to be stuck.

I don't understand this part at all. What exactly is boundedMemStore trying to do? Why do we need to write out both the offset and block number? What happens if we don't? (I feel like I'm missing some important context for understanding the motivation here.)

(I feel like I'm missing some important context for understanding the motivation here.)

The overall motivation is making it easier to understand what went wrong when a safety assertion failed. Concretizing RegValues can help with this, but I find the log-of-writes memory printout to be pretty confusing. I was hoping to develop a representation for a concretized version of memory that would be more comprehensible.

First, a very dumb question: what is the relationship between the functionality you are trying to implement and concMem?

This is a very good question! concMem has the signature

https://github.com/GaloisInc/crucible/blob/19e9a5243e73f639baec061faf0a243b03401d69/crucible-llvm/src/Lang/Crucible/LLVM/MemModel/MemLog.hs#L758-L762

It goes through a Mem sym, concretizing all of the symbolic values in it, and then re-injecting them back into RegValues to put them back in the Mem. In contrast, I want to develop a simpler representation of concretized memory that makes it easy to understand the actual, concrete values at each location in memory. In particular, I was hoping to avoid a log-of-writes-like representation, in favor of something more like real program memory (i.e., a bunch of allocations, which themselves each hold what is basically an array of bytes).

What does the range in this IntMap represent?

Right, so this was my first stab at a simpler representation. The keys are block numbers, and the values are the allocations they correspond to.

I don't understand this part at all. What exactly is boundedMemStore trying to do? Why do we need to write out both the offset and block number? What happens if we don't? (I feel like I'm missing some important context for understanding the motivation here.)

Yeah sorry, this was a bit in the weeds. This function is trying to take a symbolic memory write (a MemStore), and perform the equivalent operation in the concretized memory. This is how I was constructing the concretized memory - starting from an "empty" representation, walking over the Mem, and performing equivalent operations on the concretized memory.

I find the log-of-writes memory printout to be pretty confusing. I was hoping to develop a representation for a concretized version of memory that would be more comprehensible.

I see! So you're trying to develop a simplified version of memory—got it.

In order to answer your questions better, it might help to list the reasons why you find the log-of-writes Mem representation to difficult to read and understand (even when all of the values in it are concrete). You list several different alternatives to Mem above, but given that these are all hypothetical designs, it's difficult for me to evaluate the merits of each one without a sense for what specific issues you are trying to address.

In order to answer your questions better, it might help to list the reasons why you find the log-of-writes Mem representation to difficult to read and understand (even when all of the values in it are concrete).

Sure. And to be clear: the problem is not necessarily that I personally find it difficult to read and understand (although this is probably true, and this is how I framed it above), but rather that I would assert that users of Crucible-based tooling are likely to find it obtuse.

Consider the following example of a pretty-printed log-of-writes memory:

  Stack frame deref
    Allocations:
      StackAlloc 186 0x8:[64] Mutable 8-byte-aligned internal
    Writes:
      Indexed chunk:
        186 |->   *(186, 0x0:[64]) := (184, 0x0:[64])
  Branch frame
    No writes or allocations
  Branch frame
    No writes or allocations
  Branch frame
    No writes or allocations
  Stack frame test
    Allocations:
      StackAlloc 185 0x4:[64] Mutable 4-byte-aligned internal
      StackAlloc 184 0x4:[64] Mutable 4-byte-aligned internal
      StackAlloc 183 0x4:[64] Mutable 4-byte-aligned internal
      StackAlloc 182 0x4:[64] Mutable 4-byte-aligned internal
    Writes:
      Indexed chunk:
        183 |->   *(183, 0x0:[64]) := carg0@5:bv
        185 |->   *(185, 0x0:[64]) := 0x5:[32]
-- snip --
  Base memory
    Allocations:
      HeapAlloc 2 0x100000:[64] Mutable 16-byte-aligned grease setup (arg5)
      GlobalAlloc 1 0xffffffffffffffff:[64] Mutable 1-byte-aligned Global memory for macaw-symbolic
    Writes:
      Indexed chunk:
        1 |->   *(1, 0x0:[64]) := cglobalMemoryBytes@1:a
        2 |->   *(2, 0xfffc8:[64]) := (2, 0xfffe4:[64])
                *(2, 0xfffd0:[64]) := (2, 0xffff0:[64])
                *(2, 0xfffd8:[64]) := (1, 0x401163:[64])
                *(2, 0xfffe0:[64]) := 0x5:[32]
                *(2, 0xfffe8:[64]) := 0x1:[32]
                *(2, 0xffff0:[64]) := 0x0:[64]
                *(2, 0xffff8:[64]) := [0x20:[8]
                ,0x10:[8]
                ,0x40:[8]
                ,0x0:[8]
                ,0x0:[8]
                ,0x0:[8]
                ,0x0:[8]
                ,0x0:[8]]

To understand this printout, users have to know:

About the "two-dimensional" representation of pointers mentioned above, in particular
- The "writes" are linked to the "allocations" via the block numbers
- How many bytes a pointer conceptually occupies when written
What a "branch frame" means
How to read each "write":
- What is an "indexed chunk"?
- How do you know how big a write was?
How to read concrete Crucible values, e.g., the write *(2, 0xffff8:[64]) := [0x20:[8], ..., 0x0:[8]] has the same meaning as *(2, 0xffff8:[64]) := 0x0000000000400102:[64]

Most importantly, to understand what is live at each address in the final state of memory, one actually has to walk over the writes (in their brain), collecting writes and overwriting previous ones. An alternative representation could perform this "collection" in advance, presenting just the "final picture" of what exists in memory, much like a core dump would.

Excellent, this is a fantastic problem statement. I like the idea of working to make the output more like a core dump.

Some various thoughts, which may or may not help inform a design for a simplified memory representation:

The assumption that allocations at different block numbers are completely disjoint is pretty deeply baked into crucible-llvm, and I fear that trying to map block numbers to actual addresses would be rife with complexity. I propose that we keep the notion of block numbers for now. They do require users to incur some conceptual overhead, but I'm not sure if it's that much more confusing that mapping block numbers to random addresses.
I certainly agree that the presentation of writes is far more confusing than it ought to be. I would love if crucible-llvm could present a "flat" view of each block's written values, perhaps in order of the addresses of each written values' offsets, e.g.,
```
Block number     Start offset    End offset    Size    Value
         1              0x0          0x10    0x10       42
         1             0x20          0x40    0x20       27
         2              0x0          0x10    0x10       45
...
```
This looks closer to what a core dump would give you.
I am unclear on what the advantage would be if we made this simplified memory representation use fixed-size arrays for storage or whether we could index into it efficiently. Perhaps I'm missing something, but this representation would solely be used for displaying information to the user, so the main use case is just pretty-printing the entire memory representation all at once, correct? Or do you envision printing smaller subsets of the memory representation depending on the context?

Thank you for the thoughts!

in order of the addresses of each written values' offsets

Ah, but this might hide the order in which the writes occur, making it harder to understand which of two overlapping writes is "live".

I am unclear on what the advantage would be if we made this simplified memory representation use fixed-size arrays for storage or whether we could index into it efficiently.

Yeah, so the three uses for this data structure are probably:

Pretty-printing it to the terminal
Serializing it (e.g., as JSON) and analyzing or displaying it via another program
(Possibly modifying it and) re-symbolizing it and beginning symbolic execution from a concrete memory state, like #1211

None of these are super likely to be hot loops, but I think we should still be a little concerned with the performance of constructing concretized memory. This is because concretization has to happen all at once inside a callback that receives the What4 SatResult. It might not be clear at that point whether the concretized memory is needed, making it necessary to always concretize memory and return it further up the stack. This is true in my use-case.

The reason I mention using fixed-size, mutable arrays is because when turning the log-of-writes into a concretized memory, we have to traverse the writes, and apply an equivalent write to the concrete memory. So if a write has a non-zero offset, we'll be seeking/indexing into the concrete allocation in order to perform the write. Using mutable arrays where each entry represents a byte is likely the most performant way to accomplish this. In contrast, using a representation where each entry is not byte-size would incur O(n) overhead to seek to the place where a write should occur (because you would have to traverse the whole prefix of the array, counting the number of bytes you've passed over).

Ah, but this might hide the order in which the writes occur, making it harder to understand which of two overlapping writes is "live".

Yes, this is true. But isn't your goal to produce the "final picture" of the memory state? In that case, I would expect that you would only want to display the most recent information vis-à-vis overlapping writes.

None of these are super likely to be hot loops, but I think we should still be a little concerned with the performance of constructing concretized memory.

A fair point. If there are ways to make constructing memory a little faster (e.g., by using fixed-size arrays in places that support them), then I'm all for it. That being said, I think that we'll inevitably need some amount of indirections, both for things like unbounded allocations as well as the fact that writes may not occupy contiguous offsets in an allocation region.

In that case, I would expect that you would only want to display the most recent information vis-à-vis overlapping writes.

Not to get derailed from the main thrust of the discussion, but I'm not sure that this is possible in the format you described. Consider an allocation of size 4 with two writes: one of three 0x00s at offset 1, and another (later) of two 0x01s at offset 0. Thus, the final state is [0x01, 0x01, 0x00, 0x00] Sorting these by offset would potentially be misleading, as it would perhaps suggest that the allocation contains [0x01, 0x00, 0x00, 0x00]. However, neither of these writes completely overshadows the other, so they would either have to both be displayed, or somehow "merged" (which is basically what I'm proposing). This kind of confusion is exactly what I'm hoping to avoid!

However, neither of these writes completely overshadows the other, so they would either have to both be displayed, or somehow "merged" (which is basically what I'm proposing).

Yes, I agree that merging the writes is the least surprising way to present that information. I suppose this means that there will be some amount of cleverness required in implementing the merging logic, but hopefully less cleverness than would be required to map different block numbers onto the same address space.

GaloisInc / crucible