draperlaboratory / cbat_tools

Program analysis tools developed at Draper on the CBAT project.
MIT License
101 stars 14 forks source link

Using primus lisp as a configuration language for WP #141

Open codyroux opened 4 years ago

codyroux commented 4 years ago

Currently, we have few ways of allowing fine-grained control to the user for the WP analysis. This is in contrast to the programatic API, which provides hooks to control:

Most of this is hidden to the user, with the notable exception of the postcondition (and some precondition stuff now) which can be entered as a string in the SMT-LIB2 format, which is massaged using regexps and passed directly to Z3 for processing.

An obvious idea is to replace this with a DSL which gives the user extensible control over these options. An obvious language for such a DSL might be Lisp, since it requires relatively little infrastructure within the host language. Since Primus Lisp already exists, it seems like a reasonable thing to try to re-use the existing infrastructure to implement these desiderata.

Here are some usage examples, to help fix ideas and notations:

As a simple example, with a single binary, we could imagine the following invocation:

bap a.out --pass=wp --wp-config=my_config.lisp

Where my_config.lisp might be the following:

(wp-update-post
     (assert (= init_RAX (+ RAX 1))))

For checking non-null dereferences in expressions, one might write

(wp-update-expr-hook
   (on-mem-read (mem loc)
    (assert (not (= (init loc) 0))))

Where on-mem-read adds a hook which matches memory reads in the expression, and takes the memory and read values as argument to the hook, and init var returns the value before the expression is evaluated.

For function summaries, we have a similar story, for example one would ideally like something like:

(defun malloc-spec (env)
     (fun-assumes (> RDI 0))
     (fun-ensures (or (= RAX 0) (> RAX HEAP_MIN)))

(wp-update-fun-spec
     "malloc"
      malloc-spec)

Or something similar. This isn't entirely trivial, since because we do WP, we work backwards from a postcondition, but something similar can work. We can start by offering exactly the ocaml interface currently provided:

https://github.com/draperlaboratory/cbat_tools/blob/83459fc157d3162eab0d326186c28902cef68d53/wp/lib/bap_wp/src/environment.mli#L45

In the long term, one could expect a fair amount of configuration to happen this way, for example the environment set up, and generally all the current user flags might be configurable with this language, which would simply require adding hooks to the method.

From an engineering perspective, it's not obvious how to wrangle the current Primus-Lisp execution environment to do this. However there are two main issues with this:

  1. Primus Lisp currently only operates on concrete values
  2. Primus Lisp executes entirely within a Machine.t monad

For 1, one might create a machine which watches the evaluation of a machine instruction, and hook into the events fired by the relative observations. Alternately, one could abuse the lisp macro system to manipulate the S-expressions into something we can directly pass to one of our OCaml constraint builders or other.

For 2, I would imagine that there is some Obj.magic trickery that we could exploit, despite the general desire to avoid such things. The alternative seems to be wrapping the whole main execution of the wp pass into a Machine.t though.

ivg commented 4 years ago

I can propose two options for that:

Of course, the set of options is not discrete and is not limited to the two options specified above. We can certainly end up enacting a mixture of options A and B or choose a completely different option.

Let's do some cons and pros analysis so that we can analyze the risks and benefits of the choice.

Option A

The plan is to switch to run Primus computations and the extension points to produce summaries of the statement semantics and then use those summaries to compute the weakest preconditions.

Cons

Pros

The fact that Primus runs forwards but WP backwards is not very important, as you still consider def terms (or statements in general) in a forward manner, and then combine them. So you can use Primus on per expression/statement level and then summarize the effects of that statement into a weakest precondition. You can use the new symbolic executor to keep inputs symbolic and your postcondition will serve as a constraint to the symbolic solver, e.g., suppose we have a term

RAX := mem[RBX + RDI,el]

If you will apply Primus on this term together with its symbolic executor, you will get both the concrete state (numbers are honestly rolled on a dice1):

RDI -> 2
RBX -> 5
RAX -> 6
mem[7] -> 6

and the corresponding symbolic formula

RDI -> R_RDI
RBX -> R_RBX
(RBX+RDI) -> (bvadd R_RDI R_RBX)
mem[RBX+RDI] ->  (select mem (bvadd R_RDI R_RBX))
RAX -> (select mem (bvadd R_RDI R_RBX))

Note: right now we do not support symbolic memory, but, of course, it could be added. You can then add the weakest precondition of the next statement to the computed formulae so that they can be simplified or even solved.

The same approach could be extended to calls to stubbed functions. If we will think of call @memcpy as a big statement that does a lot of reads and loads, we can still compute the weakest precondition.

So far so good, but there is one caveat. The main thing that you can't take away from Primus is the concreteness of values. For each expression in the given context, Primus will ask you for concrete values of that expression. You can specify more than one of course, but Primus needs at least one. Moreover, each generated formula will be wrt some valuation. This is both the boon and the bane. For example, when you will use malloc or memcpy stub, written in Primus Lisp, it will generate a formula that will be valid for the set of concrete inputs that were chosen, based on some concretization strategy. On one hand, it automatically serves the problem with loops (at least in the stubs), on the other hand, and steals control and under-approximates the real program. But, again, for programs without loos (for BIR terms in particular) you can safely use Primus and reuse all of its components. E.g., you can use Primus Lisp for specifying the preconditions, initializing the state with concrete values, etc.

Summary

This is a risky option that at the same time have a big potential (or not, hence the risk). It may require substantial rethinking and rewriting of WP so that even rewriting it from scratch would be considered a viable option. On the other hand, if it will work, then it could be a solution to the scalability problem.

Option B

The plan is to extract the Primus independent part of Primus Lisp, rename it into BAP Lisp, and employ it without any dependencies on Primus.

Cons

Pros

Notes

Extracting Lisp from Primus is totally in vain with our long term plans. And in the future, we plan to use BAP Lisp for many different tasks, such as writing lifters, specifying summaries for abstract interpreters and model checkers, and even using it is the main media for storing program semantics information, i.e., for storing it on disk and exchanging information between analyses, probably running in different environments and implemented in different programming languages. In other words, we plan to use BAP Lisp as our lingua franca. Of course, for that BAP Lisp shall have a pretty well-defined semantics, so that different interpretations of BAP Lisp mean the same thing. Right now, the semantics is not specified and is provided by the reference implementation.

Given such ambitious goals, we have to be very careful when we will use BAP Lisp for WP and do not accidentally change its semantics. In other words, we should always keep in mind that this is not just a lisp for WP configuration, but BAP Lisp with a concrete semantics. This will enable the reuse of the existing Lisp code now and in the future.

For example, this is not possible

(wp-update-fun-spec
     "malloc"
      malloc-spec)

Instead, in BAP/Primus Lisp we use the notion of signals and methods, e.g.,

(defmethod wp:update-fun-spec "malloc" (size)
  (if (symbolic-is-concrete size)
      (malloc size)
    (symbolic-malloc size)))
;; but oops, this won't work, as all methods return unit
;; we can of course pass a reference (a symbol) to the method, but maybe there is a better solution

In addition, BAP Lisp encourages overloading for the composable extension of program semantics. Each function or signal may have multiple definitions. And depending on the application context, we will choose the most specific definition (in case of methods we will choose a set of most specific methods). We represent contexts as partially ordered type classes (here context should be understood in terms of Haskell type classes). For example,

(defun symbolic-malloc (size)
  (declare (external "malloc") 
           (context symbolic))
  (if (symbolic-is-concrete size)
      (bap:malloc size)
    (symbolic-assume (>= 0 size)
                     (< *malloc-max-chunk* size))
    (symbolic-allocate-memory size)))

This is just an example to give you the flavor of Primus Lisp. Most likely, you won't need this overloading as the current stub of malloc will most certainly work for you, if you will implement the corresponding Lisp primitives, e.g., for malloc you need the memory-allocate PTR LEN primitive, that marks the region PTR, PTR+LEN as allocated. In Primus this primitive is implemented with the Primus.Memory library, which is concrete, so you have to provide the same implementation that will provide symbolic memory.

And I envision a lot of code duplication between symbolic and non-symbolic implementations of Memory, Env, Linker, and Interpreter libraries (which are the main libraries of Primus). We can, of course, address this in the future and factor out those commonalities and move the to the BAP Lisp libraries. It shouldn't be very hard, as those components are already generic (they are all parameterized with the Primus.Machine.S monad), so the only difference would be the abstraction of the Value.

Summary

This is the option of the least risk but minimal benefits. You don't need to break anything in WP and can start right now. We can even extract BAP Lisp into a separate library that is independent of BAP release cycle (so that the BAP Lisp interface could be changed between releases and we won't accumulate a technical debt of bad decisions).


1)Joke :)