Thoughts on Theorem Prover Design

DavePearce commented 7 years ago

Improving the overall design of the theorem prover is a constant goal. The current implementation has proved itself to be reasonably good. Certainly, it is much faster, more reliable and more complete than the previous (WyRL based) implementation. And yet, at the same time, it suffers a number of problems:

Messy Overall Design. The overall layout of the code still seems pretty messy and incoherent at times. This makes it hard to fix problems, and also hard to know when things are "done right".
Many Minor Problems. There are many minor bugs which have been encountered. These are particularly related to quantifier instantiation and congruence closure.
Recursive Predicates / Invariants. There is no real story for handling these.

Overall, the key challenge seems to be finding a design which is simple, coherent and which obviously works (at least with enough effort). Of course, this is something of a holy grail.

Interpreter Model

Writing the interpreter for the counterexample generation was quite interesting. Whilst interpreters are reasonably complex creatures, they have some interesting properties. In particular, with enough effort, they can be made to work extremely reliably. The current theorem prover design lacks coherence around the various rewrite rules provided. Specifically, it makes these questions difficult:

What is the complete set of rules?
When is a rule really finished?
What's the different between axiom instantiations, case analyses, simplifications and other kinds of rule?
What set of operations on data values should be supported?

The interpreter model helps here. For example, axioms are encoded within the interpreter itself as check methods. Furthermore, we have a defined set of operations on abstract values which can be performed. Finally, we can also test the interpreter using a "concrete semantics" to help root out bugs in the interpreter itself.

The rough design of the interpreter model is something like this:

(Values) Instances of AbstractSemantics.RValue are proxies for underlying "abstract values". An abstract value has a shape which may be partially filled out. Congruence of abstract values is exactly that: they are merged together which may or may not be successful.
(States) There is a global set of "states" which are a set of known truths at a given point. State are split on certain operations (e.g. on if-conditionals).
(Quantifiers) These are initially handled using heuristics (e.g. looking for triggers). But, ultimately, they can be handled by raw enumeration if necessary (which is essentially what counterexample generation already does).

This obviously needs a lot more thought!!

DavePearce commented 7 years ago

As an example, let's consider abstractly interpreting this:

assert(int x):
    if:
        x > 0
    then:
        x >= 0

For proof-by-contradiction, we end up with this:

assert(int x):    
    x > 0 && x <= 0

The initial internal state s is {x->v0; v0->?} on which we perform these operations:

s.assert(x.greaterThan(Value.ZERO))
s.assert(x.lessThanOrEqual(Value.ZERO))

At this point, the state becomes unreachable and looks like {x->v0; v0->?; 1 <= v0 <= 0}.

DavePearce commented 7 years ago

As another example, consider abstractly interpreting this:

assert(int x, int y):
    if:
        x == y
        y > 0
    then:
        x >= 0

For proof-by-contradiction, we end up with this:

assert(int x, int y):    
    x == y && y > 0 && x <= 0

The initial internal state s is {x->v0, y->v1; v0->?, v1->?} on which we perform these operations:

s.assert(x.equal(y))
s.assert(y.greaterThan(Value.ZERO))
s.assert(x.lessThanOrEqual(Value.ZERO))

The state now looks like {x->v0, y->v0; v0->?; 1 <= v0 <= 0}. Observe here that abstract values v0 and v1 were merged, resulting in only one remaining.

DavePearce commented 7 years ago

As another interesting example, consider this (after transform for proof-by-contradiction):

assert(int[] xs, int x, int n):
    n == 2 && n == |xs| && xs = [x,2,3]

The initial state would be:

{xs->[v0; 0:=v1,1:=2,2:=3]; x->v1, n->v2; v0->3, v1->?, v2->2}

Here, v0 represents the abstract length of xs. After evaluating the above we would have:

{xs->[v0; 0:=x,1:=2,2:=3]; x->v1, n->v0; v0->_|_, v1->?}

Observe that v0 maps to _|_ signalling that it's value is in contradiction. Essentially, after n == 2 is evaluated we have v0 and v2 being merged, which results in _|_.

DavePearce commented 7 years ago

Next we consider how a disjunction (i.e. case split) is handled. Consider this:

assert(int x):
    x < 0 || x > 0

The initial state s is simply {x->v0; v0->?}

To evaluate the above, we do the following commands:

s1,s2 = s.split()
s1.assert(x.lessThan(Value.ZERO));
s2.assert(x.greaterThanOrEqual(Value.ZERO).not());
s2.assert(x.greaterThan(Value.ZERO));

It's not completely clear to me how the interpreter handles this.

The main issue in the above is to figure out how to "evaluate" both sides of a conditional. For example, this is the current implementation used for counterexample generation:

Result r = evaluateStatement(stmt.getIfBody(),environment);
if(r.holds()) {
    return evaluateStatement(stmt.getThenBody(),environment);
} else {
    return new Result(environment,true);
}

But, this is clearly not evaluating both sides.

DavePearce commented 7 years ago

An interesting question is how quantifier instantiation is handled (and esp. triggerless quantifier instantiation). For example, this:

assert(int[] a):
        forall(int k).((k >= 0 && k < |a|) ==> a[k] >= 0)
        |a| > 0 && a[0] < 0

What does the initial state even look like? Perhaps this:

{a->v0;
  int[] v0->[v1;0->v2], int v1->?, int v2->?;
  v1 > 0 && v2 < 0
  forall(int k).((k >= 0 && k < v1) ==> v0[k] >= 0)
}

The quantifier can then comfortably range over the ground terms with matching type (i.e. 0, v1 and v2) which gives the contradiction for 0.

OBSERVATION: in the above, we have inequality graphs and "generators" (i.e. quantifiers). Generators are much like their counterparts in counterexample generation. Except that we probably start with the obvious likely candidates first rather than jumping straight into brute force enumeration.

OBSERVATION: Case splits are a way of reducing the space of a generator. For example, in forall(int k).([1,2][k] >= 0) we immediately narrow the range of k to 1..2. The interesting question is how this narrowing operation is actually implemented.

Whiley / WhileyTheoremProver

Thoughts on Theorem Prover Design #139

Interpreter Model