Whiley / Whiley2Boogie

A compiler backend for translating Whiley programs into Boogie programs for verification.
Apache License 2.0
1 stars 0 forks source link

Preserving loop invariant HEAP locations #60

Open DavePearce opened 3 years ago

DavePearce commented 3 years ago

There is a problem related to preservation of HEAP locations through loops. The following illustrates:

type Ref is (&int r)

method looper(Ref[] rs, Ref r) -> Ref:
    int i = 0
    while i < |rs| where i >= 0:
        rs[i] = new 1
        i = i + 1
    //
    return r

The problem is that this does not preserve the value of r through the loop, even though this is actually unchanged. This arises because HEAP is actually assigned within the loop. To preserve this, we need to generate type invariants for all local variables which contain heap references of any kind. Even that might not be enough if we try to reason about locations which are no longer accessible?

utting commented 3 years ago

Dynamic frames would help to solve this. The 'frame' of this method could be written as Delta(empty), since it only allocates new cells, and never changes existing heap cells. Then we would know that 'r' is unchanged.

But the question remains how to associate a dynamic frame spec (which relates before/after states) with a loop invariant. One simple approach would be for the 'after' state to be the one wherever the frame spec is written (e.g. in the loop invariant) and the 'before' state to always be the initial state of the whole method.

Cheers Mark

On Tue, 19 Jan 2021 at 08:44, David Pearce notifications@github.com wrote:

There is a problem related to preservation of HEAP locations through loops. The following illustrates:

type Ref is (&int r)

method looper(Ref[] rs, Ref r) -> Ref: int i = 0 while i < |rs| where i >= 0: rs[i] = new 1 i = i + 1 // return r

The problem is that this does not preserve the value of r through the loop, even though this is actually unchanged. This arises because HEAP is actually assigned within the loop. To preserve this, we need to generate type invariants for all local variables which contain heap references of any kind.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/Whiley/Whiley2Boogie/issues/60, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACLQHDV4GBO3FVIVQ45622LS2S2V3ANCNFSM4WH5M3KQ .

DavePearce commented 3 years ago

But the question remains how to associate a dynamic frame spec (which relates before/after states) with a loop invariant.

Yeah, and this is really the key question. I think my proposed solution will work actually, as I think reasoning about locations which are not accessible would be impossible. Could be wrong about that though. Eitherway, this is definitely a case I hadn't considered before!

DavePearce commented 3 years ago

A better solution is actually to reuse the default framing developed for methods. Specifically, we can invalidate any location reachable from a modified variable in a loop and keep everything else the same.

DavePearce commented 3 years ago

Right, so I'm now starting to understand this is not as easy as I thought. We cannot apply default framing for loops just for modified variables. For example, the following illustrates:

&int x = new 1
while ...:
    &int y = x
    *y = 1

In this case, y is not a modified variable even though it does modify the contents of state outside the loop. A few observations:

1) A naive approach is to apply default framing for all local variables. This is not ideal. 2) We could restrict (1) to just those local variables which are used within the loop. I believe this is safe and offers some improvement over (1). 3) We could also flatten the loop body and present the full flow information to the verifier. Whilst this would work fine for the Whiley theorem prover, it is problematic for Boogie since the whole point of what we're doing here is to avoid verification condition generation.

To be honest, it's a bit awkward. I suppose that (2) combined with the improved framing from the proposed RFC would offer a good solution, but it would be sub-optimal compared with using the Whiley Theorem Prover.

UPDATE: also I should be using the old() syntax that Boogie supports.

DavePearce commented 3 years ago

Looks like Old() syntax has no useful meaning in a loop invariant? That's annoying!

DavePearce commented 3 years ago

Right, have implemented the best version of default framing for loops that I can. Needs more work, for sure. For example, the following will not verify:

method main():
    int i = 0
    &int q = new 1
    &int p = new 2
    //
    while i < 10:
        int x = *p
        *q = 1
        i = i + 1
    //
    assert *p == 2

The reason for this is that p is included in the loop frame.

DavePearce commented 3 years ago

So, now I'm a little concerned as to what framing should mean for loops in Whiley. For example, consider this:

method main():
    int i = 0
    &int p = new 2
    //
    while i < 10 where 'p <= *p :
        int x = *p
        i = i + 1
    //
    assert *p == 2

Does 'p refer to the value stored at p on entry to the loop? What happens then if we have nested loops:

while ... where 'p <= *p :
    while ... where 'p <= *p :

What now? Is there any way I can refer to the value stored at p on entry to the first loop from within the loop invariant of the second loop? I can achieve this with a ghost variable at least.

DavePearce commented 3 years ago

I need to update this RFC: https://github.com/Whiley/RFCs/blob/master/text/0067-old-values.md

I think we need to separate out two things:

  1. The value of a variable in the pre-state
  2. The value of the heap in the pre-state

Currently, the proposal allows an arbitrary "old expression" form '*e' with an eye to eventually allowing 'e'. I think there are some ambiguities here though. What is the meaning of these forms?

This is not making sense. We want two expression forms:

Now, we can disambiguate the last case above as either ##v or #*v.

What about method calls? Do we need a syntax like m#() or similar?

DavePearce commented 3 years ago

Currently, this doesn't yet verify for reasons unknown:

type Actor is &{int data}

method createActors(int n) -> (Actor[] r)
requires n >= 0:
    Actor[] row = [new {data:0}; n]
    int j = 1
    while j < n where j >= 0 && |row| == n:
        row[j] = new {data: j}
        j = j + 1
    return row

In contrast, this variation also fails (but only for the postcondition):

type Actor is &{int data}

method createActors(int n, Actor p) -> (Actor[] r)
requires n >= 0:
    Actor[] row = [new {data:0}; n]
    int j = 1
    while j < n where j >= 0 && |row| == n:
        row[j] = p
        j = j + 1
    return row
DavePearce commented 3 years ago

Right, this variation still fails (in loop invariant):

type Actor is &int

method createActors(int n)
requires n >= 0:
    Actor[] row = [new 0; n]
    int j = 1
    while j < n where j >= 0 && |row| == n:
        row[j] = new 1
        j = j + 1
    //

However, this one passes:

type Actor is &int

method createActors(int n)
requires n >= 0:
    Actor[] row = [new 0; n]
    Actor p = new 1
    int j = 1
    while j < n where j >= 0 && |row| == n:
        row[j] = p
        j = j + 1
    //
DavePearce commented 3 years ago

This would appear to be a minimal example:

method createActors(int n)
requires n >= 0:
    &int p = new 0
    int j = 1
    while j < n where j >= 0:
        p = new 1

The problem is that Boogie believes p can refer to anything after the loop, including an arbitrary location which existed before the method. In essence, the invariant we want says "p is disjoint with anything which existed before the method".

utting commented 3 years ago

(Above example needs j = j + 1, so that the loop terminates.)

DavePearce wrote:

Looks like Old() syntax has no useful meaning in a loop invariant?

I think it would be useful to have Old always refer to the method entry state. Then a loop invariant can easily be accumulating more and more changes between its current state and the initial method entry state.

DavePearce commented 3 years ago

I think it would be useful to have Old always refer to the method entry state.

After playing around, I think this is what Boogie actually does. But, I personally don't think this makes any sense. At least, I can't think of good use cases where this semantic makes more sense than having old() refer to the heap on entry to the loop.

The problem arises from Hoare logic. That is, variables modified in a loop are havoced at the beginning of the loop. In this case, it is the variable HEAP being modified. We need the old() syntax to give us an anchor point to say what in HEAP changes and what doesn't change across the loop. So, requiring old() to refer to the entry of the method would mean it cannot help with programs containing multiple loops. Let me think of a contrived example:

type Counter is &int

method apply(Counter[] counters, nat m, nat n)
requires |counters| > 0 && m < |counters| && n < |counters|:
   // Reset the counters
   for i in 0 .. m:
       Counter ith = counters[i]
       *ith = *ith + 1
   // Inccrement the counters
   for i in 0 .. n:
       Counter ith = counters[i]
       *ith = *ith - 1
   // assert something about them
   assert (n > 0 && m > 0) ==> *(counters[0]) == (m+n)

This example is a bit weird, and we could actually merge the loops. But, the key is what loop invariants we need. For the second loop, we need to say every counter upto n is one less than the value it had on entry to the loop. They key is that we're not interested in talking about the relationship to its value on entry, but to its value at the beginning of the loop.

Of course, the two approaches to old() we're discussing are equivalent in some sense. That is, you can always simulate both using ghost variables. Its more a question of which is more convenient in practice.

Anyway, maybe that helps explain where I'm coming from.

DavePearce commented 3 years ago

Here's another example (perhaps less contrived this time):

method create(int n, int m) -> Counter[]:
   // initialise all counters to m
   Counter[] counters = [new m; n]
   // increment some counters by one
   for i in 0..|counters|:
       Counter ith = counters[i]
       *ith = *ith + 1

This is an interesting one because we cannot use old() when expressing the loop invariant if it refers to the heap on entry to the method. That's because the locations of interest did not exist before the method. However, if old() refers to the heap on entry to the loop, then it's fine.

utting commented 3 years ago

I see that Dafny introduces a new local variable prevHeap just before each while / foreach loop, and uses this to express the loop invariant (see last sentence on page 28 of KRML 190: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/12/krml190.pdf).

It is also interesting that it uses this prevHeap variable to help frame the loop invariant - with a 'free invariant' saying that the current heap equals prevHeap apart from the modifies variables.

DavePearce commented 3 years ago

@Utting That's a good find, and roughly corresponds with what I expect. Phew this is all so icky!! Basically, Boogie was designed for Dafny and really falls short of being a general purpose IVL.

DavePearce commented 3 years ago

This issue is on going. This is my latest puzzle:

type Node is &{ List next, int data }
type List is null | Node

method next(Node n) -> (List l):
    return n->next

method visit(List s):
    //
    while (s is Node):
        s = next(s)

The problem here has got something to do with the different heaps involved. The specification for next() is in terms of the HEAP before and after the invocation, whilst the inferred invariant for the while loop is in terms of the HEAP from before the loop. The loop looks roughly like this:

HEAP$old = HEAP
while(s is Node) 
invariant ...
invariant forall(r:Ref :: List#frame(s,r,HEAP$old) || HEAP$old[r] == Void || HEAP$old[r] == HEAP[r]) {
   ...
}

And the relevant specification for next() is:

procedure next(HEAP: [Ref]Any, n : Node) returns (HEAP$new : [Ref]Any) 
requires ...
ensures ...
ensures forall(r:Ref :: Node#frame(s,r,HEAP) || HEAP[r] == Void || HEAP[r] == HEAP$new[r]) {
   ...
}

So, we have a problem because Node#frame(r,HEAP) does not immediately imply List#frame(r,HEAP$old).

DavePearce commented 3 years ago

FINALLY ... a breakthru !!

Minimal version of failing example is:

type Node is &{ List next, int data }
type List is null | Node

method next(List n):
    skip

method visit(List s):
    //
    next(s)
    next(s)    

The above is fine when there is only one call to next(). I think we can represent the problem like so:

(A*)       (B)       (C)

+-+       +-+       +-+
|+------->| |       | |
+-+       +-+       +-+

    ||           ||
    ||  (next)   ||
    \/           \/

+-+       +-+       +-+
|+|       | |       | |
+|+       +-+       +-+
 |                   ^
 +-------------------+

Let A be the root of the cone, B some data within the cone and C some arbitrary data outside the code (i.e. which shouldn't be modifiable by next()). The key issue is that, after one call to next() has magically put C into the cone of A.

Conclusion: need to say something about cone of A after the invocation (and possibly B ?).

The cone of every reference from the original cone remains disjoint with everything else.

for every reference p in cone(A) and q not in cone(A), follows that q not in cone(p)

DavePearce commented 3 years ago

Returning now to the earlier puzzle:

method visit(List s):
    //
    while (s is Node):
        s = next(s)

The current issue with the translator is that s is used for the loop invariant to bound what references can be changed. But, it is assigned itself within the loop and, thus, this devalues the loop invariant. To resolve this, we need to copy the variable outside the loop into a temporary which is loop invariant.

utting commented 3 years ago

I like your new generic heap framing functions: type Any; function Any#within(HEAP : [Ref]Any, p : Ref, q : Any) returns (bool) { (Ref#is(q) && Ref#within(HEAP,p,Ref#unbox(q))) || (Array#is(q) && Array#within(HEAP,p,Array#unbox(q))) || (Record#is(q) && Record#within(HEAP,p,Record#unbox(q))) }

function Array#within(HEAP : [Ref]Any, p : Ref, q : [int]Any) returns (bool) { (exists i:int :: Any#within(HEAP,p,q[i])) }

function Record#within(HEAP : [Ref]Any, p : Ref, q : [Field]Any) returns (bool) { (exists f:Field :: Any#within(HEAP,p,q[f])) }

function Ref#within(HEAP : [Ref]Any, p : Ref, q : Ref) returns (bool) { (p == q) || Any#within(HEAP,p,HEAP[q]) }

  1. It might be useful to add an axiom to say that Void can never be within a heap? e.g. axiom (forall HEAP : [Ref]Any, p : Ref :: ! Any#within(HEAP, p, Void); This might prevent Boogie trying to prove/disprove 'within' facts about array lookups where i is not in the bounds of the array (and record lookups for non-existing fields).

  2. The purpose of the last function is less clear to me. It is true iff p is an alias for q or anything in the cone of q? But it is not symmetric, in that it doesn't check if q is in the cone of p (except for when p==q). Is that the intention?

utting commented 3 years ago

Could we use this to add a 'changes_only' predicate that can be used in postconditions? (and maybe loops, somehow?) That is, changes_only(x) would expand into something like:

  changes_only(x, OLDHEAP, NEWHEAP) ==> (forall r:Ref :: !Any#within(OLDHEAP, r, x) ==> OLDHEAP[r] = NEWHEAP[r])

In practice, we'd want a family of changes_only(x,y,z,...) for a limited number of variables.

DavePearce commented 3 years ago

This might prevent Boogie trying to prove/disprove 'within' facts about array lookups where i is not in the bounds of the array (and record lookups for non-existing fields).

Potentially, thinking about Void might help clear up some problems I'm having with Boogie and proving facts about within(). Definitely worth exploring!

DavePearce commented 3 years ago

In practice, we'd want a family of changes_only(x,y,z,...)

Yeah, I guess that would be the issue. Currently they are just inlined. But, I suppose it would make things easier to read and debug! Your definition needs to account for allocation as well. So, I think it would be:

changes_only(x, OLDHEAP, NEWHEAP) ==> (forall r:Ref :: !Any#within(OLDHEAP, r, x) ==> (OLDHEAP[r] == Void) || (OLDHEAP[r] = NEWHEAP[r]))