boogie-org / boogie

Boogie
http://research.microsoft.com/en-us/projects/boogie/
MIT License
506 stars 110 forks source link

Irreducible control flow graphs #921

Open petemud opened 2 months ago

petemud commented 2 months ago
procedure unsound() {
    var x: int;
    assume(x == 0);
    if (true) {
        goto Inside;
    }
    while (x < 10000) {
        Inside: x := x + 1;
    }
    assert(x == -1);
}
$ boogie unsound.bpl /extractLoops
Boogie program verifier finished with 1 verified, 0 errors

History of commits

  1. d8b239c0c3bd5d33f683eafb735aef41dac28760 @shazqadeer added code to handle irreducible graphs
  2. 28d46f807724f88d77b113200cfb769cb76fa28b @shazqadeer reverted it on the same day
  3. 64b57a073afe03ab57d99c96ef6994b77f786d0e @akashlal Support for irreducible graphs (with extractLoops)
  4. b6796ce965fd1a973e7df19a3717be9187314053 @shazqadeer finally purged his initial code away from repo

So now the only surviving support for irreducible flow graphs I found is undocumented /extractLoops option

Problem with /extractLoops

Let's say, we have the canonical nonreducible flow graph

graph LR
    A --> B & C
    B --> C
    C --> B
procedure nonreducible() {
    A: goto B, C;
    B: goto C;
    C: goto B;
}

According to the Dragon Book (9.7.6 Handling Nonreducible Flow Graphs) this flow graph should be transformed into one of

graph LR
    A --> B' & C
    B & B' --> C
    C --> B
graph LR
    A --> B & C'
    B --> C
    C & C' --> B
procedure reducible1() {
    A: goto B', C;
    B': goto C;
    B: goto C;
    C: goto B;
}

procedure reducible2() {
    A: goto B, C';
    C': goto B;
    B: goto C;
    C: goto B;
}

Meanwhile /extractLoops just unrolls loop and produces control flow which is not equivalent to original

graph LR
    A --> B & C
    B --> C
    C --> B'1
    B'1 --> C'1
    C'1 --> B'2
    B'2 --> ...
procedure reducible() {
    A: goto B, C;
    B: goto C;
    C: goto B'1;
    B'1: goto C'1;
    C'1: goto B'2;
    B'2: goto C'2;
    ...
}

Real-world example

https://github.com/open-s4c/libvsync/blob/main/include/vsync/atomic/internal/arm64.h#L551-L564

graph TB
    A[ldx oldv, a
cmp oldv, cmpv]
    B[b.eq oldv]
    W[wfe]
    C[ldx oldv, a
cmp oldv, cmpv]
    D[b.neq oldv]
    E[add new, oldv, v
stx tmp, new, a]
    F[cbnz tmp]
    A --> B
    B --> W & E
    W --> C
    C --> D
    D --> W & E
    E --> F
    F --> C
procedure await_eq_add() {
    A: goto B;
    B: goto W, E;
    W: goto C;
    C: goto D;
    D: goto W, E;
    E: goto F;
    F: goto C, G;
    G: return;
}

Conclusions

Are there any unsolvable problems with properly handling irreducible flow graphs, that it was removed by @shazqadeer 28d46f807724f88d77b113200cfb769cb76fa28b? Can it be brought back or are there other ways of dealing with this issue in Boogie?

Quotes

We use the standard techniques for converting an irreducible graph into an equivalent, although possibly far larger, reducible graph. We are looking into ways to deal with irreducible graphs that avoid this problem, but so far it has not been an issue. Barnett, M., Leino, K.R.M. (2005). Weakest-precondition of unstructured programs.

if (!g.Reducible)
{
    throw new VCGenException("Irreducible flow graphs are unsupported.");
}

Barnett, M. (2009). Initial set of files.

shazqadeer commented 1 month ago

Thanks for bringing this issue to our attention. I do not remember now why support for them was dropped. Irreducible graphs have not surfaced before in Boogie applications so we never considered adding support for them. Do you have any concrete application where you need them?

petemud commented 1 month ago

Do you have any concrete application?

Yes. See "Real-world example" from above. I'm using boogie to prove properties of libvsync atomics, which are implemented in ARM assembly

shazqadeer commented 1 month ago

ok, I will look into it. But it will take me some time. Do you think you can make progress by transforming to irreducible CFGs on your side? I assume you have a toolchain for generating Boogie and you can do the transformation in that toolchain.

petemud commented 1 month ago

I don't actually have any toolchain. The project is small. Part of it is hand-written Boogie model for ARM and part is a parser from assembly. I use Boogie precisely because I need to deal with control flow. I was first doing everything in Coq and then SMT-LIB directly, but there dealing with control flow is too cumbersome. Your "Weakest-precondition of unstructured programs" paper looked promising, so I switched to Boogie.


Irreducibility shouldn't be a problem really. The transformations Boogie makes are:

  1. Split assert at the beginning of basic block into assert at the end of all direct predecessors and assume instead of assert in current block
  2. Cut back edges and havoc variables that change on path from loop header to back edge

Both of these don't require irreducibility. Irreducibility just ensures that there is only one loop header, so you can find where to apply 1 and 2. In irreducible graph - with more than one loop entry - you can just use the block that comes first in code as loop header.
P.S. I'm currently writing a Coq proof that doing this is actually ok

gauravpartha commented 1 month ago

Irreducibility shouldn't be a problem really. The transformations Boogie makes are:

1. Split `assert` at the beginning of basic block into `assert` at the end of all direct predecessors and `assume` instead of `assert` in current block

2. Cut back edges and `havoc` variables that change on path from loop header to back edge

Both of these don't require irreducibility. Irreducibility just ensures that there is only one loop header, so you can find where to apply 1 and 2. In irreducible graph - with more than one loop entry - you can just use the block that comes first in code as loop header.

Reducibility is a requirement for the back-edge elimination to be sound (that is, each loop must have at most one entry point). This example shows why irreducibility leads to unsoundness in general:

procedure irreducible() {
    var i: int;
    var j: int;

    i := 1;
    goto B1, B4;

    B1:
        assume j > 0;
        goto B2;

    B2:
        assert i > 0;
        goto B3;

    B3:
        assert j > 0;
        goto B4;

    B4:
        i := i+1;
        goto B2;
}

Here, B2 and B4 are two entry points of the same loop and thus the resulting CFG is irreducible.

This program is not correct and thus should not be verified by Boogie, because the execution path B4 -> B2 -> B3 leads to failure if j < 0 holds at the beginning.

Suppose we choose to eliminate the back edge from B4 to B2 (instead of B3 to B4). Then, the resulting program is correct (and would be verified by Boogie), which shows the unsoundness.

One correct way of verifying this program would be to first turn the CFG into a reducible CFG with the same executions and to then do the back-edge elimination.

P.S. I'm currently writing a Coq proof that doing this is actually ok

Interesting. We made a subset of Boogie proof-producing using Isabelle a while ago (see https://link.springer.com/chapter/10.1007/978-3-030-81688-9_33), which included formally justifying the back-edge elimination (in our approach, a proof is produced on every run instead of proving the approach once and for all, which simplies things). It would be interesting to know how your proof compares to ours.

shazqadeer commented 1 month ago

I studied this issue more thoroughly. It appears to me that there are two separate issues going on:

petemud commented 1 month ago

@gauravpartha You are absolutely correct. I was wrong about cutting back edge in irreducible graph, because it also cuts a forward edge in corresponding reducible graph.

graph LR
    A["`*A:*`"]
    B["`*B:* x := 1`"]
    C["`*C:*`"]
    D["`*D:* **assert** x == 1`"]
    E["`*E:*`"]
    F["`*F:*`"]
    A --> B & E
    B --> C
    C --> D
    D --> E
    %%E -. blocks between E to C .-> C
    E --> F
    F -.-> C
graph LR
    A["`*A:*`"]
    B["`*B:* x := 1`"]
    C["`*C:*`"]
    D["`*D:* **assert** x == 1`"]
    E["`*E:*`"]; E'["`*E':*`"]
    F["`*F:*`"]; F'["`*F':*`"]
    A --> B & E'
    B --> C
    C --> D
    D --> E
    %%E -.blocks between E to C.-> C
    %%E' -.blocks between E to C.-> C
    E --> F
    F -.-> C
    E' --> F'
    F' -.-> C

I still think - not sure if it's advantageous though - that this issue can be worked around without copying any blocks by:

  1. Not havocing anything
  2. Only relying on pushing asserts from loop head to its predecessors, i.e. only using invariants. For reducible loops automatically asserting variables that don't change should be as easy as havocing those that change
  3. Cutting ALL of the edges pointing to loop head
    graph LR
    A["`*A:*`"]
    B["`*B:* x := 1`"]
    C["`*C:*`"]
    D["`*D:* **assert** x == 1`"]
    E["`*E:*`"]
    F["`*F:*`"]
    A --> B & E
    B -.-> C
    C --> D
    D --> E
    %%E -.blocks between E to C.-> C
    %%E' -.blocks between E to C.-> C
    E --> F
    F -.-> C
  4. Perhaps making some experimental tool that does all of the above

An example of verification condition in the style of WP of unstructured programs. Which fails. As it should

(declare-const x Int)

(declare-const A_ok Bool)
(declare-const B_ok Bool)
(declare-const C_ok Bool)
(declare-const D_ok Bool)
(declare-const E_ok Bool)
(declare-const F_ok Bool)

(define-fun A_be () Bool
  (= A_ok
    (and B_ok E_ok)
  )
)
(define-fun B_be () Bool
  (= B_ok
    (=> (= x 1) true)
  )
)
(define-fun C_be () Bool
  (= C_ok D_ok)
)
(define-fun D_be () Bool
  (= D_ok
    (and (= x 1) E_ok)
  )
)
(define-fun E_be () Bool
  (= E_ok F_ok)
)
(define-fun F_be () Bool
  (= F_ok true)
)

(assert
  (not
    (=>
      (and
        A_be
        B_be
        C_be
        D_be
        E_be
        F_be
      )
      (and
        A_ok
        C_ok
      )
    )
  )
)

(check-sat)
(get-model)

shazqadeer commented 1 month ago

Even if an automatic method for converting an irreducible CFG to a reducible CFG could be implemented, I don't understand how it would address the problem of specifying the appropriate loop invariant needed for verifying the loop. For well-structured (or reducible) loops, it is clear to the programmer where to put the loop invariant and this information is exploited by Boogie to generate the appropriate verification conditions.

Any suggestions?

petemud commented 1 month ago

Invariant always comes in loop header. Here are a couple of suggestions for which one of two loop headers to use:

  1. Use as loop header the block that comes first in program code
  2. Use block that has asserts; fail if both loop entries have them
  3. Reuse invariant keyword. This keyword would only be allowed in a loop header and have the same meaning as assert. Again, if both loop entries have invariant, then fail