diffblue / cbmc

C Bounded Model Checker
https://diffblue.github.io/cbmc
Other
848 stars 264 forks source link

Performance of framing #8503

Open hanno-becker opened 1 week ago

hanno-becker commented 1 week ago

I'm observing that wrapping a function operating on a buffer by another function which merely 'frames' it to operate on a slice of the buffer, leads to very slow proof times.

Here is a simple example, where the sub-function bumps a bound on the sub buffer. This is a minimal version of an example arising in practice when verifying the Number Theoretic Transform (https://github.com/pq-code-package/mlkem-native/pull/371).

// file: frame_harness.c

// instructions
//
// goto-cc frame_slowness.c --function harness -o a.out
// goto-instrument --dfcc harness --replace-call-with-contract bump_A_to_B --enforce-contract bump_slice_A_to_B a.out b.out
// cbmc b.out --bitwuzla

#include <stdint.h>

#define A 12
#define B 42
#define SZ 256

void bump_A_to_B(int8_t *x, int sz)
    __CPROVER_requires(__CPROVER_is_fresh(x, 2 * sz))
    __CPROVER_requires(__CPROVER_forall {
            int idx;
            0 <= idx && idx <= 2 * sz - 1 ==> x[idx] <= A
    })
    __CPROVER_assigns(__CPROVER_object_upto(x, 2 * sz))
    __CPROVER_ensures(__CPROVER_forall {
            int idx;
            0 <= idx && idx <= 2 * sz - 1 ==> x[idx] <= B
    });

void bump_slice_A_to_B(int8_t *x, int len, int start, int sz)
    __CPROVER_requires(2 <= len && len <= SZ && (len & 1) == 0)
    __CPROVER_requires(0 <= start && start < len)
    __CPROVER_requires(1 <= sz && sz <= len / 2 && start + 2 * sz <= len)
    __CPROVER_requires(__CPROVER_is_fresh(x, len))
    __CPROVER_requires(__CPROVER_forall {
            int i0;
            0 <= i0 && i0 <= start - 1 ==> x[i0] <= B
    })
    __CPROVER_requires(__CPROVER_forall {
            int i1;
            start <= i1 && i1 <= len - 1 ==> x[i1] <= A
    })
    __CPROVER_assigns(__CPROVER_object_upto(x, len))
    __CPROVER_ensures(__CPROVER_forall {
            int i0;
            0 <= i0 && i0 <= start + 2 * sz - 1 ==> x[i0] <= B
    })
    __CPROVER_ensures(__CPROVER_forall {
            int i1;
            start + 2 * sz <= i1 && i1 <= len - 1 ==> x[i1] <= A
    })
{
    bump_A_to_B(x + start, sz);
}

void harness(void) {
    int8_t *x;
    int len, start, sz;
    bump_slice_A_to_B(x, len, start, sz);
}

I was expecting the proof to be almost instantaneous, but for SZ==256, which is the value I am working with in practice, I have not yet seen it terminate.

The issue is the same between Z3 and Bitwuzla.

CBMC Version 6.3.1

hanno-becker commented 1 week ago

@remi-delmas-3000 @tautschnig Do you have ideas here?

remi-delmas-3000 commented 4 days ago

Hi, I don't think its a problem with framing per se (as in checking that side effects are included in the assigns clauses, checking assigns clause inclusion, etc.). What seems to be happening is that solvers struggle with quantified proof obligations.

I encoded the contract transformation manually (to get rid of frame condition checking) which gives the following straight-line code:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define A 12
#define B 42
#define SZ 256

void harness(void)
{
  int len, start, sz;
  __CPROVER_assume(2 <= len && len <= SZ && (len & 1) == 0);
  __CPROVER_assume(0 <= start && start < len);
  __CPROVER_assume(1 <= sz && sz <= len / 2 && start + 2 * sz <= len);
  int8_t *x = malloc(len * sizeof(int8_t));
  __CPROVER_assume(x != NULL);
  __CPROVER_assume(__CPROVER_forall {
    int i;
    0 <= i && i < start ==> x[i] <= B
  });

  __CPROVER_assume(__CPROVER_forall {
    int i;
    start <= i && i < len ==> x[i] <= A
  });

  /// inline call to bump_A_to_B(x + start, sz);
  int8_t *call_arg1 = x + start;
  int call_arg2 = sz;
  {
    int8_t *x = call_arg1;
    int sz = call_arg2;
    __CPROVER_assert(
      __CPROVER_forall {
        int i;
        0 <= i && i < 2 * sz ==> x[i] <= A
      },
      "precondition replaced call");

    __CPROVER_havoc_slice(x, 2 * sz);
    __CPROVER_assume(__CPROVER_forall {
      int i;
      0 <= i && i < 2 * sz ==> x[i] <= B
    });
  }
#ifdef POST1
  __CPROVER_assert(
    __CPROVER_forall {
      int i;
      0 <= i && i< start + 2 * sz ==> x[i] <= B
    },
    "post condition 1");
#endif

#ifdef POST2
  __CPROVER_assert(
    __CPROVER_forall {
      int i;
      start + 2 * sz <= i && i<len ==> x[i] <= A
    },
    "post condition 2");
#endif

The I analyzed POST1 and POST2 separately and together, got these results:

|                             | z3         | bw         |
| ------------ | ------------ | ---------- | ---------- |
| no contracts | post 1 only  | 42m41.954s | 33m2.139s  |
|              | post 2 only  | 3m1.328s   | 0m14.932s  |
|              | post 1+2     | 25m34.864s | 17m30.813s |
| ------------ | ------------ | ---------- | ---------- |
| contracts    | post 1 only  | 19m14.907s | 19m40.051s |
|              | post 2 only  | 0m22.193s  | 2m30.510s  |
|              | post 1+2     | 24m3.121s  | 26m26.308s |

So using contracts vs the straigh-line code that does not do frame condition checking does not matter that much on the overall runtime.

What we see is that solvers struggle to prove the first post condition that says that x[i] <= B holds in the extended range [0, start + 2 *sz);

If we split all the pre and post conditions over ranges [0, start), [start, start + 2 * sz) and [start + 2 * sz, len) we should make the proof easier because on [0, start) and [start + 2 * sz, len) that are not havoced the postconditions stem directly form the preconditions. And indeed that's what we see.

|                             | z3         | bw         |
| ------------ | ------------ | ---------- | ---------- |
| no contracts | post 1 only  |  0m16.543s | 2m29.401s  |
|              | post 2 only  |  10m31.124s| 32m26.411s |
|              | post 3 only  |  0m15.207s | 2m11.520s  |
|              | all          |  15m24.019s| 34m1.366s  |
| ------------ | ------------ | ---------- | ---------- |
| contracts    | post 1 only  |  0m40.772s | 2m11.435s  |
|              | post 2 only  |  20m37.789s| 14m32.638s |
|              | post 3 only  |  0m26.735s | 2m27.044s  |
|              | all          |  26m12.638s| 19m34.666s |
| ------------ | ------------ |  ----------| ---------- |

We see that POST1 on [0, start) and POST3 on [start + 2 * sz, len) are solved quickly, and POST 2 on [start, start + 2 * sz) is hard to solve.

Encoding of arrays

CBMC encodes bounded arrays in SMT as a pair of unbounded array and a length.

In this program we have two versions of array x, x0 and x1 that represent the pre- and post- havoc_slice states.

(Slightly simplified from the actual encoding)

(declare-const x0 (Array (_ BitVec 64) (_ BitVec 8)))
(declare-const x0_len (_ BitVec 64))

(declare-const x1 (Array (_ BitVec 64) (_ BitVec 8)))
(declare-const x1_len (_ BitVec 64))

The havoc_slice operation uses an intermediate nondeterministic array and a forall constraint to define x1 as equal to x0 in the range [0, start) or [start + 2 * sz, x0_len)

(declare-const nondet_bytes (Array (_ BitVec 64) (_ BitVec 8)))
(declare-const nondet_bytes_len (_ BitVec 64))
(assert (= nondet_bytes_len (* 2 sz)))

;; havoc slice post condition
;; forall idx,
;;    0 <= idx && idx < x1_len =>
;;       x1[idx] =  (idx < start || idx >= start + 2*sz ) ? x0[idx] : nondet_bytes[idx - start]
(assert
   (forall ((idx (_ BitVec 64)))
       (=>
          (and
            (bvule (_ bv0 64) idx)
            (bvult idx x1_len))
          (=
              (select x1 idx)
              (ite
                  (or
                     (not (bvsge idx ((_ zero_extend 8) start)))
                     (bvuge idx
                            (bvadd
                               ((_ zero_extend 8) start)
                               ((_ sign_extend 32) (bvmul (_ bv2 32) sz)))))
                  (select x0 idx)
                  (select nondet_bytes (bvadd idx (bvneg ((_ zero_extend 8) start)))))))))

The post condition on the range [start, start+2*sz) is not expressed using a quantifier, instead, we introduce a free idx constant and an assertion. We see that the assert is guarded by a lengthy path condition (and true B149 B150 B151 B152 B153 B154 B155) that encodes the reachability of that location under along all possible paths in the program. (there are short-cutting operations and points of failure so there's many ways to reach the assert).

;; POST2 proof obligation: forall idx, idx1 < 0 || idx1 >= start + 2 * sz || !(x[idx + start] >= 43)
(declare-fun idx1 () (_ BitVec 32))
(define-fun B196 () Bool
   ;; path condition
   (=> (and true B149 B150 B151 B152 B153 B154 B155)
      (or
        (not (bvsge idx1 (_ bv0 32)))
        (bvsge idx1 (bvadd start (bvmul (_ bv2 32) sz)))
        (not
          (bvsge
            ((_ sign_extend 24)
              (select
                x1
                (bvadd
                  ((_ zero_extend 8) ((_ extract 55 0) start))
                  ((_ sign_extend 32) idx1))))
            (_ bv43 32))))))
(assert (not B196))
(check-sat)

So really both z3 and bitwuzla struggle to establish that (not B196) is UNSAT by grounding the quantified axioms that encode the havoc operation by connecting x0, nondet_bytes and x1 and the post condition on a slice within x1 that's shifted right from start bytes.

I've seen that the proof times are sensitive to the max array len so I'm wondering wether the solvers end up proving it for each individual cell.

Maybe the encoding of the post condition could also use a quantifier instead of a ground constant for idx1 ?

Any suggestions on how to improve the encoding ?

remi-delmas-3000 commented 4 days ago

Hi, I played a little bit more with the encoding and was able to bring the runtime down to 1 second for bitwuzla, 20 seconds for z3 with some tweaks in the encoding, while scaling to the possible allocation size.

I was able to come up with this. The main changes are:

#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <stdbool.h>

#define A 12
#define B 42
#define SZ (__CPROVER_max_malloc_size-1)

void harness() {
  size_t len, start, sz;
  __CPROVER_assume(2 <= len && len <= SZ);
  __CPROVER_assume((len & 1) == 0);
  __CPROVER_assume(0 <= start && start < len);
  __CPROVER_assume(1 <= sz && sz <= len / 2);
  __CPROVER_assume(start + 2 * sz <= len);
  int8_t *x = malloc(len * sizeof(int8_t));
  __CPROVER_assume(x != NULL);
  __CPROVER_assume(__CPROVER_forall {
    size_t i;
    0 <= i && i < start ==> x[i] <= B
  });

  __CPROVER_assume(__CPROVER_forall {
    size_t i;
    start <= i && i < len ==> x[i] <= A
  });

  // inlined call arguments
  int8_t *call_arg1 = x + start;
  size_t call_arg2 = sz;
  {
    int8_t *x = call_arg1;
    size_t sz = call_arg2;
    __CPROVER_assert(
      __CPROVER_forall {
        size_t i;
        0 <= i && i < 2 * sz ==> x[i] <= A
      },
      "precondition replaced call");
    int8_t nondet_bytes[2*sz];
     // assume the post condition directly on the havoced bytes
    __CPROVER_assume(__CPROVER_forall {
      size_t i;
      (0 <= i && i < (2 * sz)) ==> nondet_bytes[i] <= B
    });
    __CPROVER_array_replace(x, nondet_bytes);
  }

  // assert post condition
  bool post1 = __CPROVER_forall {
    size_t i;
    (0 <= i && i < (start + 2 * sz)) ==> x[i] <= B
  };
  assert(post1);

  bool post2 = __CPROVER_forall {
    size_t i;
    ((start + 2 * sz) <= i && i < len) ==> x[i] <= A
  };
  assert(post2);
}
** 0 of 36 failed (1 iterations)
VERIFICATION SUCCESSFUL
cbmc --function harness -DENC15 hanno_simpler.c --z3 -DMAX_SIZE  18.77s user 0.18s system 98% cpu 19.153 total

I think makes it easier for z3 to trigger quantifiers since guards match syntactically. Below I explain what I think are the problems with the original formulation.

Problem 1 : encoding of havoc_slice + assumptions

The contract replacement of bump_A_B is represented as

  // inlined call to bump_A_B
  int8_t *call_arg1 = x + start;
  size_t call_arg2 = sz;
  {
    int8_t *x = call_arg1;
    size_t sz = call_arg2;
    assert( __CPROVER_forall {
        size_t i;
        0 <= i && i < 2 * sz ==> x[i] <= A
    });
    __CPOVER_havoc_slice(x, 2 * sz);
    __CPROVER_assume(__CPROVER_forall {
      size_t i;
      (0 <= i && i < (2 * sz)) ==> x[i] <= B
    });
  }

__CPOVER_havoc_slice(x, 2*sz); desugars to:

uint8_t nondet_bytes[2*sz];
__CPROVER_array_replace(x, nondet_bytes);

As described above, at the SMT-level, all arrays x, nondet_bytes, and the updated x are represented using a pair formed of a functional SMT array and a size, and __CPROVER_array_replace(x,nondet_bytes) turns into a quantified axiom linking all three arrays (using Int instead of BitVec types here to simplify)

;; x before havoc_slice
(declare-const x!0 (Array Int Int))
(declare-const x_len!0 Int)

;; nondet byte array used by havoc_slice
(declare-const nondet_bytes (Array Int Int))
(declare-const nondet_bytes_len Int)

;; x after havoc_slice
(declare-const x!1 (Array Int Int))
(declare-const x_len!1 Int)

;; connect x!0, nondet_bytes, x!1 havoc_slice
(assert
   (forall ((i Int))
       (=> (and (<= 0 i) (< i x_len!1))
          (= (select x!1 i)
              (ite (and (<= start i) (< i (+ start (* 2 sz))))
                     (select nondet_bytes (- i start))
                     (select x!0 i))

The post condition of bump_A_B is assumed on a slice of the array x!1

;; assumption post call
(assert
   (forall ((i Int))
       (=> (and (<= 0 i) (< i (* 2 sz)))
          (<= (select x!1 (+ start  i)) B)

The final proof obligation is expressed on x!1 and is almost the same as the post condition of bump_A_B.

So we assumed something like:

;; bump_A_B post condition
 __CPROVER_assume(__CPROVER_forall {
      size_t i;
      (0 <= i && i < (2 * sz)) ==> (x+start)[i] <= B
    });
  }

and we are trying to prove this

;; top level post condition
 __CPROVER_assert(__CPROVER_forall {
      size_t i;
      (start <= i && i < start + (2 * sz)) ==> x[i] <= B
    });
  }

The start offset moved from the RHS(x+start)[i] LHS of the proof obligation (start <= i && i < start + (2 * sz)), I think that's what prevents triggering and unification in the solver.

Problem 2 : grounding foralls too early, losing guard structure

Moreover, instead of using the exact quantified formula we specified in the source, CBMC eliminates the quantifier by introducing a witness variable for the quantified index and grounding, while also losing the implicative structure.

(declare-const i Int)
(define-fun POST1 () Bool
  (=> some_path_condition
    (or (> start i) (>= i  (+ start (* 2 sz)) (<= (select x!1 i) B)))
(assert (not POST1))
(check-sat)

Given that quantifier instantiation is really dependent on syntactic triggers I suspect that the z3 or bitwuzla cannot trigger anything anymore and are stuck enumerating counter examples and learning clauses one by one indirectly rather that discharging a forall by grouding a single forall.

Problem 3: indexing into arrays with int instead of size_t

When using size_t indices the runtime does not seem to increase with the max array size and is counted in seconds (even sub-second with bitwuzla).

SZ      | bitwuzla  | z3
--------|-----------|----------
256     |  0m0.807s |0m2.010s
512     |  0m0.828s |0m0.778s
1024    |  0m0.879s |0m2.765s
2048    |  0m0.875s |0m4.281s
4096    |  0m0.902s |0m30.581s
8192    |  0m1.017s |0m6.971s
16384   |  0m0.947s |0m2.056s
32768   |  0m1.007s |0m2.108s
65536   |  0m1.094s |0m2.388s
131072  |  0m1.032s |0m7.707s
262144  |  0m1.172s |0m27.049s

Using int instead of size_t sends the runtime to minutes. Using int causes a lot more sign-extend, zero-extend, byte-extracts, ... in the formula which may make it harder to instantiate quantifiers properly.

Differences between solvers

The last observation I made is that z3 fares a little bit better with int is used as index rather than size_t. Some encoding patterns that benefited z3 were sometimes harming bitwuzla, so different solvers may require different encoding tricks.