diffblue / cbmc

C Bounded Model Checker
https://diffblue.github.io/cbmc
Other
836 stars 262 forks source link

Performance issue with large arrays #7012

Open SaswatPadhi opened 2 years ago

SaswatPadhi commented 2 years ago

CBMC version: f9bd83ec6e2199436cc12516616025154a59f82a

CBMC seems to be struggling to prove unbounded correctness under large allocations. Here is a super simplified version of a proof that I and @akshayutture are working on:

struct elem { char c[ESIZE]; };

int main() {
  struct elem *a = malloc(ASIZE * sizeof(struct elem));

  for(unsigned i = 0; i < ASIZE; ++i)
  __CPROVER_assigns(i, __CPROVER_POINTER_OBJECT(a))
  __CPROVER_loop_invariant(1 == 1)
  {
    a[i].c[0] = 0;
  }
}

The loop isn’t doing anything interesting and the invariant is trivial. The only “expensive” operations here are the initial malloc and the subsequent havoc (for the assigns clause).

We verify this example using:

$ goto-cc -o test.1.o -DESIZE=100 -DASIZE=1000 test.c
$ goto-instrument --apply-loop-contracts test.1.o test.2.o
$ cbmc test.2.o

Here are the proof times we observe:

 Elem Size | Array Size | Proof time 
-----------+------------+------------
    100    |    1000    |    0.5 s
    500    |    1000    |     2 s
   1000    |     500    |     2 s
   1000    |    1000    |     5 s
    500    |   10000    |    24 s

@tautschnig figured out that this is because CBMC is not using array theory in this case and it bit blasts the large array which leads to the performance degradation.

He suggested the following workaround for initializing the a array:

  unsigned long asize;
  struct elem *a = malloc(asize * sizeof(struct elem));
  __CPROVER_assume(asize == ASIZE);

and indeed it fixes the performance issue that we notice above.

I am leaving this issue open so we fix this issue within CBMC at some point in the future.

tautschnig commented 2 years ago

A bunch of comments: 1) #2108 would fix that the array theory be used automatically in such a case, and there were attempts to merge this when this should really not have happened, and indeed cannot happen, until #1874 is addressed. 2) There is a manual way of enforcing the use of the array theory via --array-uf-always, but this currently fails an INVARIANT when applied to the above example. This should likely be addressed as part of #2108, though could perhaps also be done earlier.

SaswatPadhi commented 2 years ago

Here is another interesting test case from @akshayutture:

#include <stdlib.h>

main() {
    size_t numElements;
#ifndef CONSTANT_SIZE
    int ** list = (int**) malloc(numElements * sizeof(*(list)));
#else
    int ** list = (int**) malloc(MAX_SIZE * sizeof(*(list)));
#endif
    __CPROVER_assume(numElements == MAX_SIZE);

    size_t index = 0;
    for (size_t i = 0; i < MAX_SIZE; i++) {
        if (nondet_bool()) {
            list[index] = NULL;
            index++;
        }
    }
}

CBMC is actually much slower when activating array theory in this case:

$ goto-cc -o test.o -DMAX_SIZE=32 test.c
$ time cbmc --pointer-check test.o | tail -n 2
** 0 of 8 failed (1 iterations)
VERIFICATION SUCCESSFUL

real    0m46.846s
user    0m42.394s
sys     0m3.118s

vs

$ goto-cc -o test_constant.o -DMAX_SIZE=32 -DCONSTANT_SIZE=1 test.c
$ time cbmc --pointer-check test_constant.o | tail -n 2
** 0 of 8 failed (1 iterations)
VERIFICATION SUCCESSFUL

real    0m0.891s
user    0m0.802s
sys     0m0.073s
tautschnig commented 1 year ago

This will eventually be fixed by #1874.