diffblue / cbmc

C Bounded Model Checker
https://diffblue.github.io/cbmc
Other
848 stars 263 forks source link

Verbosity of output when proving 1 function #7747

Open rod-chapman opened 1 year ago

rod-chapman commented 1 year ago

Using CBMC 5.84 on macOS. I want to prove exactly 1 function is correct with respect to its contracts.

My function (See attached isqrt.c) is about 20 lines of code. For that, CBMC produces 1651 lines of output on stdout, almost all of which is useless noise and has no relation to the code I'm trying to verify. See attaches isqrt_results.txt for the full output. This is with the default verbosity setting.

What are all the proofs of the "builtin_library" functions for? They dominate the output but have nothing to do with my code as far as I can see.

Ideas.

  1. When ALL N POs for a function yields SUCCESS, then summarize the output of that function into 1 line ("ALL SUCCESS") rather than N lines of output.
  2. Group outputs so that the results for the function I have requested appear right at the end of the output, so are still visible on screen when then analysis terminates. Group together results for the declaration and definition of my function (i.e. so that results for my_code.h and my_code.c appear next to each other.)
  3. Supply a verbosity setting that completely suppresses results for all that "builtin_library" stuff.

See attached isqrt_concise.txt file for an example of what this might look like (I created this with an editor just to see what it would look like.) isqrt_concise.txt isqrt_results.txt

Code

#include <stdint.h>

#ifdef CBMC
#else
#define __CPROVER_assigns(...)
#define __CPROVER_decreases(...)
#define __CPROVER_assert(...)
#define __CPROVER_requires(...)
#define __CPROVER_ensures(...)
#define __CPROVER_loop_invariant(...)
#endif

// truncated square root of 2**31-1
#define INT32_ROOT_MAX 46340

int32_t isqrt(int32_t x)
__CPROVER_requires (x >= 0)
__CPROVER_ensures  (0 <= __CPROVER_return_value &&
                    __CPROVER_return_value <= INT32_ROOT_MAX &&
                    (__CPROVER_return_value * __CPROVER_return_value <= x) &&
                    (((int64_t) __CPROVER_return_value + 1 ) * ((int64_t) __CPROVER_return_value + 1) > (int64_t) x))
{
    int32_t upper, lower, middle;
    lower = 0;
    upper = INT32_ROOT_MAX + 1;
    while(lower + 1 != upper)
    __CPROVER_assigns(middle, lower, upper)
    __CPROVER_loop_invariant(0 <= lower && lower <= INT32_ROOT_MAX)
    __CPROVER_loop_invariant(0 <= upper && upper <= (INT32_ROOT_MAX + 1))
    __CPROVER_loop_invariant(lower * lower <= x)
    // cast to int64_t here to avoid overflow on *
    __CPROVER_loop_invariant((int64_t) upper * (int64_t) upper > (int64_t) x)
    __CPROVER_decreases(upper - lower)
    {
        middle = (lower + upper) / 2;
        if((middle * middle) > x)
            upper = middle;
        else
            lower = middle;
    }
    return lower;
}

void isqrt_harness()
{
    int32_t x, y;
    y = isqrt(x);
}

Makefile

TARGET=isqrt

all: $(TARGET)

CHECKS=--bounds-check --pointer-check \
       --memory-cleanup-check --div-by-zero-check --signed-overflow-check \
       --unsigned-overflow-check --pointer-overflow-check --conversion-check \
       --undefined-shift-check --enum-range-check --pointer-primitive-check

$(TARGET): $(TARGET)_contracts.goto
    cbmc $(CHECKS) --external-sat-solver cadical $<

$(TARGET)_contracts.goto: $(TARGET).goto
    goto-instrument $(CHECKS) --dfcc $(TARGET)_harness --apply-loop-contracts --enforce-contract $(TARGET) $< $@

$(TARGET).goto: $(TARGET).c
    goto-cc --function $(TARGET)_harness -DCBMC -o $@ $<

clean:
    rm -f *.goto
rod-chapman commented 1 year ago

In contrast, the SPARK Pro toolset, proving an equivalent function produces 11 lines of output:

rodchap@f4d4889dcf6d sqrt % gnatprove -Psqrt -u p.adb
Phase 1 of 2: generation of Global contracts ...
Phase 2 of 2: flow analysis and proof ...
p.adb:10:07: info: initialization of "Lower" proved
p.adb:11:07: info: initialization of "Upper" proved
p.adb:12:07: info: initialization of "Middle" proved
p.adb:20:33: info: loop invariant initialization proved
p.adb:20:33: info: loop invariant preservation proved
p.adb:22:31: info: loop variant proved
p.adb:27:36: info: division check proved
p.ads:11:19: info: postcondition proved
Summary logged in /Users/rodchap/Desktop/rod/projects/spark/sqrt/gnatprove/gnatprove.out
rod-chapman commented 1 year ago

Even more concisely, Dafny 4 verifies the same function, reporting:

rodchap@f4d4889dcf6d book % dafny verify sqrt2.dfy

Dafny program verifier finished with 1 verified, 0 errors
qinheping commented 1 year ago

What are all the proofs of the "builtin_library" functions for?

Those are library functions used for DFCC. Most of checks are disabled for the library function __CPROVER_contracts_write_set_check_assignment (https://github.com/diffblue/cbmc/blob/develop/src/ansi-c/library/cprover_contracts.c#L764). @remi-delmas-3000 Could you confirm that if there are more checks we can disable for the other contracts library functions?

TGWDB commented 1 year ago

Ideas.

  1. When ALL N POs for a function yields SUCCESS, then summarize the output of that function into 1 line ("ALL SUCCESS") rather than N lines of output.

This seems like a nice improvement.

  1. Group outputs so that the results for the function I have requested appear right at the end of the output, so are still visible on screen when then analysis terminates. Group together results for the declaration and definition of my function (i.e. so that results for my_code.h and my_code.c appear next to each other.)
  2. Supply a verbosity setting that completely suppresses results for all that "builtin_library" stuff.

I think qinheping is focusing on this. However, I'd also propose a potential 4th option:

  1. Supply a --quiet or similar mode that suppresses SUCCESS results, so that you see either: only FAILURE results, or (as proposed for 1 above) ALL SUCCESS.
rod-chapman commented 1 year ago

Or perhaps "--quiet" as you suggest becomes one of the well-defined settings for --verbosity? Or perhaps two settings - one as you suggest above, and one like that but also suppressing all output relating to the "builtin" functions if they all result in SUCCESS.

Let's implement Idea 1) above first, then we'll see where to go next once I have tried it out...

remi-delmas-3000 commented 1 year ago

@rod-chapman the built-in functions mentioned are support function used to instrument frame condition checks on the user program, they perform pointer and arithmetic operations which have to be checked to ensure global soundness. Having a --quiet option that silences outputs for successful built-in functions but still reports failures could work, but I'd rather let CBMC output in full and filter that output in a contract-aware fashion outside of CBMC itself (CBMC does not know about contracts).

rod-chapman commented 1 year ago

My simple function has 1 parameter of type int32_t and returns an int32_t - no pointers, no pointer arithmetic, no allocation etc. etc. Why are all these builtin functions being dragged in? I see no need for them.

remi-delmas-3000 commented 1 year ago

There's still frame conditions to check for the loop itself so the machinery is instantiated and the dependencies are pulled. The instrumentation pass prunes only the most trivial proof obligations itself but does not try to be smart, instead it relies on CBMC's constant propagation and static simplifications to prune remaining trivial checks.

When analysing the instrumented version with --pointer-check --signed-overflow-check --unsigned-overflow-check --conversion-check --div-by-zero-check we see that 95 VCCs get instantiated, and only 17 remain after static simplification. The SAT solver is only used to discharge these remaining 17 VCCs.

Generated 95 VCC(s), 17 remaining after simplification

Below is a manual rendition of the instrumented program, minus the frame condition checks that instrumentation inserts.

#include <stdint.h>
#include <stdlib.h>

int32_t nondet_int32_t();

// truncated square root of 2**31-1
#define INT32_ROOT_MAX 46340

int32_t isqrt(int32_t x) {
  // precondition
  __CPROVER_assume(x >= 0);
  int32_t return_value = __isqrt(x);
  // check post conditions
  __CPROVER_assert(
    (0 <= return_value &&return_value <= INT32_ROOT_MAX) &&
    (return_value * return_value <= x) &&
    (((int64_t)return_value + 1) * ((int64_t)return_value + 1) > (int64_t)x),
    "postconditions");
  return return_value;
}

int32_t __isqrt(int32_t x) {

  int32_t lower, middle, upper;
  lower = 0;
  upper = INT32_ROOT_MAX + 1;

  // loop invariant base case
  assert(0 <= lower && lower <= INT32_ROOT_MAX);
  assert(0 <= upper && upper <= (INT32_ROOT_MAX + 1));
  assert(lower * lower <= x);
  assert((int64_t)upper * (int64_t)upper > (int64_t)x);

  // havoc loop state
  lower = nondet_int32_t();
  middle = nondet_int32_t();
  upper = nondet_int32_t();

  // assume loop invariant
  __CPROVER_assume(0 <= lower && lower <= INT32_ROOT_MAX);
  __CPROVER_assume(0 <= upper && upper <= (INT32_ROOT_MAX + 1));
  __CPROVER_assume(lower * lower <= x);
  __CPROVER_assume((int64_t)upper * (int64_t)upper > (int64_t)x);

  // snapshot variant
  int32_t old_decreases = upper - lower;

  // loop step
  if (lower + 1 != upper) {
    // loop body
    middle = (lower + upper) / 2;
    if ((middle * middle) > x) {
      upper = middle;
    } else {
      lower = middle;
    }

    // check invariant step case
    assert(0 <= lower && lower <= INT32_ROOT_MAX);
    assert(0 <= upper && upper <= (INT32_ROOT_MAX + 1));
    assert(lower * lower <= x);
    assert((int64_t)upper * (int64_t)upper > (int64_t)x);

    // check loop variant
    assert(upper - lower < old_decreases);

    // stop progress
    __CPROVER_assume(0);
  }
  return lower;
}

void main() {
  int32_t x;
  isqrt(x);
}

Analysing this model directly, we see that 24 VCCs get generated and 20 remain after static simplification, and the SAT solver is used to discharge only these 20 VCCs.

Generated 24 VCC(s), 20 remaining after simplification

So this shows that the automatically instrumented model does not have more VCCs than the manually encoded one (that does not check frame conditions). Most if not all VCCs introduced by frame condition checks are simplified away even before reaching the SAT solver.

These trivial or unreachable VCCs are still reported in the output and I agree they could be filtered out, but CBMC itself does not know about contracts.

All of this to say that there might seem like there's a lot of overhead introduced, but in practice not so much. On the performance aspects (#7748), the observed runtime is certainly due to multiplications being bitblasted and not to library functions being pulled in by the instrumentation.

rod-chapman commented 1 year ago

I've been trying to think about this in terms of what I actually want to see as a user. It strikes me there are two "axes" to think about.

  1. The set of functions, for which results are to be reported. For me, there are two obvious sets - 1. the set of functions that the user actually requested to be analysed. This might be exactly one function, all the functions in a translation unit, or a "whole program" comprising some set of translation units. 2. the set of functions are are dependent on those requested - either internal stuff to do with contracts, bits of the C runtime library, or whatever else gets dragged in. I will call these sets "UR" (user-requested), and "D" (dependents") respectively.

  2. The verbosity of the output for each function analysed. I can think of three obvious "levels" here - "All" (meaning report all SUCCESS and FAILURE lines), "Summarized" (meaning report "SUCCESS ALL" on a single line where every check for a function reports SUCCESS, plus any FAILUREs elsewhere), and "Failures" (meaning suppress all SUCCESS lines and report FAILURE lines only).

Those two axes give us six permutations, or possibly two verbosity switches - one for "UR" and one for "D", with syntax something like: --ur-verbosity=[All|Summarized|Failures] --d-verbosity=[All|Summarized|Failures]

That leaves the question of what the default should be if unspecified... I might go for

--ur-verbosity=All --d-verbosity=Summarized

for example.

rod-chapman commented 1 year ago

For another data point of how crazily verbose the current default setting is... I have a function where there is one line of code that looks like this:

    state[7] ^= H;

Here, "state" is a pointer to an array of 8 uint32_t values, so this is clearly fine.

CBMC reports (wait for it...) 38 lines of output for this one line of code...

[x_y.assigns.1248] line 284 Check that state[(signed long int)7] is assignable: FAILURE
[x_y.assigns.2494] line 284 Check that state[(signed long int)7] is assignable: FAILURE
[x_y.pointer_arithmetic.229] line 284 pointer arithmetic: pointer NULL in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.230] line 284 pointer arithmetic: pointer invalid in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.231] line 284 pointer arithmetic: deallocated dynamic object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.232] line 284 pointer arithmetic: dead object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.233] line 284 pointer arithmetic: pointer outside object bounds in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.234] line 284 pointer arithmetic: invalid integer address in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.235] line 284 pointer arithmetic: pointer NULL in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.236] line 284 pointer arithmetic: pointer invalid in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.237] line 284 pointer arithmetic: deallocated dynamic object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.238] line 284 pointer arithmetic: dead object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.239] line 284 pointer arithmetic: pointer outside object bounds in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.240] line 284 pointer arithmetic: invalid integer address in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.469] line 284 pointer arithmetic: pointer NULL in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.470] line 284 pointer arithmetic: pointer invalid in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.471] line 284 pointer arithmetic: deallocated dynamic object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.472] line 284 pointer arithmetic: dead object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.473] line 284 pointer arithmetic: pointer outside object bounds in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.474] line 284 pointer arithmetic: invalid integer address in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.475] line 284 pointer arithmetic: pointer NULL in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.476] line 284 pointer arithmetic: pointer invalid in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.477] line 284 pointer arithmetic: deallocated dynamic object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.478] line 284 pointer arithmetic: dead object in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.479] line 284 pointer arithmetic: pointer outside object bounds in state + (signed long int)7: SUCCESS
[x_y.pointer_arithmetic.480] line 284 pointer arithmetic: invalid integer address in state + (signed long int)7: SUCCESS
[x_y.pointer_dereference.91] line 284 dereference failure: pointer NULL in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.92] line 284 dereference failure: pointer invalid in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.93] line 284 dereference failure: deallocated dynamic object in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.94] line 284 dereference failure: dead object in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.95] line 284 dereference failure: pointer outside object bounds in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.96] line 284 dereference failure: invalid integer address in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.205] line 284 dereference failure: pointer NULL in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.206] line 284 dereference failure: pointer invalid in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.207] line 284 dereference failure: deallocated dynamic object in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.208] line 284 dereference failure: dead object in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.209] line 284 dereference failure: pointer outside object bounds in state[(signed long int)7]: SUCCESS
[x_y.pointer_dereference.210] line 284 dereference failure: invalid integer address in state[(signed long int)7]: SUCCESS
rod-chapman commented 1 year ago

Actually - that suggests another simplification for "Summarized" mode - when a single line of code generated N messages, and they're all SUCCESS, then just print 1 message saying "line X: ALL SUCCESS" instead of 38 lines in the example above... that would be good.