diffblue / cbmc

C Bounded Model Checker
https://diffblue.github.io/cbmc
Other
848 stars 263 forks source link

What are the semantics of (repeated) declaration and DEAD for a single symbol? #8258

Open tautschnig opened 7 months ago

tautschnig commented 7 months ago

Consider the GOTO program produced from the following C input (this is about GOTO program semantics, not butchering C semantics), which fails to verify for any unwinding value greater or equal to 2:

__CPROVER_bool nondet_bool();

int main()
{
  for(int i = 0; i <= 1 || nondet_bool(); ++i)
  {
    int x = i;
    if(i > 0 && nondet_bool())
      break;
  }
  int escaped = __CPROVER_ID "main::1::1::1::x";
  __CPROVER_assert(escaped > 0, "should hold");
}

Running CBMC yields:

$ cbmc dead2.c --unwind 2 --no-standard-checks --verbosity 4
**** WARNING: Use --unwinding-assertions to obtain sound verification results

** Results:
dead2.c function main
[main.assertion.1] line 12 should hold: FAILURE
VERIFICATION FAILED

While this verification result is consistent for C programs, it should be possible to create a GOTO program in some different way such that main::1::1::1::x is not marked "DEAD" when executing break: GOTO programs don't have a concept of code blocks restricting scope other than as mandated by declarations and DEAD statements. With such a change, the verification failure still persists, but is now surprising for the GOTO program (as shown below for reference, minus instruction 17) will not have killed main::1::1::1::x when assigning to escaped. (The reason is that goto-symex does not make any attempt to merge L1 instances of the same symbol at phi nodes.) Before trying to hack a fix into goto-symex, I'd like to seek consensus on the following questions:

GOTO program for reference:

main /* main */
        // 0 file dead2.c line 5 function main
        DECL main::1::1::i : signedbv[32]
        // 1 file dead2.c line 5 function main
        ASSIGN main::1::1::i := 0
        // 2 file dead2.c line 5 function main
     1: DECL main::$tmp::tmp_if_expr : 𝔹
        // 3 file dead2.c line 5 function main
        IF ¬(main::1::1::i ≤ 1) THEN GOTO 2
        // 4 file dead2.c line 5 function main
        ASSIGN main::$tmp::tmp_if_expr := true
        // 5 no location
        GOTO 3
        // 6 file dead2.c line 5 function main
     2: DECL main::$tmp::return_value_nondet_bool : 𝔹
        // 7 file dead2.c line 5 function main
        ASSIGN main::$tmp::return_value_nondet_bool := side_effect statement="nondet" #identifier="nondet_bool" is_nondet_nullable="1"
        // 8 file dead2.c line 5 function main
        ASSIGN main::$tmp::tmp_if_expr := (main::$tmp::return_value_nondet_bool ? true : false)
        // 9 file dead2.c line 5 function main
     3: IF ¬main::$tmp::tmp_if_expr THEN GOTO 5
        // 10 file dead2.c line 7 function main
        DECL main::1::1::1::x : signedbv[32]
        // 11 file dead2.c line 7 function main
        ASSIGN main::1::1::1::x := main::1::1::i
        // 12 file dead2.c line 8 function main
        IF ¬(main::1::1::i > 0) THEN GOTO 4
        // 13 file dead2.c line 8 function main
        DECL main::$tmp::return_value_nondet_bool$0 : 𝔹
        // 14 file dead2.c line 8 function main
        ASSIGN main::$tmp::return_value_nondet_bool$0 := side_effect statement="nondet" #identifier="nondet_bool" is_nondet_nullable="1"
        // 15 file dead2.c line 8 function main
        IF ¬main::$tmp::return_value_nondet_bool$0 THEN GOTO 4
        // 16 file dead2.c line 9 function main
        DEAD main::$tmp::return_value_nondet_bool$0
        // 17 file dead2.c line 9 function main
        DEAD main::1::1::1::x
        // 18 file dead2.c line 9 function main
        GOTO 5
        // 19 file dead2.c line 10 function main
     4: DEAD main::$tmp::return_value_nondet_bool$0
        // 20 file dead2.c line 10 function main
        DEAD main::1::1::1::x
        // 21 file dead2.c line 5 function main
        ASSIGN main::1::1::i := main::1::1::i + 1
        // 22 file dead2.c line 5 function main
        GOTO 1
        // 23 file dead2.c line 10 function main
     5: DEAD main::$tmp::tmp_if_expr
        // 24 file dead2.c line 10 function main
        DEAD main::$tmp::return_value_nondet_bool
        // 25 file dead2.c line 10 function main
        DEAD main::1::1::i
        // 26 file dead2.c line 11 function main
        DECL main::1::escaped : signedbv[32]
        // 27 file dead2.c line 11 function main
        ASSIGN main::1::escaped := main::1::1::1::x
        // 28 file dead2.c line 12 function main
        ASSERT main::1::escaped > 0 // should hold
        // 29 file dead2.c line 13 function main
        DEAD main::1::escaped
        // 30 file dead2.c line 13 function main
        SET RETURN VALUE side_effect statement="nondet" is_nondet_nullable="1"
        // 31 file dead2.c line 13 function main
        END_FUNCTION
tautschnig commented 7 months ago

As a further example (and in an attempt to see what answers to the above questions would actually work out), consider the following program:

__CPROVER_bool nondet_bool();

int main()
{
  int *p = (int *)0;
  while(nondet_bool())
  {
    int x;
    if(p != 0)
      *p = 1; // should be disallowed
    p = &x;
    *p = 42; // should be permitted
    if(nondet_bool())
      break;
  }
  if(p != 0)
  {
    int t = *p; // should be disallowed
  }
}

With this example, I do believe it is necessary to create fresh objects (for symbolic execution: fresh indices alone would not suffice) for each loop iteration.

martin-cs commented 6 months ago

My mental model of the execution of a GOTO program is that there is an environment map of variable names -> value. DECL adds an entry to this map, with it's initial value being non-deterministic. DEAD removes the entry from the map entirely. Before any ASSIGN we much check that there is an entry in the map for the value. Any evaluation including the RHS of ASSIGN, ASSERT, ASSUME and FUNCTION_CALL must check that the variable is present before getting the value. The tricky case is what happens when you dereference a pointer and one of it's possible values is no longer in the environment map, for example:

int x;
int p = &x;
while (nondet()) {
  int y;
  if (nondet()) {
    p = &y;
  }
}
*p = 0;

The most liberal interpretation is to return non-det / top on read for any missing values and ignore writes to dead things. However this might be too generous and it might be better to give an error.

To use this informal semantics to try to answer your questions...

When is a fresh instance of some named symbol created, when is that not the case?

DECL creates new instances. The only complication is recursion. This is a bit fiddly because you sort of want the environment map to be stack allocated, but what happens if you reference something before you DECL, does that reference the caller's version? Also, what happens if you fail to DEAD a local variable.

Does a declaration merely set an instance of an object to a non-deterministic value?

No, it adds it to the environment, making it valid to reference it.

Does a (re-)declaration reset the value of __CPROVER_dead_object, or does this need to be left to an explicit GOTO program instruction?

Awkward. Ideally I would want __CPROVER_dead_object to be modelling and thus independent of the semantics.

Does a DEAD statement equally set an instance of an object to a non-deterministic value?

No, it means that it is no longer valid to even speak its name!

Does a DEAD statement alone non-deterministically update the value stored in __CPROVER_dead_object, or should that be done via a GOTO program statement?

Again, I would want this to be independent of the semantics, if possible.

Is a repeated DEAD statement (for the same symbol) harmful?

I think it shouldn't happen. It can be safely ignored if the implementation is preferring robustness.

Maybe we really do need to do the long-proposed formalisation of the GOTO language...