Surprising order-dependence with ghost state

Consider this example, courtesy of @qsctr:

#include <stdint.h>
#include <stdlib.h>

int has(uint32_t x) {
    return 0;
}

// precondition: has(x)
void get(uint32_t x) {}

// precondition: has(42)
void get2() {
  uint32_t x = 42;
  if (has(x)) {
    get(x);
  } else {
    abort();
  }
}

m <- llvm_load_module "test.bc";

ghost <- llvm_declare_ghost_state "ghost";

let has_spec = do {

  g <- llvm_fresh_var "has::ghost" (llvm_int 32);
  llvm_ghost_value ghost g;

  x <- llvm_fresh_var "has::x" (llvm_int 32);

  llvm_execute_func [llvm_term x];

  llvm_ghost_value ghost g;

  llvm_return (llvm_term {{ if x == g then 1 else 0 : [32] }});

};

let get_spec = do {

  g <- llvm_fresh_var "get::ghost" (llvm_int 32);
  x <- llvm_fresh_var "get::x" (llvm_int 32);

  llvm_ghost_value ghost g;
  llvm_precond {{ x == g }};

  llvm_execute_func [llvm_term x];

  llvm_ghost_value ghost g;

};

let get2_spec = do {

  g <- llvm_fresh_var "get2::ghost" (llvm_int 32);
  llvm_ghost_value ghost g;

  llvm_precond {{ g == 42 }};

  llvm_execute_func [];

  llvm_ghost_value ghost g;

};

has_ov <- llvm_unsafe_assume_spec m "has" has_spec;
get_ov <- llvm_unsafe_assume_spec m "get" get_spec;

llvm_verify m "get2" [has_ov, get_ov] true get2_spec z3;

This looks plausible, but if you run it, SAW will fail to verify get2_spec:

$ ~/Software/saw-1.0/bin/saw test.saw

[21:55:55.718] Loading file "/home/ryanscott/Documents/Hacking/SAW/test.saw"
[21:55:55.743] Assume override has
[21:55:55.770] Assume override get
[21:55:55.813] Verifying get2 ...
[21:55:55.831] Simulating get2 ...
[21:55:55.832] Registering overrides for `get`
[21:55:55.832]   variant `Symbol "get"`
[21:55:55.832] Registering overrides for `has`
[21:55:55.832]   variant `Symbol "has"`
[21:55:55.833] Matching 1 overrides of  has ...
[21:55:55.833] Branching on 1 override variants of has ...
[21:55:55.834] Applied override! has
[21:55:55.836] Matching 1 overrides of  get ...
[21:55:55.837] Branching on 1 override variants of get ...
[21:55:55.839] Applied override! get
[21:55:55.839] Symbolic simulation completed with side conditions.
[21:55:55.839] Checking proof obligations get2 ...
[21:55:55.866] Subgoal failed: get2 test.c:15:5: error: in overrideBranches
No override specification applies for get.
Arguments:
- literal integer: unsigned value = 42, signed value =  42, width = 32
Run SAW with --sim-verbose=3 to see a description of each override.
[21:55:55.866] SolverStats {solverStatsSolvers = fromList ["SBV->Z3"], solverStatsGoalSize = 305}
[21:55:55.866] ----------Counterexample----------
[21:55:55.866]   get::ghost: 4294967253
[21:55:55.866]   get2::ghost: 42
[21:55:55.866] ----------------------------------
[21:55:55.867] Stack trace:
"llvm_verify" (/home/ryanscott/Documents/Hacking/SAW/test.saw:50:1-50:12)
Proof failed.

This counterexample is bewildering. How can get::ghost be 4294967253 when get's preconditions require get::ghost to be 42? It gets even weirder when you realize that if you make this small change to get_spec:

let get_spec = do {

  g <- llvm_fresh_var "get::ghost" (llvm_int 32);
  x <- llvm_fresh_var "get::x" (llvm_int 32);

  // Note that `llvm_precond` comes BEFORE `llvm_ghost_value` in this version!
  llvm_precond {{ x == g }};
  llvm_ghost_value ghost g;

  llvm_execute_func [llvm_term x];

  llvm_ghost_value ghost g;

};

Then verification succeeds! This flies in the face of usual SAW conventions, where the order of preconditions should not matter.

The reason that this happens is somewhat technical. Currently, SAW happens to process each precondition in a spec from top to bottom. Processing a precondition like llvm_precond {{ x == g }} extends a variable substitution that SAW maintains such that g will map to x. Therefore, if you process the llvm_ghost_value ghost g afterwards, then SAW will map g to x, making it as though you had written llvm_ghost_value ghost x. SAW knows how to link this back to the argument to the get function, and all is well.

If the two preconditions are written in the opposite order, however, then when SAW processes llvm_ghost_value ghost g first, it has no idea that g is related to x in any way. As such, it will allow g to be any value whatsoever, and indeed, symbolic execution will happily pick 4294967253 as its value. Bad bad bad!

Really, we should always run llvm_ghost_value preconditions after all other preconditions, as running the other preconditions may introduce important variable equalities that are essential to making compositional overrides (such as get) apply successfully. This special-casing of llvm_ghost_value makes sense if you think of ghost values as a type of argument to the function, and indeed, we already special-case the handling of ordinary function arguments (via llvm_execute_func).

Although the example above uses the LLVM backend, there is nothing inherently LLVM-specific about it, and the bug will also apply to the JVM and MIR backends after #1958 lands.

This issue can also be observed with postconditions, not just preconditions:

// test.c
#include "stdint.h"

void foo (uint8_t x) {}

void bar (uint8_t x) {
    foo(x);
}

// test.saw
m <- llvm_load_module "test.bc";

let spec_ghost ghost = do {
    x <- llvm_fresh_var "x" (llvm_int 8);

    llvm_execute_func [llvm_term x];

    ghost_post <- llvm_fresh_var "ghost_post" (llvm_int 8);
    llvm_postcond {{ghost_post == x}};
    llvm_ghost_value ghost ghost_post;
};

ghost <- declare_ghost_state "ghost";
ghost_ov <- llvm_unsafe_assume_spec m "foo" (spec_ghost ghost);
ghost_res <- llvm_verify m "bar" [ghost_ov] true (spec_ghost ghost) z3;

This succeeds. If you swap the order of the llvm_postcond and llvm_ghost_value statements, however:

m <- llvm_load_module "test.bc";

let spec_ghost ghost = do {
    x <- llvm_fresh_var "x" (llvm_int 8);

    llvm_execute_func [llvm_term x];

    ghost_post <- llvm_fresh_var "ghost_post" (llvm_int 8);
    llvm_ghost_value ghost ghost_post;
    llvm_postcond {{ghost_post == x}};
};

ghost <- declare_ghost_state "ghost";
ghost_ov <- llvm_unsafe_assume_spec m "foo" (spec_ghost ghost);
ghost_res <- llvm_verify m "bar" [ghost_ov] true (spec_ghost ghost) z3;

Then verification fails:

$ ~/Software/saw-1.1/bin/saw test.saw
[16:35:22.855] Loading file "/home/ryanscott/Documents/Hacking/SAW/test.saw"
[16:35:22.882] Assume override foo
[16:35:22.920] Verifying bar ...
[16:35:22.932] Simulating bar ...
[16:35:22.933] Registering overrides for `foo`
[16:35:22.933]   variant `Symbol "foo"`
[16:35:22.934] Matching 1 overrides of  foo ...
[16:35:22.934] Branching on 1 override variants of foo ...
[16:35:22.935] Applied override! foo
[16:35:22.935] Checking proof obligations bar ...
[16:35:22.952] Subgoal failed: bar /home/ryanscott/Documents/Hacking/SAW/test.saw:16:14: error: in llvm_postcond
postcondition
[16:35:22.952] SolverStats {solverStatsSolvers = fromList ["SBV->Z3"], solverStatsGoalSize = 223}
[16:35:22.953] ----------Counterexample----------
[16:35:22.953]   x: 1
[16:35:22.953]   ghost_post: 254
[16:35:22.953]   ghost_post: 1
[16:35:22.953] ----------------------------------
[16:35:22.953] Stack trace:
"llvm_verify" (/home/ryanscott/Documents/Hacking/SAW/test.saw:16:14-16:25)
Proof failed.

This suggests a similar fix: always run llvm_ghost_value postconditions after all other postconditions.

GaloisInc / saw-script

Surprising order-dependence with ghost state #1984