Z3Prover / z3

The Z3 Theorem Prover
Other
10.17k stars 1.47k forks source link

non-determinism in Z3 4.8.8 #4559

Closed rospoly closed 4 years ago

rospoly commented 4 years ago

I am experiencing a non-deterministic behavior with Z3 4.8.8 running on Ubuntu 16.04 LTS. This brittle behavior manifests with two programs that are nearly identical, except for the input parameter of a function call. I attach the two programs (output1.txt and output2.txt).

In particular, the function $DebugTrackLocal is invoked with input '3' in output2.txt, and with input '4' in output1.txt. The differing inputs to $DebugTrackLocal consist in the id number of a source file; these inputs have no bearing on the satisfiability of the query and are merely used for converting the returned model into a counterexample trace at the source level.

The execution time is 5s for output1.smt2 and about 2m for output2.smt2.

I tried to detect the reason causing such brittle behavior by profiling both verifications (smt.qi.profile=true, and then qprofdiff). There is a massive difference in the number of quantifiers being instantiated by Z3 on the two files (I report the execution of qprofdiff in diff.log).

I report here the options given to Z3: (set-option :print-success false) (set-info :smt-lib-version 2.6) (set-option :produce-models true) (set-option :smt.QI.EAGER_THRESHOLD 100) (set-option :smt.QI.LAZY_THRESHOLD 100) (set-option :smt.array.extensional false) (set-option :smt.mbqi false) (set-option :model.compact false) (set-option :model.v2 true) (set-option :pp.bv_literals false)

Why this phenomenon is happening? Is there some additional debugging I can do to profiling the two executions?

Thanks.

output2.txt output1.txt diff.txt

@shazqadeer @wrwg @DavidLDill

NikolajBjorner commented 4 years ago

Try running with different random seeds, that is use smt.random_seed=NN from the command-line. You should see that for the same input, z3 exhibits different runtimes. It is a phenomenon shared across SAT solvers, quantifier-free SMT, and quantified SMT: ensuring performance stability for finding satisfying solutions isn't easy. You can work around such instabilities by spawning independent versions of the solver with different random seeds and use a first-solver wins.

wrwg commented 4 years ago

Thanks Nikolaj this is really something which we should try (ideally this would be build into Boogie which we are currently using on top of Z3).

We still would love to better understand the problem, because we have many instances of this surprising butterfly effect. We have been carefully reviewing our quantifiers and providing them with triggers; as far as we can tell they are harmless (function definitions, lambdas, simple injected predicates; recursion resolved by stratification).

My concern is that given the well-known behavior of non-determinism of SMT solvers, the chance is also high that actual bugs e.g. in QI can stay undiscovered for long. Any suggestion how we can help to hunt such bugs? Or does it appear unlikely that they exist?

rospoly commented 4 years ago

Hi Nikolaj, thank you for your precious support.

I did the experiment you mentioned, meaning spanning "a lot" of independent instances of Z3 each one with a distinct random seed. I attach the plot showing the results for output2.smt2.

time In the experiment, we run 1001 independent instances of Z3 with a (soft) timeout of 5mins for each VC.

My concern here is that we need a lot of independent instances of Z3 to actually catch a fast one. It might be computationally unfeasible to integrate this approach in Boogie.

NikolajBjorner commented 4 years ago

You could also try to play with the following smt parameters:

smt.threads (unsigned int) maximal number of parallel threads. (default: 1) smt.threads.max_conflicts (unsigned int) maximal number of conflicts between rounds of cubing for parallel SMT (default: 400)

Cubing seems perhaps not to be too useful in your case so perhaps set smt.threads.max_conflicts to a very large number.

What this parameter does is to

  1. First invoke the smt core with a a few conflicts (40).
  2. Then spawn smt.threads number of concurrent solvers with a limit of smt.threads.max_conflicts.
  3. When the threads have reached their limits without a verdict, they perform independent cubing (a case split) to specialize the goal and split the search space between threads. If a thread in this mode produces a conflict that depends on the case split, the conflict is added as a unit among all workers.

By setting smt.threads.max_conflicts to a large number you avoid the loop in step 3. That loop has some drawbacks as well: it could be that the solver learns units that are useless and expensive to share (especially from Gomory cuts), it also has to work in a mode where it integrates assumptions. This precludes some global simplification capabilities.

It isn't clear that this strategy is going to help you, but it is there as an option.

Regarding "non-determinism" and MBQI debugging: The core solver isn't non-deterministic so when there is an issue it tends to show up, and so consistently, over larger sets of inputs (benchmark sets).