Function names influence the result

usr-sse2 commented 3 years ago

Z3 4.8.8 is run using the following command: ./z3 rlimit=5000000 tactic.solve_eqs.context_solve=false req.z3 and this unsatisfiable request produces result unknown:

(set-logic QF_UFBV)
(declare-fun conv_13_0 () (_ BitVec 32))
(declare-fun expr_12_0 () (_ BitVec 64))
(declare-fun m_2_0 () (_ BitVec 64))
(declare-fun expr_4_0 () Bool)
(declare-fun expr_8_0 () Bool)
(declare-fun expr_6_0 () (_ BitVec 64))
(declare-fun expr_10_0 () (_ BitVec 64))
(declare-fun Let0 () (_ BitVec 64))
(declare-fun Let3 () (_ BitVec 64))
(define-fun Complex1 () Bool (and (not expr_4_0) (not expr_8_0))) 
(define-fun Complex2 () Bool (and expr_4_0 (not expr_8_0))) 
(assert (or (and Complex1 (= Let0 m_2_0)) (and Complex2 (= Let0 expr_6_0)) (and expr_8_0 (= Let0 expr_10_0))))
(assert (or (and (not expr_4_0) (= Let3 m_2_0)) (and expr_4_0 (= Let3 expr_6_0))))
(assert (! (= conv_13_0 (_ bv18 32)) :named a0))
(assert (! (= conv_13_0 ((_ extract 31 0) expr_12_0)) :named a1))
(assert (! (= expr_12_0 (bvsrem Let0 (_ bv10 64))) :named a2))
(assert (! (= expr_6_0 (bvsrem m_2_0 (_ bv1000 64))) :named a3))
(assert (! (= expr_10_0 (bvsrem Let3 (_ bv100 64))) :named a4))
(assert (! (= expr_4_0 (bvsge m_2_0 (_ bv1000 64))) :named a5))
(assert (! (= expr_8_0 (bvsge Let3 (_ bv100 64))) :named a6))
(check-sat)

The minimum sufficient rlimit value for this request is 5283401.

However, the result changes to unsat if m_2_0 is renamed to m_2. Why can variable names influence the resource number required to solve a request?
If bv18 is replaced with bv19, the formula is still unsatisfiable for the same reason (the remainder of division by 10 can't be ≥10), but Z3 is able to solve it within the 5000000 limit even without renaming.

usr-sse2 commented 3 years ago

Possible duplicate of #4604, #4600, #930

NikolajBjorner commented 3 years ago

variable names influence search behavior, so does sat.random_seed. Try just running with sat.random_seed=1, ..., 10 and with -st set. Then you will see that z3 uses different number of restarts, decisions, etc.

A common mistake in benchmarking SAT/SMT/ATP solvers is to assume that it suffices to test a benchmark with a particular setting. The real measure is whether the solver is stable over a range of seeds/scrambles of inputs. So you should not only perform more than one run to average out overhead from the OS, but also scramble and seed search to understand whether the solver behaves in a stable way.

For a fixed input and fixed random seed, Z3 should behave deterministically in common use cases (exceptions are when Z3 tries some time-bounded strategies to solve arithmetical formulas and when you use parallel solving). Only OS overhead should influence wall-clock time.

Z3Prover / z3

Function names influence the result #5147