Unsound verification with /proverOpt:O:smt.arith.solver=6

muchang commented 1 year ago

Dafny version

4.1.0

Code to produce this issue

method main (a: real, b: real, c: real, d: real) {
    if (0.0 <= 0.0) {
        var x := 0.0;
        if (b <= 0.0 * a) {
            x := 7.0;
            x := 1.0 / x;
            if ((1.0 + a * x) * 2.0 == d) {
                x := a * d;
                if (1.0 + x * a == d) {
                    x := c * c;
                    if (x == d && 0.0 <= a) {
                        if (0.0 <= c) {
                            assert false;
                        }
                    }
                }
            }
        }
    }
    assert true;
}

Command to run and resulting output

$ dafny-4-1/dafny/dafny /compile:0 /proverOpt:O:smt.arith.solver=6 main.dfy

Dafny program verifier finished with 1 verified, 0 errors

$ dafny-4-1/dafny/dafny /compile:0 main.dfy
main.dfy(14,35): Error: assertion might not hold

Dafny program verifier finished with 0 verified, 1 error

What happened?

According to the previous discussion (https://github.com/dafny-lang/dafny/issues/3869), using /proverOpt:O:smt.arith.solver=6 performs better than the default setting.

However, in this case, /proverOpt:O:smt.arith.solver=6 produces an unsound result.

Removing the irrelevant code (like deleting assert true or transforming 0.0 <= 0.0 to true) yields the correct result.

What type of operating system are you experiencing the problem on?

Linux

atomb commented 1 year ago

So far, I've reduced this to the following SMT-Lib file that Z3 says is unsat, but I believe shouldn't be.

(set-logic ALL)
(declare-sort |T@U| 0)
(declare-sort |T@T| 0)
(declare-fun Ctor (T@T) Int)
(declare-fun boolType () T@T)
(declare-fun type (T@U) T@T)
(declare-fun real_2_U (Real) T@U)
(declare-fun LitReal (Real) Real)
(declare-fun MapType2Select (T@U T@U T@U) T@U)
(declare-fun MapType0Type (T@T T@T) T@T)
(declare-fun MapType1Type () T@T)
(declare-fun MapType0Store (T@U T@U T@U) T@U)
(declare-fun MapType2Type (T@T T@T) T@T)
(declare-fun MapType2TypeInv1 (T@T) T@T)
(declare-fun MapType2Store (T@U T@U T@U T@U) T@U)

(assert (forall ((x@@11 Real) ) (= (LitReal x@@11) x@@11)))

(assert (and (and (and (and (and (and (and (and
(forall ((arg0@@14 T@T) (arg1 T@T) ) (= (Ctor (MapType0Type arg0@@14 arg1)) 7))
(forall
 ((arg0@@18 T@U) (arg1@@3 T@U) (arg2 T@U))
 (let ((aVar1@@0 (type arg2)))
      (let ((aVar0 (type arg1@@3)))
        (= (type (MapType0Store arg0@@18 arg1@@3 arg2)) (MapType0Type aVar0 aVar1@@0)))))
)))
(forall
 ((arg0@@23 T@T) (arg1@@6 T@T) )
 (= (Ctor (MapType2Type arg0@@23 arg1@@6)) 11))
)
(forall
 ((arg0@@25 T@T) (arg1@@8 T@T) )
 (= (MapType2TypeInv1 (MapType2Type arg0@@25 arg1@@8)) arg1@@8))
)
(forall
 ((arg0@@26 T@U) (arg1@@9 T@U) (arg2@@1 T@U) )
 (let ((aVar1@@2 (MapType2TypeInv1 (type arg0@@26))))
      (= (type (MapType2Select arg0@@26 arg1@@9 arg2@@1)) aVar1@@2)))
)
(forall
 ((arg0@@27 T@U) (arg1@@10 T@U) (arg2@@2 T@U) (arg3 T@U) )
 (let ((aVar1@@3 (type arg3)))
      (let ((aVar0@@0 (type arg1@@10)))
        (= (type (MapType2Store arg0@@27 arg1@@10 arg2@@2 arg3)) (MapType2Type aVar0@@0 aVar1@@3)))))
)
(forall
 ((val@@6 T@U) (m@@6 T@U) (x0@@6 T@U) (x1@@0 T@U) (y0@@3 T@U) (y1 T@U) )
  (or (= x0@@6 y0@@3)
         (= (MapType2Select (MapType2Store m@@6 x0@@6 x1@@0 val@@6) y0@@3 y1) (MapType2Select m@@6 y0@@3 y1))))
))

(declare-fun ControlFlow (Int Int) Int)
(declare-fun |c#0| () Real)
(declare-fun |x#0_0@4| () Real)
(declare-fun |d#0| () Real)
(declare-fun |a#0| () Real)
(declare-fun |x#0_0@3| () Real)
(declare-fun |x#0_0@2| () Real)
(declare-fun |b#0| () Real)
(declare-fun |x#0_0@1| () Real)
(declare-fun |x#0_0@0| () Real)

(assert (not
 (=> (= (ControlFlow 0 0) 21) (let ((anon15_correct  (=> (= (ControlFlow 0 2) (- 0 1)) true)))
(let ((anon22_Else_correct  (=> (and (< |c#0| (LitReal 0.0)) (= (ControlFlow 0 10) 2)) anon15_correct)))
(let ((anon22_Then_correct  (=> (<= (LitReal 0.0) |c#0|) (and (=> (= (ControlFlow 0 8) (- 0 9)) false) (=> false (=> (= (ControlFlow 0 8) 2) anon15_correct))))))
(let ((anon21_Then_correct  (=> (and (= |x#0_0@4| |d#0|) (<= (LitReal 0.0) |a#0|)) (and (=> (= (ControlFlow 0 11) 8) anon22_Then_correct) (=> (= (ControlFlow 0 11) 10) anon22_Else_correct)))))
(let ((anon21_Else_correct  (=> (and (not (and (= |x#0_0@4| |d#0|) (<= (LitReal 0.0) |a#0|))) (= (ControlFlow 0 7) 2)) anon15_correct)))
(let ((anon6_correct  (and (=> (= (ControlFlow 0 12) 11) anon21_Then_correct) (=> (= (ControlFlow 0 12) 7) anon21_Else_correct))))
(let ((anon20_Else_correct  (=> (and (not (= |x#0_0@4| |d#0|)) (= (ControlFlow 0 14) 12)) anon6_correct)))
(let ((anon20_Then_correct  (=> (and (= |x#0_0@4| |d#0|) (= (ControlFlow 0 13) 12)) anon6_correct)))
(let ((anon19_Then_correct  (=> (and (= (+ 1.0 (* |x#0_0@3| |a#0|)) |d#0|) (= |x#0_0@4| (* |c#0| |c#0|))) (and (=> (= (ControlFlow 0 15) 13) anon20_Then_correct) (=> (= (ControlFlow 0 15) 14) anon20_Else_correct)))))
(let ((anon19_Else_correct  (=> (and (not (= (+ 1.0 (* |x#0_0@3| |a#0|)) |d#0|)) (= (ControlFlow 0 6) 2)) anon15_correct)))
(let ((anon18_Then_correct  (=> (and (= (* (+ 1.0 (* |a#0| |x#0_0@2|)) 2.0) |d#0|) (= |x#0_0@3| (* |a#0| |d#0|))) (and (=> (= (ControlFlow 0 16) 15) anon19_Then_correct) (=> (= (ControlFlow 0 16) 6) anon19_Else_correct)))))
(let ((anon18_Else_correct  (=> (and (not (= (* (+ 1.0 (* |a#0| |x#0_0@2|)) 2.0) |d#0|)) (= (ControlFlow 0 5) 2)) anon15_correct)))
(let ((anon17_Then_correct  (=> (and (<= |b#0| (* 0.0 |a#0|)) (= |x#0_0@1| (LitReal 7.0))) (and (=> (= (ControlFlow 0 17) (- 0 18)) (not (= |x#0_0@1| 0.0))) (=> (not (= |x#0_0@1| 0.0)) (=> (= |x#0_0@2| (/ 1.0 |x#0_0@1|)) (and (=> (= (ControlFlow 0 17) 16) anon18_Then_correct) (=> (= (ControlFlow 0 17) 5) anon18_Else_correct))))))))
(let ((anon17_Else_correct  (=> (and (< (* 0.0 |a#0|) |b#0|) (= (ControlFlow 0 4) 2)) anon15_correct)))
(let ((anon16_Then_correct  (=> (and (<= (LitReal 0.0) (LitReal 0.0)) (= |x#0_0@0| (LitReal 0.0))) (and (=> (= (ControlFlow 0 19) 17) anon17_Then_correct) (=> (= (ControlFlow 0 19) 4) anon17_Else_correct)))))
(let ((anon16_Else_correct  (=> (and (< (LitReal 0.0) (LitReal 0.0)) (= (ControlFlow 0 3) 2)) anon15_correct)))
(let ((anon0_correct  (=> true (and (=> (= (ControlFlow 0 20) 19) anon16_Then_correct) (=> (= (ControlFlow 0 20) 3) anon16_Else_correct)))))
(let ((PreconditionGeneratedEntry_correct  (=> (and true (and true (= (ControlFlow 0 21) 20))) anon0_correct)))
PreconditionGeneratedEntry_correct)))))))))))))))))))
))
(check-sat)

Interestingly, If I remove the various quantified axioms about maps at the beginning, which aren't being used by the main asssertion, Z3 instead doesn't return. (This is with Z3 4.12.1.)

muchang commented 1 year ago

Thanks for investigating this! It indeed looks interesting. May I know how to obtain such an SMT-Lib file from the Dafny code? So that I could also investigate further when I encounter such cases.

atomb commented 1 year ago

The /proverLog:file.smt2 option will log the SMT output to a file. In many instances, you'll want to use /proverLog:"@FILE@_@PROC@.smt2", since it'll generate many SMT files. If you're using the newer dafny verify command, the equivalent option is --solver-log.

atomb commented 1 year ago

It seems that if I put everything before the (declare ControlFlow ... line into its own file, Z3 reports unsat, so the axioms Dafny is generating are inconsistent. That explains the problem!

RustanLeino commented 1 year ago

// The name MapType0... in the original was confusing to me, because I think 0 is a sequence number that automatically
// generated as Boogie notices a new map arity. Since among the axioms above, it refers to maps of arity 1, I have
// renamed it to MapType1... in the following.

// The following two axioms are says that type constructors MapType1Type and MapType2Type produce different things
// It's like:
//     datatype T = MapType1Type(arg0: T, result: T) | MapType2Type(arg0: T, result: T)
// What I don't understand is why MapTyp2Type doesn't take three arguments, as in
//     MapType2Type(arg0: T, arg1: T, result: T)

(forall ((arg0 T@T) (arg1 T@T))
  (= (Ctor (MapType1Type arg0 arg1)) 7))
(forall ((arg0 T@T) (arg1 T@T) )
  (= (Ctor (MapType2Type arg0 arg1)) 11))

// In the datatype notation I used above, it is implicit that each constructor is injective in its arguments.
// Here is one of those injectivity axioms, encoding using an inverse function MapType2TypeInv1. This axiom
// says that the type constructor MapType2Type is injective in "arg1".

(forall ((arg0 T@T) (arg1 T@T) )
  (= (MapType2TypeInv1 (MapType2Type arg0 arg1)) arg1))

// The following axioms give the types of MapType1Store and MapType2Store. Essentially, they're declaring
//     function MapType1Store<M, A, B>(arg0: M, arg1: A, arg2: R): MapType1Type<A, R>
//     function MapType2Store<M, A, B, C>(arg0: M, arg1: A, arg2: B, arg3: R): MapType2Type<A, R>
// Note that the type of the map "arg0" does not affect the result of this Store function, which is suspicious.
// There ought to be an antecedent that says "M" is really MapType1Type<A, R>.
// Also, because MapType2Type seems to be missing a parameter, type argument B is missing in the second line.

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U))
  (= (type (MapType1Store arg0 arg1 arg2)) (MapType1Type (type arg1) (type arg2))))

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U) (arg3 T@U) )
  (= (type (MapType2Store arg0 arg1 arg2 arg3)) (MapType2Type (type arg1) (type arg3))))

// The following axiom shows the type of select. It says that
//     type(m[x, y]) == type(the result type of m)
// One could imagine that this axiom would have an antecedent that says "arg0" really is a map, but it's possible
// that the lack of such antecedent has does not affect the mathematical consistency of the encoding.

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U) )
  (= (type (MapType2Select arg0 arg1 arg2)) (MapType2TypeInv1 (type arg0))))

// This is one of the usual select-of-store axiom. In other notation, it says:
//     forall val, m, x0, x1, y0, y1 ::
//         x0 != y0  ==>  m[x0, x1 := val][y0, y1] == m[y0, y1]

(forall ((val T@U) (m T@U) (x0 T@U) (x1 T@U) (y0 T@U) (y1 T@U) )
  (or (= x0 y0)
      (= (MapType2Select (MapType2Store m x0 x1 val) y0 y1) (MapType2Select m y0 y1))))

atomb commented 1 year ago

An interesting additional data point. I generated SMT-Lib from the original problem above with Boogie's predicate, argument, and monomorphic encodings of the problem. All yield unsat from Z3 4.12.1 (distributed with Dafny) but unknown from the recently-released Z3 4.12.2.

However, the reduced problem I posted above yields unsat from both 4.12.1 and 4.12.2.

atomb commented 1 year ago

Another data point: the reduced axiom prefix I posted above only yields unsat if I remove the (set-option :smt.mbqi false) line from the original file. Besides that, I've double-checked that the only modifications I made to the SMT-Lib file that produces unsat, relative to the one produced by Dafny, are to comment out a few of the assertions connected to a) other axioms, and b) the actual problem specified in the Dafny file.

atomb commented 1 year ago

Narrowing this down further, if I take the SMT-Lib files produced by the monomorphic encoding and argument encoding of polymorphic types and comment out the actual verification problem at the end they become easily sat. So I think this issue is not to do with the ideal version of the axioms themselves but rather the way that their polymorphic types are encoded using predicates. This seems quite plausible considering @RustanLeino's comment that certain type antecedents seemed to be missing.

This is perhaps additional incentive to switch to the argument-based (or monomorphized) encoding of Boogie's polymorphic types when used from Dafny. I had already been experimenting with making that switch, because it seems to improve performance. Improving soundness as a side effect would be nice! :)

RustanLeino commented 1 year ago

I have three things to add.

Quantifier modes

In Aaron's reduced SMT file above, there are no ground terms other than the forall expressions. If this input is fed to Z3 and Z3 is asked to treat quantifiers via triggers, then there is nothing for Z3 to do, because there are no ground terms that would match the triggers. Hence, in that mode, Z3 immediately returns (unknown).

To get Z3 to use the trigger mode, one needs to do two things: turn off Z3's auto-config and turn off Z3's model-based quantifier instantiation (smt.mbqi). Boogie does both of these things. So, in order to get an (unsat) answer, the input to Z3 must also some ground terms that trigger the quantifier instantiations that let the solver prove false.

I don't know if the real numbers in @muchang's example provided those ground terms. Or could it be that smt.arith.solver=6 changes the quantifier mode to something like smt.mbqi?

A corrected encoding

I corrected the encoding by hand, and that caused Z3 no longer to output (unsat). There are two parts to the correction.

One part is to change the 2-ary map type from

(declare-fun MapType2Type (T@T T@T) T@T)

to

(declare-fun MapType2Type (T@T T@T T@T) T@T)

The 3 arguments to MapType2Type is now the 2 "index" types of the map and the 1 result type of the map. The fact that this argument was missing before means that all [X, _]R types in Boogie were treated as being of the same type. In other words, the type(_) of such a map was not able to distinguish types based on their second index type. That does not give any inconsistency, but it's not what was intended, so I'd say it's a bug.

The other part is to add an antecedent to the previous

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U))
  (= (type (MapType1Store arg0 arg1 arg2)) (MapType1Type (type arg1) (type arg2))))

to make it:

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U))
  (!
    (=>
      (= (type arg0) (MapType1Type (type arg1) (type arg2)))
      (= (type (MapType1Store arg0 arg1 arg2)) (MapType1Type (type arg1) (type arg2))))
    :pattern ( (type (MapType1Store arg0 arg1 arg2)) )
))

I think Z3 will pick the trigger I stated here explicitly, but I'm not certain, so it seems best to add it manually.

Here is the input I then fed to Z3:

(set-logic ALL)
(declare-sort |T@U| 0)
(declare-sort |T@T| 0)
(declare-fun Ctor (T@T) Int)
(declare-fun boolType () T@T)
(declare-fun type (T@U) T@T)

(declare-fun MapType1Type (T@T T@T) T@T)
(declare-fun MapType1Store (T@U T@U T@U) T@U)

(declare-fun MapType2Type (T@T T@T T@T) T@T)
(declare-fun MapType2Select (T@U T@U T@U) T@U)
(declare-fun MapType2Store (T@U T@U T@U T@U) T@U)

(declare-fun MapType2TypeInv1 (T@T) T@T)
(declare-fun MapType2TypeInv2 (T@T) T@T)

(assert (and

(forall ((arg0 T@T) (arg1 T@T))
  (= (Ctor (MapType1Type arg0 arg1)) 7))
(forall ((arg0 T@T) (arg1 T@T) (arg2 T@T))
  (= (Ctor (MapType2Type arg0 arg1 arg2)) 11))

(forall ((arg0 T@T) (arg1 T@T) (arg2 T@T))
  (= (MapType2TypeInv2 (MapType2Type arg0 arg1 arg2)) arg2))

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U))
  (!
    (=>
      (= (type arg0) (MapType1Type (type arg1) (type arg2)))
      (= (type (MapType1Store arg0 arg1 arg2)) (MapType1Type (type arg1) (type arg2))))
    :pattern ( (type (MapType1Store arg0 arg1 arg2)) )
))

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U) (arg3 T@U) )
  (= (type (MapType2Store arg0 arg1 arg2 arg3)) (MapType2Type (type arg1) (type arg2) (type arg3))))

(forall ((arg0 T@U) (arg1 T@U) (arg2 T@U) )
  (= (type (MapType2Select arg0 arg1 arg2)) (MapType2TypeInv2 (type arg0))))

(forall ((val T@U) (m T@U) (x0 T@U) (x1 T@U) (y0 T@U) (y1 T@U) )
  (or (= x0 y0)
      (= (MapType2Select (MapType2Store m x0 x1 val) y0 y1) (MapType2Select m y0 y1))))

))

(check-sat)

The inconsistency

I didn't check all the details with the encoding above, here I think the proof of false goes like this:

Suppose we're given variables w, x, y, z of types X, X, Y, Z. That is, we have (using Dafny-like syntax):

type(w) == type(x) == X
type(y) == Y
type(z) == Z

Note that in the SMT Lib input, w, x, y, z all have type T@U and X, Y, Z have type T@T.

Suppose further that we have a variable m (of type T@U). Then, according to the antecedent-less axiom, we have that (here, I'll use Boogie-like notation) m[x := y] has type [X]Y and m[x := z] has type [X]Z.

If we select from an [X]Y map, we get a Y, and if we select from an [X]Z map, we get a Z. So says the axioms (you need both the select...inv... axiom and the other axiom above ...inv....

Thus, we have that type( m[x := y] ) is equal to Y and equal to Z. That is, Y equals Z.

As the final blow, let's now pick Y and Z to be (MapType1Type ...) and (MapType2Type ...), respectively. The Ctor axioms say that these two are different.

The two previous paragraphs give the contradiction.

atomb commented 1 year ago

Thanks for digging into the problem more deeply, @RustanLeino! I have indeed found that MBQI is significant in this problem. For the smallest reproducing example I can find, MBQI needs to be enabled for Z3 to report unsat. The same is true for CVC5. Vampire reports unsat with its default settings, which is less surprising, since it's not depending on triggers.

The Boogie source for my smaller example is in a Boogie issue here.

atomb commented 1 year ago

The missing antecedents to function axioms are fixed in https://github.com/boogie-org/boogie/pull/749. I thought that would fix this issue, since the smaller Boogie example I extracted from this that was giving unsound results is fixed by that PR. However, the example above is still verified by Dafny (when using smt.arith.solver=6).

Just to double-check the result we should expect, I translated the Dafny example above to SMT-Lib by hand (without any of the extraneous stuff Dafny includes), yielding the following.

(declare-fun a () Real)
(declare-fun b () Real)
(declare-fun c () Real)
(declare-fun d () Real)
(assert (<= 0.0 0.0))
(assert (<= b (* 0.0 a)))
(assert (= (* (+ 1.0 (* a (/ 1.0 7.0))) 2.0) d))
(assert (= (+ 1.0 (* (* a d) a)) d))
(assert (= (* c c) d))
(assert (<= 0.0 a))
(assert (<= 0.0 c))
(check-sat)
(get-model)

I get sat back from Z3 (with or without smt.arith.solver=6) for this, but Dafny still manages to verify the Dafny version (only with smt.arith.solver=6).

atomb commented 11 months ago

I increasingly think the remaining unsoundness is a bug in Z3 which has since been fixed. I've taken the SMT-Lib output from this example and begun removing pieces of it such that it still results in unsat from Z3 4.12.1. It seems that the setting of the arithmetic solver isn't actually the key option. With 4.12.1, it needs at least the following:

(set-option :auto_config false)
(set-option :smt.case_split 3)
A (push 1) and (pop) pair around the query

Removing any of those results in Z4 4.12.1 reporting unknown or diverging. With those options still in place, several old versions of Z3 also report unsat, and several report unknown or diverge. Most interestingly, 4.8.5 (the previous default) and 4.12.4 (the most recent) both report unknown.

Looking through the Z3 commit history, there do seem to be real arithmetic fixes that are in any release >= 4.12.2.

At this point I think the right move to fix this issue is to update Dafny to use Z3 4.12.4 by default. It also includes some changes that are likely to improve performance, when enabled.

atomb commented 9 months ago

With the latest Dafny (728433a99a as I write this), this issue no longer occurs. However, there is almost certainly a soundness bug in 4.12.1, as described above, which is no longer present in newer versions. In addition to that, though, there's also an open soundness bug in Z3 that also affects the NRA theory.

My inclination is to leave this issue open until a Z3 release comes out that has resolved that issue and we've adopted that version. My reasoning is that, although the code snippet above doesn't trigger the issue, we know there is one in Z3, and therefore it may be possible to come to an unsound result through some other path.

dafny-lang / dafny