Z3Prover / z3

The Z3 Theorem Prover
Other
10.38k stars 1.48k forks source link

spacer and pdr report incomplete/spurious models on HORN problems #1241

Open AdrienChampion opened 7 years ago

AdrienChampion commented 7 years ago

Running spacer or pdr on the SMT file below, I get the following output

sat
(model
  (define-fun |g$unknown:5| ((x!0 Int) (x!1 Int)) Bool
    true)
  (define-fun |f$unknown:3| ((x!0 Int) (x!1 Int) (x!2 Int)) Bool
    true)
)

Four predicates are missing though. I assume this means the predicates do not impact the satisfiability, like missing variables in a LIA model for instance, since that's what duality seems to do:

sat
(model
  (define-fun @Fail!1 () Bool
    false)
  (define-fun |f$unknown:1| ((x!0 Int)) Bool
    false)
  (define-fun |f$unknown:2| ((x!0 Int) (x!1 Int)) Bool
    false)
  (define-fun |g$unknown:4| ((x!0 Int)) Bool
    false)
)

Duality does not define |g$unknown:5| and |f$unknown:3|, which is actually ok.

But the model reported by spacer and pdr does not work.

I'm using the latest z3, commit 059bad909ad4b5f0b70378970475149822cccaa6.

SMT file

(set-logic HORN)

(declare-fun |g$unknown:5|
  ( Int Int ) Bool
)

(declare-fun |f$unknown:3|
  ( Int Int Int ) Bool
)

(declare-fun |f$unknown:2|
  ( Int Int ) Bool
)

(declare-fun |f$unknown:1|
  ( Int ) Bool
)

(declare-fun |g$unknown:4|
  ( Int ) Bool
)

(assert
  (forall ( (|$alpha-3:x| Int) )
    (=>
      ( and (|g$unknown:4| |$alpha-3:x|) )
      (|f$unknown:1| |$alpha-3:x|)
    )
  )
)
(assert
  (forall ( (|$alpha-3:x| Int) )
    (=>
      ( and (|g$unknown:4| |$alpha-3:x|) )
      (|f$unknown:2| |$alpha-3:x| |$alpha-3:x|)
    )
  )
)
(assert
  (forall ( (|$V-reftype:13| Int) (|$alpha-1:x| Int) (|$alpha-2:y| Int) (|$knormal:4| Bool) (|$knormal:1| Bool) (|$knormal:2| Bool) (|$knormal:3| Bool) )
    (=>
      ( and (= |$knormal:4| (not |$knormal:3|)) (= |$knormal:3| (and |$knormal:1| |$knormal:2|)) (= |$knormal:2| (<= |$alpha-2:y| 0)) (= |$knormal:1| (> |$alpha-1:x| 0)) (= |$V-reftype:13| 1) |$knormal:4| (|f$unknown:2| |$alpha-2:y| |$alpha-1:x|) (|f$unknown:1| |$alpha-1:x|) )
      (|f$unknown:3| |$V-reftype:13| |$alpha-2:y| |$alpha-1:x|)
    )
  )
)
(assert
  (not (exists ( (|$knormal:3| Bool) (|$knormal:2| Bool) (|$knormal:1| Bool) (|$alpha-1:x| Int) (|$alpha-2:y| Int) (|$knormal:4| Bool) )
    ( and (= |$knormal:4| (not |$knormal:3|)) (= |$knormal:3| (and |$knormal:1| |$knormal:2|)) (= |$knormal:2| (<= |$alpha-2:y| 0)) (= |$knormal:1| (> |$alpha-1:x| 0)) (not |$knormal:4|) (|f$unknown:2| |$alpha-2:y| |$alpha-1:x|) (|f$unknown:1| |$alpha-1:x|) )
    )
  )
)
(assert
  (forall ( (|$alpha-3:x| Int) (|$knormal:7| Int) (|$V-reftype:14| Int) )
    (=>
      ( and (= |$V-reftype:14| |$knormal:7|) (|g$unknown:4| |$alpha-3:x|) (|f$unknown:3| |$knormal:7| |$alpha-3:x| |$alpha-3:x|) )
      (|g$unknown:5| |$V-reftype:14| |$alpha-3:x|)
    )
  )
)
(check-sat)

(get-model)
agurfinkel commented 7 years ago

The instance is discharged by the pre-processor before getting to the engine. Would it be possible for you to minimize the example? In particular, g$unknown:5 and f$unknown:3 are unused. They are set to true by pre-processor when it removes the rules for them. The rest is discharged by pre-processor.

Model construction is a little weak when pre-processing is enabled. Missing definitions might simply mean that the model for them was not computed. It does not mean that they were not important for satisfiability. On the other hand, a predicate set to true means it was not important.

AdrienChampion commented 7 years ago

The smallest example I managed to get is

(set-logic HORN)

(declare-fun should_be_false ( Int ) Bool)

(assert
  (forall ( (v_1 Int) ) (=> (should_be_false v_1) false ))
)
(assert
  (not (exists ( (v_1 Int) ) (should_be_false v_1)))
)

(check-sat)
(get-model)

Removing either of the asserts causes spacer and pdr to return unknown. Well, I guess it is actually the pre-processor that returns unknown.

Duality works fine on the example above, but its model contains an extra predicate @Fail!1. In my case it is not really a problem, but it sounds like an internal predicate that should not be reported. Should I create a separate issue for this?

On the other hand, a predicate set to true means it was not important.

I find this convention a bit surprising, as a user how do I know whether a predicate defined to true in the model needs to be true or doesn't matter? Wouldn't the convention followed by duality and other theories (at least the arithmetic ones) of not reporting irrelevant elements of the model be more consistent and less ambiguous?

agurfinkel commented 7 years ago

I find this convention a bit surprising, as a user how do I know whether a predicate defined to true in the model needs to be true or doesn't matter?

Sorry, I was wrong. The predicates that are assigned values are important. Predicates that are missing are often a bug in model conversion of the pre-processor. For me, these bugs are low priority since they mostly appear on trivial instances only. However, I can be convinced to look into this closer if this is causing a serious problem for your application 😄

As for predicates set to true. Sometimes, it means that there are no constraints on the predicate and the model can be strengthened. However, this depends on the constraints and cannot be inferred solely from the model.

AdrienChampion commented 7 years ago

For me, these bugs are low priority since they mostly appear on trivial instances only.

I understand, but among our benchmarks, this one in particular is pretty difficult to solve for all the tools we have tried but z3 (pdr/spacer). We wanted to try to understand why by looking at the model but, according to my experiments, it is incorrect.

The bug also seems to be triggered by about 90 of the sv-comp LIA benchmarks, some are small but some are rather big and z3's (incorrect) model for them are non-trivial. This is the case for append_ret_unsafe.c.smt2 for instance.

In any case, is there any options we can use that would force z3 to bypass the problem so that we get the actual models?

NikolajBjorner commented 7 years ago

Sorry, this is taking me some time to get to. Bottom-up dataflow analysis deems that some of the predicates are unreachable and can be set to false (e.g, only occurs in negative polarity or inductively rely on predicates that rely in negative polarity). The transformation omitted recording these predicates and this should now be taken care of.

AdrienChampion commented 7 years ago

The problem seems to be fixed, thanks!

There seems to be another problem though, I get segfaults with Spacer for the following scripts:

I also get segfaults with PDR on a lot of my benchmarks, for instance on this one.

It's probably a different problem, do you want me to open a new issue?

AdrienChampion commented 7 years ago

When I first noticed this problem I did a quick dichotomy on the commits and it seems that the problem started appearing after 40dfdb6606045240f519a42042bd29d6143fa6bd. Is it possible that 5b6472f022594dc64336da33d013d9b43e76ab0a is responsible for these segfaults?

NikolajBjorner commented 7 years ago

The commits seem unrelated. The more likely reason is a merge with @nunoplopes fixes to the vector class. Seems that passing in a constructed value into a resize is not liked. The other crashes are due to an independent recent change in how I track proof objects. Fix on its way.

nunoplopes commented 7 years ago

@NikolajBjorner I can track down the resize problem. What's the test case that exhibits the bug there?

NikolajBjorner commented 7 years ago

Nuno, I fixed the place that crashed. Adr ien's example was https://github.com/hopv/benchmarks/blob/master/clauses/lia/mochi/mc91.smt2

NikolajBjorner commented 7 years ago

Now also the Spacer related segfaults (I tried three from the list and they seemed about the same issue) have been dealt with.

NikolajBjorner commented 7 years ago

can we close this?

AdrienChampion commented 7 years ago

It seems there's still a problem. On this benchmark for instance, pdr and spacer fail to yield a definition for |f$unknown:1|. It should be true as a trivial consequence of the first clause, so I assume it's a preprocessing problem again.

NikolajBjorner commented 7 years ago

Yes, also preprocessing, but a different module.

AdrienChampion commented 7 years ago

I tried the latest z3 and I only see one problem left, on this benchmark. It seems the predicate inferred by pdr and spacer for |fail$unknown:17| (below) does not work, as clause 46

(assert
  (not (exists ( (|$alpha-2:$$tmp::2| Int) )
    ( and (|fail$unknown:17| |$alpha-2:$$tmp::2|) )
    )
  )
)

is falsifiable.

  (define-fun |fail$unknown:17| ((x!0 Int)) Bool
    (exists ((x!1 Int) (x!2 Int) (x!3 Int) (x!4 Int) (x!5 Int))
      (let ((a!1 (exists ((x!6 Int) (x!7 Int) (x!8 Int) (x!9 Int))
                   (let ((a!1 (= (not (= 0 x!9))
                                 (and (not (= 0 x!7)) (not (= 0 x!8))))))
                     (and (= (not (= 0 x!8)) (>= x!6 0))
                          (= (not (= 0 x!7)) (>= x!3 0))
                          (not (= 0 x!9))
                          a!1
                          (= x!2 0)
                          (= x!1 0))))))
        (and a!1
             (not (= 0 x!1))
             (= (= 0 x!4) (<= x!2 x!3))
             (or (= 0 x!4) (= 0 x!5))
             (= (not (= 0 x!5)) (>= x!3 0))
             (= x!0 1)))))
NikolajBjorner commented 7 years ago

The predicate is equivalent to false (it says x0 = 0 and x0 = 1), but by the time Z3 replaces this predicate it hasn't figured this out.

The correct link seems to be: https://raw.githubusercontent.com/hopv/benchmarks/master/clauses/lia/termination/x_plus_2_pow_n03.smt2

AdrienChampion commented 7 years ago

I don't think it's equivalent to false actually, x!0 appears only once and as far as I can tell it does not respect the clause I mentioned: https://rise4fun.com/Z3/YZ9T.

(Sorry about the link by the way, I meant this one as you pointed out.)

NikolajBjorner commented 7 years ago

x!1 occurs both = 0 and != 0 in the same conjunction, so the formula should be false. You are demonstrating an independent bug in qe_lite that has gone unnoticed. A fix is checked in. Thanks.

NikolajBjorner commented 6 years ago

with the latest updates can we close this?

AdrienChampion commented 6 years ago

I'm not sure, I just ran 56cc0a9018424409adb55b8aba75f92f2ac66d2e and the models seem wrong on most of my benchmark. On this one for instance:

> z3 fixedpoint.engine=spacer clauses/lia/fpice/inductive4.smt2
BRUNCH_STAT max_query_lvl 2
BRUNCH_STAT num_queries 3
BRUNCH_STAT num_reach_queries 0
BRUNCH_STAT num_reach_reuse 0
BRUNCH_STAT inductive_lvl 1
BRUNCH_STAT max_depth 2
BRUNCH_STAT cex_depth 0
sat
(model
  (define-fun |f$unknown:4| ((x!0 Int) (x!1 Int)) Bool
    (not (<= x!0 (- 4))))
  (define-fun |f$unknown:2| ((x!0 Int) (x!1 Int)) Bool
    false)
  (define-fun |incr$unknown:6| ((x!0 Int) (x!1 Int)) Bool
    false)
)

which according to my checks does not respect clauses 8 and 17.

In case it helps, here is a model

(model
  (define-fun |f$unknown:4|
    ( (v_0 Int) (v_1 Int) ) Bool
    (not (< (- v_0 1) (- 3)))
  )
  (define-fun |incr$unknown:6|
    ( (v_0 Int) (v_1 Int) ) Bool
    (= v_0 (+ v_1 1))
  )
  (define-fun |f$unknown:2|
    ( (v_0 Int) (v_1 Int) ) Bool
    (or (|f$unknown:4| v_0 v_1) (not (<= v_1 1)) (not (not (< v_1 (- 3)))))
  )
)
AdrienChampion commented 6 years ago

I tested the most recent version of z3 on our benchmarks and the ones from sv-comp mentioned above and, as far as I can tell, all models are now correct.

Thanks for the fix!

AdrienChampion commented 6 years ago

Actually, my validation workflow had a problem. The number of spurious models has decreased significantly, but there's still some left. On our benchmarks, I found 4 incorrect models:

I don't have smaller examples to provide yet, I'll try to find some and update this issue. What I did notice is that on all these systems, one of the predicates is not defined.

I also get spurious models on some of the LIA sv-comp benchmarks. The following in particular are a bit worrying, since they're benchmarks labelled "unsafe", which means that they should be unsat.