Bitvector performance regression

LeventErkok commented 2 years ago

The benchmark below was solved in about 20 seconds, using z3 compiled on May 29 2021 from GitHub master, producing the correct result:

sat
(
  (define-fun s1 () (_ BitVec 64)
    #x0002040810204081)
  (define-fun s0 () (_ BitVec 64)
    #x8080808080808080)
  (define-fun s4 () (_ BitVec 1)
    #b0)
)

Unfortunately, running it with z3 compiled on Oct 25 2021 does not terminate in 10 minutes. Looks like a big performance hit for this particular benchmark.

(set-option :produce-models true)
(set-logic BV)
(define-fun s4 () (_ BitVec 1) #b0)
(declare-fun s0 () (_ BitVec 64))
(declare-fun s1 () (_ BitVec 64))
(assert (forall ((s2 (_ BitVec 64)))
            (let ((s3 ((_ extract 7 7) s2)))
            (let ((s5 (distinct s3 s4)))
            (let ((s6 (bvand s0 s2)))
            (let ((s7 (bvmul s1 s6)))
            (let ((s8 ((_ extract 56 56) s7)))
            (let ((s9 (distinct s4 s8)))
            (let ((s10 (= s5 s9)))
            (let ((s11 ((_ extract 15 15) s2)))
            (let ((s12 (distinct s4 s11)))
            (let ((s13 ((_ extract 57 57) s7)))
            (let ((s14 (distinct s4 s13)))
            (let ((s15 (= s12 s14)))
            (let ((s16 ((_ extract 23 23) s2)))
            (let ((s17 (distinct s4 s16)))
            (let ((s18 ((_ extract 58 58) s7)))
            (let ((s19 (distinct s4 s18)))
            (let ((s20 (= s17 s19)))
            (let ((s21 ((_ extract 31 31) s2)))
            (let ((s22 (distinct s4 s21)))
            (let ((s23 ((_ extract 59 59) s7)))
            (let ((s24 (distinct s4 s23)))
            (let ((s25 (= s22 s24)))
            (let ((s26 ((_ extract 39 39) s2)))
            (let ((s27 (distinct s4 s26)))
            (let ((s28 ((_ extract 60 60) s7)))
            (let ((s29 (distinct s4 s28)))
            (let ((s30 (= s27 s29)))
            (let ((s31 ((_ extract 47 47) s2)))
            (let ((s32 (distinct s4 s31)))
            (let ((s33 ((_ extract 61 61) s7)))
            (let ((s34 (distinct s4 s33)))
            (let ((s35 (= s32 s34)))
            (let ((s36 ((_ extract 55 55) s2)))
            (let ((s37 (distinct s4 s36)))
            (let ((s38 ((_ extract 62 62) s7)))
            (let ((s39 (distinct s4 s38)))
            (let ((s40 (= s37 s39)))
            (let ((s41 ((_ extract 63 63) s2)))
            (let ((s42 (distinct s4 s41)))
            (let ((s43 ((_ extract 63 63) s7)))
            (let ((s44 (distinct s4 s43)))
            (let ((s45 (= s42 s44)))
            (let ((s46 (and s40 s45)))
            (let ((s47 (and s35 s46)))
            (let ((s48 (and s30 s47)))
            (let ((s49 (and s25 s48)))
            (let ((s50 (and s20 s49)))
            (let ((s51 (and s15 s50)))
            (let ((s52 (and s10 s51)))
            s52)))))))))))))))))))))))))))))))))))))))))))))))))))
(check-sat)
(get-model)

NikolajBjorner commented 2 years ago

Note: at this point tactic.default_tactic=sat sat.euf=true smt.ematching=false works best.

LeventErkok commented 2 years ago

I can confirm that these settings make it go really fast; in fact, significantly faster than the May 29 2021 version of z3.

Feel free to close the ticket as I can use this workaround for this particular case. Thanks.

NikolajBjorner commented 2 years ago

I don't have setup with me for bisecting where the regression entered. It could be a preprocessing issue, which degrades solvability. In particular, MBQI is required to solve this problem and it is extremely sensitive to syntax for the bit-vector case.

LeventErkok commented 1 year ago

@NikolajBjorner

Unfortunately the tactic you suggested:

 tactic.default_tactic=sat sat.euf=true smt.ematching=false

no longer works on this example, nor a direct call to z3 without any options can handle it. Looks like other performance regressions crept in. FYI.

NikolajBjorner commented 1 year ago

It appears related to a rewrite for s6 * (s0 & s2). Removing that newer rewrite, setting ematching=false, and using the default solver (don't set the other options) it finds a solution within a second with only a few instantiations. Why one method finds the instances and the other doesn't remains to be investigated. It might be sensitive to the values that the bit-vector solver finds for s0 s6 when peforming MBQI.

LeventErkok commented 1 year ago

Yes! I can confirm a new-build with the suggested setting works fine on this example.

NikolajBjorner commented 1 year ago

I currently don't think this example is suitable for regression testing: it is really brittle what works and what doesn't work. I used the example to test the new core. It definitely helped finding bugs, but also the new core didn't work out of the box (quickly, at least). With some tweaks I get it to work (after disabling elimination of unconstrained variables as a temporary code change, and) by setting sat.phase=always_false. The legacy SMT solver uses a default phase of literals to false, which is optimal for this example. The difference is stark depending on the phase default: it either immediately finds a solution or cycles through a lot of instantiations. There are reasons why the SMT core uses default phase = false, which is related to minimizing congruences, etc. So, it could be good takeaway from this example to use this heuristic.

The example appears to be a synthesis benchmark. You are synthesizing values for s0, s1 that satisfy some global bit-vector property. The strategy that works best is by minimizing s0, s1, and then checking whether the current minimal values for these bit-vectors work for the predicate. Z3 has no guidance along this direction built-in and I am not sure how such guidance would be learned automatically. Maybe something to inspire phase selection heuristics for quantified formulas, but for a regression test I suspect this will toggle between immediate and looong based on butterflies.

NikolajBjorner commented 1 year ago

for now, reopen this bug for the next few regressions as they will likely point to also real bugs as was the case this time.

NikolajBjorner commented 1 year ago

Note

(set-option :sat.smt true)
(set-option :smt.ematching false)
(set-option :sat.phase always_false)
(set-option :smt.elim_unconstrained false)
(push)
(define-fun s4 () (_ BitVec 1) #b0)
(declare-fun s0 () (_ BitVec 64))
(declare-fun s1 () (_ BitVec 64))
(assert (forall ((s2 (_ BitVec 64)))
            (let ((s3 ((_ extract 7 7) s2)))
            (let ((s5 (distinct s3 s4)))

goes through. disable phase or elim-unconstrained and it goes for a hike.

LeventErkok commented 1 year ago

Thanks Nikolaj..

For reference, this is indeed a synthesis problem. The idea is to take a 64-bit vector in the following shape:

1.......2.......3.......4.......5.......6.......7.......8.......

and turn it into another 64-bit value, that looks like this:

12345678........................................................

i.e., shift the numbered positions to the top part. (The dotted positions are ignored, and can result in anything.). The problem is to do this with one mask and one multiplication; s0 is the mask and s1 is the multiplier.

At first look, this looks like a frivilous thing to do, but it has real applications in game playing (chess engines use this to shuffle positions quickly apparently), and fast implementations of scatter-gather operations in HPC, where you have a mask and you want to collect certain bits in the highest byte, so it's a mask transformer.

NikolajBjorner commented 1 year ago

A good indicator for why you want to search for smallest bit-weights among s0, s1 is because they appear in positions where they contribute with scrambling the output of s1*(s2 & s0). If you have a family of such problems finding the preferred polarity of the bits of the inputs first and then rolling a home-made MBQI outside of z3 that uses local optimization of max-sat should be superior.

NikolajBjorner commented 1 year ago

I used this as a sample for digging into details: https://z3prover.github.io/slides/z3internals.pptx and https://microsoft.github.io/z3guide/programming/Example%20Programs/MBQI

Z3Prover / z3

Bitvector performance regression #5660