dafny-lang / dafny

Dafny is a verification-aware programming language
https://dafny.org
Other
2.87k stars 255 forks source link

Guidance on z3 versions? #5470

Open hmijail opened 3 months ago

hmijail commented 3 months ago

What change in documentation do you suggest?

Would be nice to have some concrete guidance of what version of z3 to use. The only thing I could find was the Dafny 4.0 release blog post, which says:

In fact, Dafny is now fully compatible with all Z3 versions from 4.8.5 through 4.12.1 (and possibly other versions).

4.12.1 was the latest when that post was published, and indeed current latest z3 (4.13) seems to work OK. So:

atomb commented 3 months ago

What change in documentation do you suggest?

Would be nice to have some concrete guidance of what version of z3 to use. The only thing I could find was the Dafny 4.0 release blog post, which says:

In fact, Dafny is now fully compatible with all Z3 versions from 4.8.5 through 4.12.1 (and possibly other versions).

4.12.1 was the latest when that post was published, and indeed current latest z3 (4.13) seems to work OK. So:

* Does this mean that the Dafny team recommends 4.12.1 until new notice, or simply that newer z3 versions aren't actively checked?

Newer versions aren't actively checked, so to be conservative we recommend 4.12.1. However, if you're okay with potentially needing to change a few things to get your program to verify, there's generally no harm in using a newer version. In our experience so far, the main thing likely to change is that especially brittle proofs may need changes.

* I'd hope that using newer z3 versions should bring in bug fixes and optimizations. Is there any disadvantage? Or more generally: is there any official measure to decide whether a version is better than another, or is it up to whatever results we get?

When evaluating an upgrade, we run it on a large collection of code and check how the resource use changes, how the variation in resource use changes when using different random seeds, and how many proofs start to fail. Ultimately, I've yet to see a single change in Dafny or Z3 that provides universally better results. Usually some things get faster or more likely to succeed while other things get slower or more likely to fail. There is often a general trend, though, and I've started to evaluate Z3 4.12.6 (which was the latest at the time). It generally seems like an improvement, though I've also certainly seen verification that performs less well with it.

* If latest z3 fails, is it worth it reporting that?

Yes, I think it is. The more information we have, the better. We may not wind up considering it a bug, per se, because some degree of failure is inevitable. It may be something that can be fixed, however, and it may help inform when and how Dafny starts to ship with a new version.