Z3Prover / z3

The Z3 Theorem Prover

resource limit in addition to time limit #56

Closed: kanigsson closed this issue 9 years ago

kanigsson commented 9 years ago

Hello,

This is a request for a feature that would be very important for us to use Z3 with the SPARK tools (spark-2014.org).

The idea is a command-line option that allows specifying a "resource limit" instead of (or in addition to) the currently available memory and time limits. The "resource" is just an internal counter that Z3 increments at specific deterministic points, such as finding a conflict or deciding on a literal. When the counter hits the specified limit, Z3 stops, just as it does when it hits the time or memory limit.
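
A minimal sketch of the idea (invented names and C++, not actual Z3 code): the solver bumps a counter at deterministic events and aborts once a user-supplied limit is exceeded.

    // Illustrative sketch only: a deterministic resource counter.
    #include <cstdint>
    #include <stdexcept>

    class resource_counter {
        uint64_t m_count = 0;
        uint64_t m_limit;                 // 0 means "no limit"
    public:
        explicit resource_counter(uint64_t limit) : m_limit(limit) {}

        // Called at deterministic points, e.g. on every conflict,
        // decision, or simplification step.
        void charge(uint64_t amount = 1) {
            m_count += amount;
            if (m_limit != 0 && m_count > m_limit)
                throw std::runtime_error("resource limit exceeded");
        }

        uint64_t used() const { return m_count; }
    };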

Such a limit makes the behavior of a prover reproducible even under heavy load or across machines, sometimes even across platforms. This is essential for our nightly testing, and for users of our tools, for their nightly testing.

The two other SMT solvers we currently use, Alt-Ergo and CVC4, both have such an option: -steps-bound for Alt-Ergo and --rlimit for CVC4. We have worked with both development teams to improve these options - mostly by finding time-intensive modules where the internal counter was not updated, which effectively allowed the prover to exceed the limit.

Or maybe there already is a way in Z3 to achieve what we want?

Thanks for considering this request.

Johannes

wintersteiger commented 9 years ago

There are various max_* options in Z3, but I don't think any of them would be a solution to your problem. Could you elaborate on why timeouts are not good enough for you?

As long as the default Z3 tactics are used, there will always be an element of non-determinism, because some of them put timers on pre-processors: depending on how far a pre-processor gets before its timer fires, Z3 can end up with a completely different simplification of the problem, so any counter on conflicts or similar events would still be non-deterministic (see, e.g., all the try_for's in the default QF_LIA tactic: https://github.com/Z3Prover/z3/blob/master/src/tactic/smtlogics/qflia_tactic.cpp). In some cases the SMT kernel is not used at all, because some other part of the tactical framework solves the problem without ever counting any conflicts, while still requiring non-deterministic, long runtime. I don't think there is a simple solution to this problem that's better than using time/memory limits.
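
To sketch how such a timer enters a strategy (written against the tactic combinators of Z3's public z3++.h API; the specific tactic names and the 100 ms budget are only illustrative): whether the time-boxed first attempt finishes within its budget depends on machine speed and load, so the path taken - and with it any conflict or decision counts - can differ from run to run.

    #include <z3++.h>

    // Sketch: an or-else strategy in the style of Z3's default tactics.
    // try_for fails if its sub-tactic exceeds the time budget, in which
    // case the fallback branch runs instead.
    z3::check_result solve(z3::context & c, z3::expr const & formula) {
        z3::tactic quick    = z3::try_for(z3::tactic(c, "smt"), 100);
        z3::tactic fallback = z3::tactic(c, "simplify") & z3::tactic(c, "smt");
        z3::solver s = (quick | fallback).mk_solver();
        s.add(formula);
        return s.check();
    }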

kanigsson commented 9 years ago

Thanks for your answer, and sorry for the late reply.

Let me first try to motivate why timelimit/memlimit is not good enough for us.

We have a testsuite with over 10000 VCs, and we run that testsuite on 4 different platforms (Windows, Linux, Darwin; 32 and 64 bits). We have an oracle that says which VCs should be proved and which not. Any fluctuation in the results requires quite costly manual inspection ...

Sometimes we change our build/test machine for something more powerful, and of course the developers run parts of the testsuite on their own (less powerful) machines. Sometimes the test machine is under load because of some other job. So we need reproducible results from all involved tools despite these quite diverse environments, and timeouts are not a realistic way to achieve this.

If a steps limit is used, then yes, an internal timeout like the one you describe is a no-no. But this seems easily fixable: you can just replace the timeout with (again) a counter limit - for example, the pre-processor is allowed to do n simplification steps, as sketched below.
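
A rough sketch of the difference, with a hypothetical simplify_one_step helper (the names are invented for illustration):

    #include <chrono>
    #include <cstdint>

    // Hypothetical helper: performs one simplification step and returns
    // false once a fixpoint is reached.
    bool simplify_one_step();

    // Time-bounded: how many steps fit in the budget depends on machine
    // speed and load, so the resulting formula can differ between runs.
    void simplify_with_timeout(std::chrono::milliseconds budget) {
        auto deadline = std::chrono::steady_clock::now() + budget;
        while (std::chrono::steady_clock::now() < deadline && simplify_one_step())
            ;
    }

    // Step-bounded: performs at most max_steps steps, so it stops at the
    // same point on every machine and under any load.
    void simplify_with_step_limit(uint64_t max_steps) {
        for (uint64_t i = 0; i < max_steps && simplify_one_step(); ++i)
            ;
    }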

Yes, when a steps limit is used, all parts of the solver that consume non-negligible time need to contribute to increasing the internal counter. This includes the pre-processor you mentioned and the other parts of the tactical framework.

The Alt-Ergo and CVC4 tools both had such an option, but initially these options didn't really work, precisely because some parts were not contributing to the counter. Finding all those parts was exactly the work we went through with the Alt-Ergo and CVC4 teams. It works well now.

To get determinism, some other precautions must be taken. For example, Z3 allows setting various random seeds; these would also need to be fixed to obtain reproducibility.
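
For instance, the seeds can be pinned programmatically through the C API (a sketch; the parameter names smt.random_seed and sat.random_seed are an assumption about the current parameter set and may differ between Z3 versions):

    #include <z3.h>

    // Pin Z3's random seeds so repeated runs explore the search space
    // in the same order. Parameter names are assumed, see above.
    void pin_random_seeds() {
        Z3_global_param_set("smt.random_seed", "0");
        Z3_global_param_set("sat.random_seed", "0");
    }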

Finally, people sometimes expect steps to correspond linearly to time (e.g., 1000 steps correspond to 1 second or so), but this is probably impossible to achieve and not something we need. All we need is an internal counter that proceeds in a deterministic manner (unlike a timer), with no part of the solver allowed to hog significant CPU time without the counter being increased.

Hope this explains why we need this and how it could be achieved ...

wintersteiger commented 9 years ago

Thanks for the detailed explanation. The random seed in Z3 is 0 by default, so that part shouldn't be an issue. Introducing such counters into all relevant parts of Z3 is a non-trivial task, as it would touch pretty much every file in the codebase. And it still wouldn't get you there because of other issues; e.g., some hash functions depend on the ordering of objects in memory, which can be different in every run. I also still fail to understand the reason for wanting full determinism: if your VCs didn't change, you can just keep yesterday's result; Z3 won't change its mind about the result (even if it does change its mind about how to get there). If there is a new or modified VC, then we don't know the resource limit and will have to guess one. If that limit is guessed by the engineer, they might just as well write "UNSAT" into a timestamped file at the same time.
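
A generic sketch of that hashing issue (illustrative C++, not Z3's actual code): hashing an object by its address ties any hash-table ordering to where the allocator happened to place the object, which can change from run to run, whereas hashing by a creation-order ID is stable.

    #include <cstddef>
    #include <cstdint>

    struct expr {
        uint32_t id;   // assigned in creation order, stable across runs
    };

    // Non-deterministic across runs: the value depends on the object's
    // allocation address.
    size_t hash_by_address(const expr * e) {
        return reinterpret_cast<size_t>(e) >> 4;
    }

    // Deterministic across runs: depends only on the stable ID.
    size_t hash_by_id(const expr * e) {
        return static_cast<size_t>(e->id) * 0x9e3779b9u;
    }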

kanigsson commented 9 years ago

Hello,

Yes, for the hash functions that depend on memory location, I don't see a solution. It seems that CVC4 and Alt-Ergo don't do such things.

Concerning the "caching" of VCs you describe: we have such a mechanism, but it can't always be used in nightly testing. There are two different use cases: us as a tool vendor, and the customer as a user of the tool.

As a tool vendor, we want to make sure that the tool works well without any kind of cache, and we also want to catch potential problems, e.g., in the SMT solvers. So not running the SMT solvers in the nightly run (because the cache is used instead) would partly defeat the purpose of nightly testing.

Most users of SPARK are in a context where some kind of certification is required. A customer explained this problem to us; unfortunately, I have forgotten half of the explanation. What I recall is that a VC cache is problematic because it is both an input and an output of the verification process, which is something the certification standards don't like. So they definitely can't use the cache for the "final", validating run of the tool, and they want to be sure that they can do this run at any time; so in the end they will not use the cache in nightly testing, only as a convenience in day-to-day development.

One last thing: the main purpose of steps is not to reproduce proofs reliably, but to reproduce any tool output reliably, whether that is UNSAT or UNKNOWN or something else. When I run "cvc4 --rlimit=1000 some_vc", I always get the same result, whether CVC4 manages to prove the VC or not.

I'm not sure I have convinced you of the usefulness of the steps limit, but I can assure you that if Z3 has such an option, we will use it :-)

yannickmoy commented 9 years ago

To complete Johannes's explanation: determinism is critical in our nightly regression testing, where we maintain a single oracle (which VCs are proved and which are not) across all platforms (a different set of oracles per platform would be too costly to maintain), and it is critical in the certification done by our customers (where a certification authority should be able to examine, and sometimes reproduce, the verification activities). To the point that neither we nor our customers undergoing certification can use a non-deterministic prover (that does not prevent using heuristics and non-determinism internally, but the result should be deterministic).

wintersteiger commented 9 years ago

Thanks again for the detailed explanation! Your point about certification authorities is very convincing; they definitely won't like anything non-deterministic. We're thinking about potential approaches to this problem, but so far we haven't found anything that addresses all of these issues. To get a better idea of what would need to be done, can you tell me what kinds of theories you use, and whether a deterministic tactic that is different from the default tactics (e.g., slower) would be an acceptable solution for you?

yannickmoy commented 9 years ago

Currently, we're using the theories of non-linear arithmetic over integers and reals and the theory of bitvectors; everything else is expressed through axiomatization with uninterpreted functions. In the future, we'll also use the theory of arrays and IEEE 754 floating-point. We may want to use the theory of records at some point.

A slower deterministic tactic would not be a problem for us. It would take a looong time for something the prover knows how to handle to become slower than manual inspection or manual proof.

NikolajBjorner commented 9 years ago

I added a new resource limit to the unstable branch. You can call it as follows:

    z3 10u05.04.smt2 memory_max_alloc_count=10000
    (error "line 43 column 10: number of configured allocations exceeded")

Z3 prints the allocation count if you ask it to return statistics, e.g.,

    z3 10u05.04.smt2 -st
    sat
    (:cnf-encoding-aux-vars 103
     :eliminated-vars       3
     :max-memory            2.22
     :memory                1.18
     :num-allocs            9487736.00
     :time                  0.71
     :total-time            0.71)

kanigsson commented 9 years ago

Thanks a lot! I've run this on our testsuite and overall it seems to work well.

I have a few issues to report, but for that I need to attach relatively big VC files. Can you advise how I should do this, since one cannot attach files to GitHub issues? One issue is related to the new option you added, but the other issues may be unrelated (mostly crashes).

NikolajBjorner commented 9 years ago

You can upload files using "gist". There is a menu option/link at the top of the github.com pages for creating ad-hoc repositories that can be used for single files.

kanigsson commented 9 years ago

Thanks! Actually, I didn't find any major issues with the option you implemented; this one is a minor one.

Please have a look at this file: slow.smt2.

Using the unstable branch, compiled on my linux x86_64, when I do:

    z3 memory_max_alloc_count=8727000 slow.smt2

it takes about 8 seconds on my machine to exceed the limit. But when I increase the limit slightly:

    z3 memory_max_alloc_count=8728000 slow.smt2

it seems to loop; I stopped z3 after several minutes. This probably means that some internal code does not contribute to the resource counter.

kanigsson commented 9 years ago

Here is another issue, probably related to the resource limit option, because it doesn't appear without it. On this file crash.smt2, if you run z3 with a high enough limit:

    z3 memory_max_alloc_count=8728000 crash.smt2

it crashes with a segmentation fault. This is again on x86_64 Linux. It doesn't crash with a lower limit (instead one gets the resource limit error), nor without the resource limit option.

Let me know if you need more information or if you want me to create a new issue.

NikolajBjorner commented 9 years ago

The bugs exposed by slow.smt2 and crash.smt2 are now fixed in unstable.