Initial support for a 'Boolector-native' timeout function via 'term' call-backs

aytey commented 5 years ago

This is an attempt to support a 'Boolector-native' timeout, utilising Boolector's 'term' call-backs for the individual solvers, without the user needing to write their own terminate functions.

Please note that this is a PR into the smtcomp19 branch.

How to use it (where <VALUE> is your timeout in millis):

pass -to <VALUE> to the main Boolector executable
add (set-option :timeout <VALUE>) to your .smt2 file
call btor.Set_opt(boolector.BTOR_OPT_TIMEOUT, <VALUE>) via the Python API

Some points to discuss:

[x] This only works for Lingeling and CaDiCaL, but will "silently do nothing" for other SAT solvers (e.g., MiniSAT, CMS or PicoSAT). What do we want to here? Error out the user?
[x] Currently, the timeout "takes precedence" over the term function (i.e., if you call Set_term after setting a timeout then timeout is still going to the one that gets checked). See: btorsat.c. Is this appropriate? If not, what's our preference? This is the most important point to resolve in my opinion.
[x] CMS also supports a millisecond-based timeout approach via its API (so not a call-back), I think it would be straightforward to call CMS's API with BTOR_OPT_TIMEOUT before invoking the SAT solver. See: set_max_time. However, this might not be in a "released" version of CMS yet (while the API existed before CMS 5.8.6, it didn't actually work and was fixed in May). Do we also want to support this? If so, do we want to change setup-cms.sh to use master rather than a release tar.gz on GitHub.com?
[x] This code currently uses clock() on all platforms (which I think is fine). However, there are other parts of Boolector that use getrusage (on platforms that support it), which might be more accurate. See: btorutil.c. Do we want to choose between clock and getrusage depending on the CMake checks?
[x] This change was made by clang-format-diff.py and not manually by me. I can manually undo this if we care?
[ ] Do we want tests for this? Do we want API usage examples? Do we want documentation?

Signed-off-by: Andrew V. Jones andrew.jones@vector.com

aytey commented 5 years ago

I've seen that this branch is failing on Travis. I checked it locally with the smtcomp19 branch and this also fails in the same place:

Running read4_special test: /home/avj/boolector/boolector_smtcomp19/test/testspecial.c:61: run_test: Assertion `boolector_sat (g_btor) == expected' failed.

Does ./bin/test work for you (locally) on the smtcomp19 branch? Given there's not a PR for that branch, I cannot see what Travis should be doing to compare.

mpreiner commented 5 years ago

About the smtcomp19 branch: It is currently broken due to CaDiCal's recent API change. I already fixed that on master (5230a629fcc906bfb31084457764b8cbd8533360). I'm currently working on cleaning up the smtcomp19 branch to merge it back to master. In the meantime it's not recommended to use smtcomp19 with the newest CaDiCaL version. I'll have a proper look at this PR asap. Thanks for the work!

mpreiner commented 5 years ago

Here is what I think:

1) I think we should only warn the user that the timeout will be ignored/inaccurate since the chosen SAT solver does not support terminate callbacks. You could do this with the BTOR_WARN macro.

2) You mean that as long as the timeout option is set, Boolector will always use the timeout terminate handling instead of the custom terminate handler? Here are two solutions I can think of right now:

Warn the user that setting a terminate callback will be ignored since the timeout option is still set
Reset the timeout if a new terminate callback is configured and let the user know I have a slight preference for the second solution. I'm not sure if we want to start having a set of terminate function that need to be checked. I would keep it simple. If a user needs multiple terminate conditions there is still the possibility to implement a custom terminate handler (even if you need a C/C++ handler and use the Python API).

3) Let's hold back on that until Mate makes a new release that includes this feature. On the cleaned up smtcomp19 branch I have a commit that freezes the versions of all dependencies to make the builds more reproducible (to avoid broken Travis builds like we had with CaDiCaL's API change).

4) It would be great to have a portable solution (also for btor_util_time_stamp) that is as accurate as possible. If it's not too much overhead for you it would be great to use getrusage whenever possible and fall back to clock for systems where it is not available.

5) Weird. Yes, please change it back manually.

6) Yes, yes, and yes :-)

What do you think?

aytey commented 5 years ago

Rather than fragment the comment over multiple smaller comments, I think here's the high-level gist (I just want to make sure we're on the same page before I make changes):

We want this timeout to not only be for the SAT solver, but also for the processing that Boolector does when Sat is called on a Boolector instance (100% agreed)
Boolector will check the term function if it is set as part of this processing, so if we set the timeout term function "earlier on", then this will also be triggered
The setterm call that exists inside of btorsat.c is (effectively) just "passing" the current term function from the Boolector instance into the current SAT solver, so we do not need a "specific" term function of the SAT solver (i.e., if the timeout term function has been set earlier on, then this will just get passed to the SAT solver)

Seem like the right idea to you?

aytey commented 5 years ago

On the other points:

BTOR_WARN on having a timeout with an unsupported solver -- easy!
I agree on your choice about warning and then disabling the previous choice -- if the user has done btor.Set_opt(boolector.BTOR_OPT_TIMEOUT, ...) followed by btor.Set_term(...), we will BTOR_WARN that the timeout has been disabled. Equally, if the user has done btor.Set_term(...) followed by btor.Set_opt(boolector.BTOR_OPT_TIMEOUT, ...), we will BTOR_WARN that the terminate function has been disabled
Okay, nothing for CMS (collides with 1.)
I don't think adding the stuff for getrusage will be that hard (Boolector has the CMake checks already), so that's fine
Will do!
Let's discuss the tests when we're happy with the rest of the branch. I wrote some timeout tests for STP, which were then disabled because they took longer than a quick smoke test (not unreasonable, but it means the logic is untested). I'd like to put these tests somewhere that they validate the behaviour but not impact a developer wishing to quickly test their build of Boolector

mpreiner commented 5 years ago

Rather than fragment the comment over multiple smaller comments, I think here's the high-level gist (I just want to make sure we're on the same page before I make changes):

We want this timeout to not only be for the SAT solver, but also for the processing that Boolector does when Sat is called on a Boolector instance (100% agreed)

Boolector will check the term function if it is set as part of this processing, so if we set the timeout term function "earlier on", then this will also be triggered

The setterm call that exists inside of btorsat.c is (effectively) just "passing" the current term function from the Boolector instance into the current SAT solver, so we do not need a "specific" term function of the SAT solver (i.e., if the timeout term function has been set earlier on, then this will just get passed to the SAT solver)

Seem like the right idea to you?

Yes

mpreiner commented 4 years ago

@andrewvaughanj Can you maybe rebase on master if you have time? The commits from the smtcomp19 branch are now merged to master.

aytey commented 4 years ago

Temporarily closing this as I am (actually!) about to start looking at it, so don't want the pipelines to continually run on in-progress changes.

aytey commented 4 years ago

Re-opening this for initial consideration.

Changes:

All of the timeout logic has been moved out of btorsat and into btorcore, such that the timeouts take effect even without hitting the SAT solver
There's now checking of the options to allow us to print a warning to the user if they try to use timeouts with a SAT solver that doesn't support it
There's now checking on the ordering of trying to use a timeout vs. termination function (it is now setup such that "most recent wins") -- this (probably) only impacts API users, as you can't set a timeout function via SMT-LIB/the command line
The timing code in btorutil now uses the CMake checks for getrusage and uses getrusage when available, and falls back onto clock otherwise
I've added 10 tests (ctest -R timeout*) -- 4 for Lingeling/CaDiCaL (checking that the timeout works and gives unknown); 6 tests for CMS/MiniSAT/PicoSAT (checking that we get the "WARN" about timeout not being supported)

What's not done:

[ ] Examples of API usage
[ ] Tests for API usage
[ ] Tests that actually check that the timeout "does the right thing" (right now, the tests throw a hard instance at the solvers and terminate after 0.5 seconds, and we check for unknown)
[ ] Documentation

I'm opening-up this PR now, as these last few are "nice to have", and I don't want to invest time in those if you're unhappy with the rest!

As discussed, this will get squashed once we're all happy.

aytey commented 4 years ago

One thing I'm not certain of is, for the "unsupported" solvers, the ordering of the output is a bit odd:

sat
[btoropt] btor_opt_check_solver_supports_timeout: WARNING: SAT solver PicoSAT does not support termination functions; timeout option may not have effect

Notice: sat comes before the option validation code -- really not sure why that it is.

aytey commented 4 years ago

So I was interested in #84 and seeing how timeouts may/may not work in that context.

Consider something like:

(set-logic QF_BV)
(set-option :timeout 500)
(declare-fun v2 () (_ BitVec 16))
(declare-fun v1 () (_ BitVec 16))
(push 1)
(assert (not (= ((_ sign_extend 16) v2) (_ bv0 32))))
(push 1)
(assert (not (= (bvsdiv (bvmul ((_ sign_extend 16) v1) ((_ sign_extend 16) v2)) ((_ sign_extend 16) v2)) ((_ sign_extend 16) v1))))
(check-sat)
(pop 1)
(check-sat)

If I use Z3 on this, then I get:

unknown
sat

however, on Boolector with my timeout changes I get:

unknown

because things "get terminated" and Boolector doesn't carry on.

I then changed terminate_aux_btor such that it didn't cache the result of the timeout, and then changed some failed assertions.

However, I then hit a bigger issue in the SMT2 parser. We have this flow parse_smt2_parser -> boolector_terminate -> btor_terminate. This flow does not re-call btor_set_timeout, which means that the old timeout value remains, so once we've exceeded our time-limit once, we're continually just going to say terminate. As such, we don't even read the SMT2 commands that come after our first unknown!

I haven't tried this (yet), but I believe that API users would be immune to this, because they're not going to keep checking the timeout inside of their application, which means the Boolector flow would continue:

b.Push()
b.Sat() # gives unknown
b.Pop()
b.Sat() # <-- this will now reset the timeout flag, so we're "all good"

However, #84 might hit me here, because Boolector might not run the second Sat if it has a cached timeout result.

My bigger issue here is that this might cause some "end-user confusion" if people try to use set-option :timeout and expect Boolector to match Z3's behaviour. Without modifying the flow of parse_smt2_parser (which I'm hesitant to do!), then I'm not sure how we can achieve matching Z3's behaviour. I guess if we believe that the timeout (in SMT2) is per command, then we could update the solver "deadline" after reading every SMT2 command, this would allow the parser to continue.

Thoughts?

arminbiere commented 4 years ago

A clean solution would have global and local limits. In CaDiCaL I have 'solve' local conflict (and decision) limits for the API. The stand-alone solver has command line option conflict and decision limits, which is fine as long the CNF triggers one check, but I started to work on an INCCNF version (for cube-and-conquer), where now there could be global and local limits. The same is true for time limits. So you might want to have local and global versions. Then one could adopt the policy that local limits terminate only certain calls, particularly 'solve' calls, but not for instance 'parsing' or 'assert' calls. After a local limit is hit (in 'solve' in general), the solver remains in a sane state, the aborted command is in essence not changing the state of the solver (except for learned clauses, updates to counters and heuristics), except that for instance assumptions or those local limits are forgotten. For global limits we could keep the current Boolector semantics, i.e., after a global limit is hit the solver in essence becomes unusable (except maybe getting out some deduced facts). Does this make sense? ['solve' = 'check-sat']

aytey commented 4 years ago

@arminbiere thanks for the comments -- I think I need to leave some of these decisions up to @aniemetz/@mpreiner.

However, I think the issue I highlighted only affects the SMT2 front-end, because that also checks the timeout, in a way that maybe doesn't make sense if you're using timeouts for termination.

I think to work out the best way forward is to decide what (set-option :timeout 500) means in SMT2 parlance. If it means, "timeout for the whole file", then maybe printing unknown and terminating is correct. If it means "timeout for each check-sat", then we need to modify the SMT2 parser such that it doesn't "stop" after a timeout and also re-calculate the "time deadline".

cdisselkoen commented 4 years ago

Curious if there are any updates on this? I'm in favor of a solution that results in matching Z3's behavior here.

(I'm the guy behind issue #84, and I'm also the primary author of the Haybale symbolic execution engine, which I've been very happy to build using Boolector - thanks for everyone's work on Boolector! To that end I'm also the maintainer of these safe Rust bindings for Boolector, where I recently implemented a timeout feature before discovering the issue in #84. I'm happy to contribute to discussions if anyone is interested in my use cases, and also willing to help with implementation for this timeout stuff if that would be helpful, although I have very little familiarity with the internals of Boolector.)

mpreiner commented 4 years ago

I think :timeout should be per (check-sat) call. If the user wants a global timeout we have either the cli option for that (-t, --time) or we can introduce a global timeout option for the parser that starts the counter as soon as we parse the option.

aytey commented 4 years ago

@mpreiner okay, so, as a first step, I'll take a look at getting this working as part of parsing a SMT2 file in incremental mode and making sure multiple sat-checks are possible (and timeout).

If you have any comments about other parts of the branch so far, let me know.

aytey commented 4 years ago

Not required.

Boolector / boolector

Initial support for a 'Boolector-native' timeout function via 'term' call-backs #62