Feature: Persisting the solver state and Resuming from it.

I would like to inquire your opinion on a feature idea that would IMHO make the use of Z3 significantly more flexible as well as open up new application areas.

It would be great if Z3 was at certain points in time (during (check-sat) and while waiting for further input) able to persist its internal state to disk and quickly resume its operation from such a persisted state. On the user-interface-level this could be realised by adding a key combination to make it dump its state and terminate as well as a SMTLIB-style command like (persist <filename>) and a command-line argument like --resume <state-file>.

This would have the following advantages:

Incrementalization

Using Z3 incrementally (telling it about a large problem/formula once and then only telling it about the changes to that formula (as the user makes them) in order to update it) currently requires leaving a Z3 process running in the background. This becomes a problem when the user closes the application or restarts the computer as the Z3 process is terminated in both cases and the next time the user starts up the application, a new Z3 process will have to be started and the entire problem will have to be communicated and solved from scratch, which can take quite a while if the problem is large. This could be avoided by storing a dumped Z3 state alongside the formula and updating it whenever the user changes the formula by resuming from it, updating Z3's state using pop and push and then dumping the new state. This way the effort spend by Z3 for deriving learned clauses, etc. could be re-used even across application and computer restarts.

Parallelisation

When having a large formula (think: system model) and a large number of queries (think: properties to check) about it, it would be nice to parallelise the solving. However, currently this would mean that n computers would each start a Z3 process, read in the entire formula in and solve one query about it. As there is usually a base-query (for instance: consistency of the system model), which already requires the solver to analyse the entire model and would thus derive a lot of learned clauses, it would be desirable to do this instead:

Read the entire formula in and run the base-query on a single computer, which derives the learned clauses
dump the Z3 state of that process afterwards
copy it to all n computers and let each computer resume from that state and process other queries in a push/pop manner until everything is solved. This would maximise the reuse of work while not requiring the n computers to communicate.

Moving long-running queries

When using Z3 for solving hard problems (for instance: deriving winning strategies for games), one often encounters long-running queries. Usually, I use my laptop to write the queries and test them until everything is in order and Z3 does not find any trivial solutions anymore and starts to work on the hard part. Unfortunately, sometimes it does not terminate until I need to clock out and get home. It would be great if I could just tell the Z3 process to persist its state onto disk, copy this state to a server and let the server resume it from there. This way I could come to work the next day and find the solution to my problem on the server instead of either letting the server redo all the work my laptop already invested or waking up my laptop the next morning and waiting for Z3 to compute another couple of hours.

I know that Z3 did not support this a while ago (https://github.com/Z3Prover/z3/issues/1044), but I would like to know a) Do you consider this idea useful or is there a simpler way to achieve the stated benefits? b) Are you (or someone else you know of) already working in this direction? And if not, c) Are you planning to add support for this in the future?

I take there are two parts to your post:

Expose hooks to make saving state usable.
The format and fidelity of the state that is saved.

Regarding 1, I wonder if adding a configuration option to save state to a file if a solver is interrupted will be a way to go.

Regarding 2, much of what is below is also discussed in the earlier post, #1044. A few things changed since

So far z3 can persist state by serializing solver state to smtlib or dimacs. Text files in this format are much easier to work with for debugging. This isn't wrapped in a way that is as usable as you suggest (with commands). You are also bound to lose lemmas.

For the SAT solver I include lemmas of low glue level and have configuration flags to control this. The SAT solver serializes to either dimacs or SMT-LIB. Using SMT-LIB has the advantage that the format includes model transformations so that a solution to the saved formula can be translated to a solution to the original formula.

I use this serialization for distributed parallelization and I think the text file-based approach is preferrable for debuggability and portability.

Saving state that is produced from SMT core is much cruder (= limited to asserted formulas after pre-processing, excluding lemmas) and other kernels (nlsat) simply non-existent. It possibly could be improved if there is a clear use case, such as yours. This would be for text-based state snapshots. Since the text based formats dont' expose a notion of redundant clauses they would be limited to learned clauses with low glue level, which is not much of a restriction (because it amounts to running agressive GC).

A caveat with the text-based format is that it doesn't handle internal identifiers created during pre-processing correctly: you have to rename them (e.g., rename k!10 to k!_10) before using them again so to avoid that a new internal identifier k!10 gets created.

A binary format would be tougher to get right, have to address all questions as the textual formats, but potentially be more flexible and could address the internal identifier issue systematically. Some of this was engineered around 2010 for MPI, by @cwintersteiger, but subsequently bit-rot. Features that would have to be serializable:

model and proof transformers
smt-context state
sat solver state
maybe nlsat, qsat, other solver states
ast-manager state An OS-based approach, with native snapshots would be nicer to administrate. This would be outside of Z3.

Thanks for the in-depth answer.

Regarding 1, I wonder if adding a configuration option to
save state to a file if a solver is interrupted will be a way
to go.

Sure, one does not need an extra key combination, one can also repurpose SIGINT for that.

The text-based format is not a problem, as long as resuming from it is faster than re-solving from scratch based on the original SMTLIB input. I also do not intend to do anything with the persisted states apart from resuming Z3 from them.

Could you elaborate on what you mean by "bound to lose lemmas". Is this a mere performance issue (Z3 is able to re-derive these lemmas when resumed) or could this potentially result in unsoundness?

It seems @cwintersteiger already build quite a bit of infrastructure for this. Was MPI aiming at something similar? When you say that a) there is still a lot to do and b) the existing infrastructure is bit-rotting since 2010, I guess it is not reasonable to expect this to be supported in some not-so-distant future release?

The OS-based approach seems like a good alternative. However, just like Z3 our application runs cross-plattform, and we would hence need to find a solution on at least 3 different OSs.

I think I will first try to realize this on the level of VirtualBox or Docker. I will report back to you on whether this provides a sufficient solution.

I will add a "solver.cancel_backup_file=" parameter. To test it, it would be useful with some representative workloads. The SAT blasted workload uses one code-path, the SMT-workload will need code revisions in the SMT solver. Optimization, NLSat, Qsat and Fixedpoint workloads will not be included in this. In the current serializer, when solving using assumptions, the assumptions get added as assertions so they will be lost too. This can be changed, but depends on if there is a use case for this.

I will add a "solver.cancel_backup_file=" parameter. That sounds great! I will take a look at it.

to test it, it would be useful with some representative workloads. The SAT blasted workload uses one code-path, the SMT-workload will need code revisions in the SMT solver. Optimization, NLSat, Qsat and Fixedpoint workloads will not be included in this.

That is ok from my perspective. We are mainly using SAT, but can extend our models with Integers, Floats, etc. Therefore, we are able to test whatever combination currently works.

In the current serializer, when solving using assumptions, the assumptions get added as assertions so they will be lost too. This can be changed, but depends on if there is a use case for this.

"Solving with Assumptions" means passing a list of assumptions to the (check-sat)-command, right? I don't think it is important to preserve these. When using assumptions, one is usually passing a different set to every (check-sat)-command anyway...

I followed your discussion, as I'm also interested in resuming Z3 based on previous instances. I have two questions to understand better what are the consequences of resuming the solver:

I'm wondering too which extent the solver state is stored now (i.e., all learned clauses)? In other words, can we expect the same performance when resuming solvers (or is it likely to be slower as it misses some of the already computed internal state)?
Furthermore, I would like to understand whether user and learned constraints can be distinguished in the export? While the distinction of learned clauses is not necessary for check-sat, it would be interesting when computing MUS (i.e., minimal/minimum unsatisfiable subsets), which are supposed to not include learned clauses.

Z3Prover / z3

Feature: Persisting the solver state and Resuming from it. #2095