Forking/Serializing a z3 Solver

ltfish commented 9 years ago

We are a bunch of people working on a symbolic execution engine, and we are intensively depending on Z3 (needless to say, it's really good :-) ). Now we are looking into possibilities of forking Z3 solvers. Basically we'd like to use the incremental solving support in Z3, instead of solving all constraints again once an execution path branches. We looked into code, and found out much effort is required to make Z3 do what we need. What do you think the best option is?

I'm wondering if you will support forking or serializing a Z3 solver soon, or any similar solution is in your future plan. If not, and in case we go ahead and implement the serializing support, what are the chances of merging it in to master? Maintaining a big bunch of code by ourselves is suboptimal.

cao commented 9 years ago

+1

To add on @ltfish's question, in case of having to implement it: are there any possible licensing issues that need to be considered/looked into in case (big if) one wants to depend on existing serialization libraries/tools? Would any library under a permissive license work (MIT/BSD-style)? Is it correct to assume that more restrictive licenses (GPL) are a big no-no?

NikolajBjorner commented 9 years ago

Let me try to understand in concrete terms. You are either asking about Z3 interoperating with the allocated pages of an OS process or you are asking about Z3 supporting OS independent cloning mechanisms. The second post indicates an understanding around first reading.

Based on the second reading it sounds like you would like the following basic API: Z3_solver Z3_API Z3_solver_translate(Z3_context source, Z3_solver s, Z3_context target);

This would let you spawn a new context, and translate a solver object into the new context. The old and new contexts can be used in different threads. Z3's API already has Z3_ast_vector_translate, which lets you convert an ast-vector from one context to another. Z3_solver_get_assertions returns the current assertions in a solver as an ast_vector, so you can use these functions to directly implement Z3_solver_translate.

You can almost always serialize solvers to disk by pretty printing them to SMT-LIB2. Are you perhaps finding this inefficient?

My overall question is whether you need more than this a solver clone mechanism?

wintersteiger commented 9 years ago

Forking solvers is relatively easy, as Nikolaj mentioned we already have AST translations which take care of most of that. Serialization is also easy to do, in fact we used to have a serializer to binary streams some time ago and we could reactivate that, so no external libraries are necessary (but yes, those would be a license issue).

Regarding getting something like that into master:

Maintaining a big bunch of code by ourselves is suboptimal.

If it's too much for you to maintain it, then there is no chance we'll accept it. Any new feature that is added needs to have someone to maintain it; it doesn't necessarily have to be you, but someone needs to be responsible. Of course, maintaining a big bunch of code is also "suboptimal" for us :-)

ltfish commented 9 years ago

Thanks for those pointers. Let me read the related code first :-)

However, based on my understanding of your proposed solution, the internal state of a solver is not kept, which means all assertions have to be evaluated again. Please correct me if I'm wrong.

By saying "forking the solver", we'd like to achieve the following two goals:

Migrate the internal state of the original solver to the new one. So no re-evaluation is required.
Reuse the memory pages (or Regions, which is the internal memory allocator for scopes) for both the original and the new solver.

What's your idea on that?

If it's too much for you to maintain it, then there is no chance we'll accept it. Any new feature that is added needs to have someone to maintain it; it doesn't necessarily have to be you, but someone needs to be responsible. Of course, maintaining a big bunch of code is also "suboptimal" for us :-)

Thank you. If I end up implementing the code, I'll maintain it as a patch or a separate project if possible.

NikolajBjorner commented 9 years ago

So, to be clear: you have measured that, in your usage, any potential gains of retaining learned clauses is sufficiently important. Otherwise, the typical observation that I experience is that internalizing formulas and relearning lemmas is a relative small overhead compared to search. The two main portions where state can be captured are the sat_solver and smt_solver cores. They wrap into different solver objects, where the sat_solver is now getting available for incremental use as well. The main incremental object so far has been the smt_solver context. Cloning it seems rather error prone to program because you would want to maintain the associating between expressions and internal representations (e-nodes and clauses), and ultimately, I am not sure there is a real use case for it.

ltfish commented 9 years ago

@NikolajBjorner I didn't make it clear in my previous posts, and let me try to rephrase that.

I believe the gain in caching internalized formulae and lemmas is completely not comparable with the overhead of searching the result space. What I care about is retaining the searching result of the old solver in the newly-forked solver, so we don't have to search again (for all previously added clauses) when a new assertion is added to the new solver - instead, we only search a smaller space. I believe this is what incremental solving in Z3 is about. Let me know if I made a mistake here.

Cloning it seems rather error prone to program because you would want to maintain the associating between expressions and internal representations ...

After reading more code, I believe you are absolutely right. Almost every class under smt/ will be changed if I want to add support of solver cloning (while keeping its entire internal state).

NikolajBjorner commented 9 years ago

Let me know if I made a mistake here

It really depends on usage scenario, so the best thing is to make measurements using the simple cloning technique as suggested above and then determine if the extra feature is worth it. I assume that you are using the state in incremental mode, so the overhead of several incremental calls would dominate setup time for a single incremental call.

If it is worth to make things more advanced, then I would consider to go for something less than cloning everything: I would expose interfaces for extracting and adding internalized and learned clauses to the smt_context and have the new context re-internalize (but not simplify) the state so that we would not have to clone internal data-structures. It would not be an exact clone, but it would contain the state corresponding to how search has progressed. We would also have to add a "translate" facility to the solver class, but can use a default simple implementation.

ltfish commented 9 years ago

It really depends on usage scenario, so the best thing is to make measurements using the simple cloning technique as suggested above and then determine if the extra feature is worth it. I assume that you are using the state in incremental mode, so the overhead of several incremental calls would dominate setup time for a single incremental call.

Make sense. Let me build a concrete example later and post it here.

If it is worth to make things more advanced, then I would consider to go for something less than cloning everything ...

I like this idea. It should be enough. Thanks!

ltfish commented 9 years ago

I put together some code here to simulate our common use case. Here is a sample output:

(angr)/home/angr/angr% python test.py
Non-incremental mode
solving base constraints takes 6.297842 seconds
solving additional constraints (as well as old constraints) takes 8.925401 seconds

(angr)/home/angr/angr% python test.py inc
Incremental mode
solving base constraints takes 6.578623 seconds
solving additional constraints takes 3.558465 seconds

Those numbers deviate a little between different runs with a delta of 0.5 seconds. I believe it explains why a way to "fork" solves is beneficial for us.

NikolajBjorner commented 9 years ago

OK, so the approach to get there I see so far is to add a "translate" method to the solver objects, expose it over the API, and for the smt_solver, which relies on smt_kernel and then smt_context specialize this translation method to take learned clauses. Still a bit tricky to get right.

NikolajBjorner commented 9 years ago

I have added a fork facility. Copying all learned lemmas turned out to be an overhead in my testing so far so at this point it copies unit literals only.

ltfish commented 9 years ago

Thanks! We'll see how to integrate it into our system!

Z3Prover / z3

Forking/Serializing a z3 Solver #209