
Please support __reduce__/serializing in the Python module! #4935

Closed ytrezq closed 3 years ago

ytrezq commented 3 years ago

Because of the GIL, and because objects can't be passed from one thread to another, Python programs using z3 sometimes have to run single-threaded for weeks (when z3 itself is not the source of the performance issue, or in hyper-threaded environments where z3's parallelization is not profitable).

CPython 3.9 introduced subinterpreters, which allow multiple Python instances in the same process and let C modules share some state, so this might be a solution if full serialization between processes is too much overhead.

NikolajBjorner commented 3 years ago

If you use a different context per thread, it should be thread safe.

You can clone solvers and terms between contexts using the "translate" functions.
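A minimal sketch of that context-per-thread pattern in z3py (the formula and thread count are placeholders; see also examples/python/parallel.py in the repository):

```python
import z3
from threading import Thread

def worker(fml_main):
    # Each thread gets its own context; clone the formula into it.
    ctx = z3.Context()
    s = z3.Solver(ctx=ctx)
    s.add(fml_main.translate(ctx))
    print(s.check())

x = z3.Int("x")
fml = z3.And(x > 2, x < 10)
threads = [Thread(target=worker, args=(fml,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```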

The main overhead with multi-threading is lock contention when accessing shared structures. The main shared structures are related to:

  1. Selected operations on large numerals.
  2. Passing in and creating symbols (strings).

Even for those two main opportunities for contention, some care has been applied to reduce the overhead of global locks.

Obviously, multi-threaded applications could bloat your cache.

There is currently a bug open where a crash is claimed from such a multi-threaded use. I have not been able to reproduce this bug locally and debugging multi-threaded usage or abuse is generally very time-consuming. Otherwise, multi-threading has been used extensively elsewhere.

ytrezq commented 3 years ago

@NikolajBjorner about the overhead, I was talking about the z3.parallel option, which in my case indeed results in more overhead than benefit when hyper-threading is enabled.

So the best approach in my case is to parallelize at the Python level (and have each z3 solver instance perform its own unrelated work). But passing results between Python instances (whether using multiprocessing or threads) requires serializing Python objects, mostly through cPickle, and because of the internal C structures used by z3, z3 Python objects don't serialize. In other words, there is no way for a Python instance (in a thread or in a process) to send its analysis results back to the main thread.
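(A tiny repro of the failure being described, assuming z3py; the exact error text varies by version:)

```python
import pickle
import z3

x = z3.Int("x")
try:
    pickle.dumps(x > 0)             # z3py objects wrap ctypes pointers...
except Exception as err:
    print(type(err).__name__, err)  # ...which pickle refuses to serialize
```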

NikolajBjorner commented 3 years ago

There is a way to exchange structures between Python threads: the 'translate' methods serialize everything, including the underlying C structures. I use essentially the same translate mechanism within the parallel solver.

The parallel solver is very unlikely to work well out of the box with the default options: on satisfiable problems, cube-and-conquer requires more time in CDCL and less aggressive cubing. Even the SAT solvers can benefit from options where they run longer with sticky phases. For unsat problems, there is also a trade-off in how much each cubing phase can benefit from simplifications. For small unsat problems (with < 10K clauses) it may be useful to cube aggressively, but for larger problems you may benefit a lot from in-processing simplification after cubing. By default the parallel solver allocates up to the number of logical cores (so up to the number of hyper-threads). If this indeed kills the cache, which I have not measured, it can be overridden manually by setting the number of conquer threads. Use 'z3 -pm:parallel' to see the options.
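For example, a hedged z3py sketch of setting those knobs (parameter names as listed by `z3 -pm:parallel`; the thread cap of 4 is an arbitrary placeholder):

```python
import z3

z3.set_param("parallel.enable", True)    # turn on the parallel solver
z3.set_param("parallel.threads.max", 4)  # cap conquer threads below the hyper-thread count
```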

ytrezq commented 3 years ago

@NikolajBjorner but since the work is CPU-bound, I'm not using threading directly.

When using CPython 3.9 subinterpreters (which are not threads), nothing is shared. Think of it as using different Python processes where only C modules can share things like global symbols.

So I need the same kind of methods that allow serializing objects to disk, and z3 doesn't support this.

NikolajBjorner commented 3 years ago

OK, if you basically "fork" or "vfork" you can share state by serializing to SMT-LIB2 (print the solver state to a string and parse it). The internal data structures of Z3 don't serialize. I have used string-based serialization for running Z3 on a few hundred Azure workers.
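A minimal sketch of that string round trip in z3py (the formula is a placeholder):

```python
import z3

x = z3.Int("x")
s = z3.Solver()
s.add(x > 41, x < 43)
text = s.to_smt2()    # solver state as an SMT-LIB2 benchmark string

# ...e.g. in a forked worker: rebuild the solver from the string.
s2 = z3.Solver()
s2.from_string(text)
print(s2.check(), s2.model())
```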

ytrezq commented 3 years ago

@NikolajBjorner then the problem is sending the results back.

That's why I'm requesting a proper definition of __reduce__() for z3 objects.

NikolajBjorner commented 3 years ago

That's not going to be possible: there are way too many internal data structures that don't serialize.

ytrezq commented 3 years ago

@NikolajBjorner I'm aware this is a huge amount of work. But that doesn't mean everything has to be serialized. Is there instead a way to rebuild an object from much more basic info? Also, since libz3 is pure C and never calls back into Python, things like global symbols can be shared across interpreters (which again means not serializing everything, just enough to rebuild a Python object from existing pointers in memory).

Please don't tell me that most Python projects using z3 have to remain single-threaded.

NikolajBjorner commented 3 years ago

I don't see a way to reasonably develop and then maintain such serialization. Z3 uses a thin Python wrapper; the C++ backend is in no way thin. The internal structures were not developed with serialization in mind.

ytrezq commented 3 years ago

@NikolajBjorner I might be wrong, but I would say it's simple! Every Python object refers to a global symbol, which is a struct containing an array and an integer for its size. The array contains the C representations of z3 objects. Each subinterpreter uses only one position in the array (so the number of rows roughly matches the number of interpreter instances).

Deserializing is just constructing a Python object wrapping the C one, as is done currently.

That way, sharing z3 results works as if it were done purely in C.

NikolajBjorner commented 3 years ago

I am confused now. I believe you are implying that z3 should expose methods to pickle its internal solver and context classes. That would entail pickling classes like this: https://github.com/Z3Prover/z3/blob/master/src/sat/sat_solver.h This is a large class, but one of the conceptually easier instances.

In ancient times, @wintersteiger developed pickling of Z3 expressions to allow sharing lemmas over MPI. Given the complexity of supporting this feature, and given that expressions can be written to a string and parsed back, with the printing/parsing overhead amortized over solving time, the feature was discontinued.

ytrezq commented 3 years ago

@NikolajBjorner no, nothing would be pickled at the z3 C level. What I'm proposing would only work using subinterpreters across threads (a feature found only in the latest version of CPython).

That way, only the references/pointers to the C objects are pickled (so it's not full serialization). It would work without modifying the main library, just the Python wrapper. And yes, as a result this wouldn't work with multiprocessing, only with https://www.python.org/dev/peps/pep-0554/#run-isolated-code.

wintersteiger commented 3 years ago

Isn't this exactly the same as using one Z3 context in every thread (or interpreter) and calling translate() on each of the objects where necessary? In the usual types of applications, only expressions need to be exchanged, not higher-level structures like models or contexts. (See also https://github.com/Z3Prover/z3/blob/master/examples/python/parallel.py.)

Generic use of cPickle requires adding reduce functions to all C objects, which is completely out of the question. If cPickle supports copy_reg, you may be able to inject your own functions, e.g. just for expressions, but this will never be complete.
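A rough sketch of that copy_reg idea, restricted to Boolean expressions (it round-trips through an SMT-LIB2 benchmark string so declarations travel along; incomplete by design, as noted above):

```python
import copyreg
import pickle
import z3

def _unpickle_bool(text):
    # parse_smt2_string returns the benchmark's assertions; re-join them.
    return z3.And(*z3.parse_smt2_string(text))

def _pickle_bool(e):
    # Embed the expression in a solver so to_smt2() emits declarations too.
    s = z3.Solver()
    s.add(e)
    return (_unpickle_bool, (s.to_smt2(),))

copyreg.pickle(z3.BoolRef, _pickle_bool)

x, y = z3.Ints("x y")
e = z3.And(x > 0, y < x)
print(pickle.loads(pickle.dumps(e)))
```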

ytrezq commented 3 years ago

@wintersteiger it's the same, except it works across interpreters, and across processes on systems like MS-DOS (where all processes can directly access each other's memory). Think of it as serializing the result of translate(). The reason is that pickling plus IPC is the only way to pass data between subinterpreters, just as with multiprocessing. With subinterpreters, each Python thread has its own copy of Python objects and can't access the variables of other instances, which means pure Python code behaves as with multiprocessing, the only exception being custom C code.

It's not about having a proper pickle implementation for all C objects, but about providing a minimal one for some objects and a dummy one for the others. Of course, this would crash under multiprocessing unless the PID is checked, but it would work otherwise.

ytrezq commented 3 years ago

For more details about my current use case, see https://github.com/ConsenSys/mythril/issues/1260#issuecomment-548168251 and https://github.com/ConsenSys/mythril/issues/1371#issuecomment-619871650.

NikolajBjorner commented 3 years ago

reopening - I am still not fully tuned in. At some point I thought you just wanted to call "fork" or "vfork", but now it appears the use case is to convert the Python objects to 2-6 tuples with some way of recreating them, as prescribed here: https://docs.python.org/3/library/pickle.html#object.__reduce__
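(For reference, a generic sketch of what such a __reduce__ tuple looks like; Point is a made-up class:)

```python
import pickle

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __reduce__(self):
        # A (callable, args) tuple: pickle stores it in place of raw
        # state and calls Point(self.x, self.y) when loading.
        return (Point, (self.x, self.y))

p = pickle.loads(pickle.dumps(Point(1, 2)))
```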

The most streamlined way to proceed is to open a pull request with an example, say for the Solver or Ast objects, or for the script that creates z3types.py.

ytrezq commented 3 years ago

> reopening - I am still not fully tuned in. At some point I thought you just wanted to call "fork" or "vfork", but now it appears the use case is to convert the Python objects to 2-6 tuples with some way of recreating them, as prescribed here: https://docs.python.org/3/library/pickle.html#object.__reduce__

Exactly: https://stackoverflow.com/a/12530553, where the tuple contents can even be dumped to disk. And since you don't seem to fully understand the new feature I'm talking about, nor its documentation, please read this blog post.

NikolajBjorner commented 3 years ago

I looked at the blog posts and still don't understand what is going on. So I can't add this feature myself. Contributions are possible.