Consensys / mythril

Security analysis tool for EVM bytecode. Supports smart contracts built for Ethereum, Hedera, Quorum, Vechain, Rootstock, Tron and other EVM-compatible blockchains.
https://mythx.io/
MIT License
3.87k stars 736 forks source link

Feature Request: Please support multithreading analysis! #1260

Open ytrezq opened 4 years ago

ytrezq commented 4 years ago

Even using default option myth can run during days for a single contract but always using a single thread.

Now that the official python implementation has a workaround for the gil since 3.9, is there a way for Myhril to benefit from it?

Please note there are also plans to have a faster data serializer in order to bring expected speed. So the decision shouldn’t be based on the speed for serializing objects using current api.

norhh commented 4 years ago

That sounds great. We will try to use this.thanks for the cool info

ytrezq commented 4 years ago

That also means that right now, all new code change should ensure thread safety.

ytrezq commented 4 years ago

An alternative would be to simply use the Numba package which can run threads without using the gil at all. It would also dramatically speed up single threaded mode. In threaded mode there would also not have to care about the performance penalty of creating other interpreter instance nor having to manually serialize objects constantly, and parallelization would be mostly automatic which means shorted development.

Using Numba, parallelization of mythril would mostly boils down to making thread safe and avoids relying on C code from modules.

ytrezq commented 4 years ago

A large portion of the time spent when analysing contract is actually performed in C code in the Z3 solver. For example, when analysing 0xbb9bc244d798123fde783fcc1c72d3bb8c189413 using -t 1 limit from mainnet, for 2779 queries completing in 2200 seconds on my machine, it took 129 seconds in Z3_optimize_check with only 981 calls and 1798 calls to Z3_solver_check_assumptions taking 449 seconds (and apparently without calling anything else in both cases).

@norhh so the question: is there no way to do something else not depending on the result while those functions are called? Or since I’m seeing often a thread being spawn it’s already happening in multithread mode?

norhh commented 4 years ago

Yes, it was possible. But we implemented multithreading previously but there was a slowdown due to GIL. We can't exactly use multiprocessing as Z3 isn't serialisable friendly. Switching to other versions might help with multi threading.

ytrezq commented 4 years ago

@norhh I was talking regardless of the Gil of course. Please note Numba doesn’t rely on serialization and can run even old Python versions.

So what can be done while z3 works on Z3_optimize_check or Z3_solver_check_assumptions? Something that doesn’t access the Python api if possible.

Anyway may I take a look on the useless multithreading work (since it didn’t worked) you had to remove from codebase?

norhh commented 4 years ago

We didn’t put it in the codebase. What we did was to parallely run the analysis modules

JoranHonig commented 4 years ago

@ytrezq The previous approach to multithreading was based on a previous design of laser and mythril, with the current architecture multithreaded symbolic execution would look a little bit different.

Right now the process can generally be seen like this:

We can parallelize this by having multiple worker threads. (there will need to be some concurrency control added for some of the features & plugins in laser)

I'm not sure what the limitations of Numba are, so I'm not sure how that affects our ability to paralellize things.

ytrezq commented 4 years ago

@JoranHonig in the case of Numba here are the 2 limitations:

That’s it! and then the process to parallelize is just to add this function decorator: @njit(cache=True,parallel=True,nopython=True,nogil=True) before function declaration and it will automatically vectorize it and parallelize it (taking care of thread safety). And as the name jit and nopython=True implies, even if the result can’t be vectorized nor parallelized, it will run far faster than in normal mode. Numba is even compatible with Python3.4 which mean this approach would work right now.

And basically, the implemented official gil workaround which can be tested right now in the latest version works exactly like multiprocess mode except it removes the ipc overhead (programming is definitely the same as with using the multiprocess module).

But what I’m seeing is between 20% to 55% of the time spent in the main thread is performed in the solver. That time spent doesn’t requires the python interpreter at all which means it doesn’t even care about what the gil is. We already have a part which is performed in a separate thread (I failed to find which one but it represent a 10% increase over single thread mode) from time to time even if the program is bound to main thread speed, so why not decrease a one day analysis time by 7 to 10 hours by spawning works which isn’t related to the solver off the main thread (leaving solver work in the main thread). Of course this represents a different way than the one you’re suggesting.

Also z3 has an option for running it’s C code in parallel. This option is of course supported by the official python binding. What changes in mythril code would be required to support it?

ytrezq commented 4 years ago

@norhh yes I known the codebase is no longer related. But neverless I’m interested about getting more details about in which functions and files it was done.

ytrezq commented 4 years ago

@norhh any chance this get into mythx?

norhh commented 4 years ago

@ytrezq We are currently working on other issues, we will inform after it's decided.

ytrezq commented 3 years ago

@norhh might be a good solution for your servers which would allow the previous attempt to just work. I found a gil free version of Python3.6 https://github.com/larryhastings/gilectomy