cadCAD-org / cadCAD

Design, simulate, validate, and operate within complex systems
https://cadcad.org
MIT License

Multiprocessing runs 100X+ slower than single simulation with certain simulations #365

Open SeanMcOwen opened 23 hours ago

SeanMcOwen commented 23 hours ago

This notebook shows a current simulation model that runs very slowly when moving to multiprocessing.

The results are:

- No deepcopy and 1 Monte Carlo run: 0.12 seconds
- No deepcopy and 5 Monte Carlo runs: 60.22 seconds / 5 = 12.04 seconds
- Deepcopy and 1 Monte Carlo run: 2.79 seconds
- Deepcopy and 5 Monte Carlo runs: 59 seconds / 5 = 11.90 seconds

In a table then:

| | Single Proc (s) | Multi Proc (s per run) |
|---|---|---|
| No Deepcopy | 0.12 | 12.04 |
| Deepcopy | 2.79 | 11.90 |

So on single simulations, turning off deepcopy speeds things up a lot (2.79 seconds down to 0.12 seconds), but in multiprocessing each run is massively slower no matter what. Given that deepcopy has no effect there, it looks like it doesn't get triggered with multiprocessing, BUT the runs are still much slower because of something else.
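As a quick check of the arithmetic behind the "100X+" in the title, the per-run timings can be turned into slowdown ratios. This is only a sketch over the numbers reported in this issue; the variable names are made up for illustration.

```python
# Per-run timings reported in this issue, in seconds.
single_no_deepcopy = 0.12   # 1 Monte Carlo run, deepcopy off
single_deepcopy = 2.79      # 1 Monte Carlo run, deepcopy on
multi_no_deepcopy = 12.04   # 5 Monte Carlo runs, deepcopy off (per run)
multi_deepcopy = 11.90      # 5 Monte Carlo runs, deepcopy on (per run)

# Slowdown of one multiprocessing run relative to one single-process run.
slowdown_no_deepcopy = multi_no_deepcopy / single_no_deepcopy
slowdown_deepcopy = multi_deepcopy / single_deepcopy

print(f"No deepcopy: {slowdown_no_deepcopy:.1f}x slower per run")
print(f"Deepcopy:    {slowdown_deepcopy:.1f}x slower per run")
```

With deepcopy off the per-run slowdown is roughly 100x, while with deepcopy on it is roughly 4x, because the single-process baseline is already much slower in that case.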

Further context from @danlessa is:

"Regarding multiprocessing, those two threads have some context 2023-12 on [client project]: https://blockscienceteam.slack.com/archives/C05LRRUMGQM/p1703034551508919?thread_ts=1703019991.737059&cid=C05LRRUMGQM 2020-12, on using multiprocessing alternatives: https://blockscienceteam.slack.com/archives/CCYHUBHJ7/p1609220349006200"

The important information from the Slack thread is: "as for the single thread result: this is related to how multi-processing in Python works. Processes cannot share memory directly, and they rely on IPC, which involves serialization of data. This is an expensive operation when dealing with objects generally. cadCAD uses pathos for parallelizing runs, which in turn depends on dill as a serializer. dill is particularly slow when compared to pickle as a serializer, however it can handle pretty much any kind of object, while pickle cannot. This is a problem without an easy and universal way out. Most performance improvements require constraining use cases in some direction. If you're looking for 10-100x speed-ups, then investing in a non-deepcopy compatible solution can definitely pay out, as it opens up the possibility of doing some clever hacks (like history erasure, which facilitates the serialization a lot)"
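To make the serialization point concrete, here is a minimal sketch of why history erasure helps: the fewer timesteps a worker has to send back, the less there is to serialize. It uses the standard-library `pickle` as a stand-in (cadCAD itself goes through pathos and dill, per the quote above), and the `make_history` helper and the state layout are hypothetical, not cadCAD's actual data structures.

```python
import pickle
import time


def make_history(n_steps, state_size):
    """Build a hypothetical simulation history: one state dict per timestep."""
    return [
        {"timestep": t, "values": [float(t + i) for i in range(state_size)]}
        for t in range(n_steps)
    ]


def serialize(obj):
    """Serialize obj with pickle; return (payload size in bytes, seconds taken)."""
    start = time.perf_counter()
    payload = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    elapsed = time.perf_counter() - start
    return len(payload), elapsed


history = make_history(n_steps=500, state_size=200)

# Sending the full history across a process boundary...
full_bytes, full_seconds = serialize(history)

# ...versus "history erasure": keeping only the most recent state.
last_only_bytes, last_only_seconds = serialize(history[-1:])

print(f"full history: {full_bytes} bytes in {full_seconds:.6f} s")
print(f"last state:   {last_only_bytes} bytes in {last_only_seconds:.6f} s")
```

The payload grows with the number of stored timesteps, so dropping history shrinks what has to cross the process boundary. Whether that accounts for the slowdown in this issue is not established here; profiling the actual model with dill would be needed to confirm it.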

linear[bot] commented 23 hours ago

CORE-126 Multiprocessing runs 100X+ slower than single simulation with certain simulations