equinor / fmu-ensemble

Python objectification of reservoir model ensembles left on disk by ERT.
GNU General Public License v3.0

Multiprocessing #11

Open berland opened 5 years ago

berland commented 5 years ago

Operations over an ensemble are trivially parallelizable.

We should utilize Python multiprocessing for this.

multiprocessing is what should be used, as multithreading would suffer under the GIL.

This is probably trivial for ensemble.get_smry(), but not so trivial for ensemble.from_smry(), as we need to populate each realization object with smry data in the parent process' memory space.

Maybe ensemble.from_smry() should call realization.get_smry() with multiprocessing, and then the ensemble object (running in the master process) populates each realization's self.data['unsmry-<something>'].
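
A minimal sketch of that flow, assuming the ensemble keeps its realizations in a dict indexed by realization number, and using `"unsmry--" + time_index` as a stand-in for the `unsmry-<something>` key above; only a plain DataFrame crosses the process boundary on the way back:

```python
import multiprocessing

def _fetch_smry(args):
    # Runs in a worker process: the realization is pickled out to the
    # worker, and only the index plus a plain DataFrame is pickled back.
    index, realization, time_index, column_keys = args
    return index, realization.get_smry(time_index=time_index,
                                       column_keys=column_keys)

def from_smry_parallel(realizations, time_index="monthly", column_keys=None):
    # realizations: dict of realization index -> realization object
    tasks = [(idx, real, time_index, column_keys)
             for idx, real in realizations.items()]
    with multiprocessing.Pool() as pool:
        for idx, dframe in pool.imap_unordered(_fetch_smry, tasks):
            # Back in the master process: attach the result to the
            # realization living in the parent's memory space.
            realizations[idx].data["unsmry--" + time_index] = dframe
```

Note that each realization object is still pickled out to a worker per call; a later comment in this thread identifies exactly that per-operation pickling as the scaling bottleneck.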

We must ensure CTRL-C works, which is trickier with multiprocessing.

See this: https://stackoverflow.com/questions/11312525/catch-ctrlc-sigint-and-exit-multiprocesses-gracefully-in-python
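
The pattern from that answer, roughly: have the workers ignore SIGINT so that CTRL-C reaches the parent only, and use map_async with a long timeout so the parent stays interruptible. A sketch, not fmu-ensemble code:

```python
import signal
import multiprocessing

def _init_worker():
    # Workers ignore SIGINT, so CTRL-C is handled by the parent only.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def map_interruptible(func, tasks):
    pool = multiprocessing.Pool(initializer=_init_worker)
    try:
        # map_async().get(timeout=...) keeps the parent responsive to
        # CTRL-C; a bare pool.map() can block the signal on Python 2.
        result = pool.map_async(func, tasks).get(timeout=999999)
    except KeyboardInterrupt:
        pool.terminate()  # kill workers immediately
        pool.join()
        raise
    pool.close()
    pool.join()
    return result
```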

When this is in place, we should also be able to skip over realizations where libecl core-dumps on a difficult UNSMRY file.

Right now, your whole Python session dies if libecl crashes on rough data.
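
One possible way to get there, sketched here with an illustrative libecl import: open the file in a throwaway child process first, so a core dump only kills the child, and skip the realization if the child dies.

```python
import multiprocessing

def _probe(path):
    # Child process: merely try to open the UNSMRY file with libecl.
    # If libecl core-dumps, it takes down only this child.
    from ecl.summary import EclSum  # illustrative libecl import
    EclSum(path)

def load_smry_safely(path):
    proc = multiprocessing.Process(target=_probe, args=(path,))
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        # A negative exitcode means the child died from a signal,
        # e.g. -11 for SIGSEGV; skip this realization.
        return None
    # The file survived the probe; load it for real in this process.
    from ecl.summary import EclSum
    return EclSum(path)
```

The file is parsed twice with this approach, but the parent process survives rough data.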

berland commented 4 years ago

concurrent.futures should be used for this. Needs a backport for Python 2.7.
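
On Python 2.7 the backport is the `futures` package on PyPI. With it, the earlier multiprocessing sketch translates almost directly (same assumed realizations dict and data key):

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def _get_smry(realization, time_index, column_keys):
    # Module-level helper: bound methods do not pickle on Python 2.7.
    return realization.get_smry(time_index=time_index,
                                column_keys=column_keys)

def from_smry_futures(realizations, time_index="monthly", column_keys=None):
    with ProcessPoolExecutor() as executor:
        futures = {
            executor.submit(_get_smry, real, time_index, column_keys): idx
            for idx, real in realizations.items()
        }
        for future in as_completed(futures):
            idx = futures[future]
            realizations[idx].data["unsmry--" + time_index] = future.result()
```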

wouterjdb commented 4 years ago

Would it be an idea to not support Python 2.7 (just leave the old code in place when running Python 2.7) and only build this for Python 3?

berland commented 4 years ago

#77 has a good start for concurrent initialization of objects. It also uncovers that the usage pattern of initializing Realization objects and then asking them to update themselves is not well suited for concurrent runs, as pickling and unpickling realization objects back and forth for every operation does not scale.

A suggestion could be to allow more of the processing in a realization to happen at object initialization. It might be possible to pass __init__ a dict with names of realization function calls as keys and (lists of) function arguments as values, which would enable calling each necessary load_* function concurrently. __init__ in a realization would then delegate to a "batch processor" in the realization object, which can also serve as a general wrapper for later concurrent operations, and this function should return the realization object when finished, to be compatible with concurrent.futures.
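
A sketch of that proposal, with hypothetical names (`Realization`, `process_batch`, `load_smry` are illustrative, not the eventual API): the batch dict is handed to __init__, each named load_* method is called with its arguments, and the finished object is returned so it crosses the process boundary only once.

```python
from concurrent.futures import ProcessPoolExecutor

class Realization(object):
    def __init__(self, path, batch=None):
        self.path = path
        self.data = {}
        if batch:
            self.process_batch(batch)

    def process_batch(self, batch):
        # The "batch processor": keys name load_* methods, values hold
        # their keyword arguments. Also usable on its own for later
        # concurrent operations on already-built objects.
        for funcname, kwargs in batch.items():
            getattr(self, funcname)(**kwargs)
        # Returning self makes this compatible with concurrent.futures:
        # the worker hands the finished object back to the parent.
        return self

    def load_smry(self, time_index="monthly", column_keys=None):
        # Stub: would read summary data from self.path via libecl.
        self.data["unsmry--" + time_index] = None

def init_concurrently(paths, batch):
    # Each worker builds and loads one Realization; the object is
    # pickled back to the parent only once, fully populated.
    with ProcessPoolExecutor() as executor:
        return list(executor.map(Realization, paths,
                                 [batch] * len(paths)))
```

Usage would look like `init_concurrently(paths, {"load_smry": {"time_index": "monthly"}})`. Returning fully populated objects from the workers is what replaces the per-operation pickling that #77 exposed as the bottleneck.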

berland commented 4 years ago

Batch processor in #78

berland commented 4 years ago

#106 is ready as an implementation of this issue. The speedup is still disappointingly low, which is effectively holding back merging into master.