eqasim-org / synpp

Synthetic population pipeline code for eqasim
http://www.eqasim.org
MIT License

Run stages in parallel #34

Open sebhoerl opened 4 years ago

sebhoerl commented 4 years ago

Often, the tree structure of a pipeline allows stages to run in parallel. Right now, the pipeline runs one stage at a time. To make use of parallel computing power, a couple of steps are necessary:

1) Let user define resource availability via configuration, e.g.

resources:
  - cpu: 8
  - memory: 10

2) Let user define resource requirements, e.g.

def configure(context):
  context.resource("cpu", 4)

3) Run stages in parallel in the pipeline. There is a caveat: we cannot start slave processes from within slave processes. This means that if a stage makes use of the parallel() context, it must not already be running in a child process! Therefore, we need to put some thought into intelligent management of the process pool. (In particular, it would need to be managed centrally by the pipeline instead of per ParallelMasterContext object.)
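To make steps 1 and 2 more concrete, here is a minimal sketch of how declared requirements and a centrally managed budget could fit together. Only the `resources` list and the `context.resource("cpu", 4)` call come from the proposal above. `StageContext`, `flatten_resources` and `ResourceBudget` are hypothetical names for illustration and are not part of synpp.

```python
class StageContext:
    """Hypothetical stand-in for the context passed to configure()."""

    def __init__(self):
        self.requirements = {}

    def resource(self, name, amount):
        # Record how much of a resource the stage asks for.
        self.requirements[name] = amount


def flatten_resources(entries):
    """Merge the list-of-mappings form of the `resources` config into one dict."""
    available = {}
    for entry in entries:
        available.update(entry)
    return available


class ResourceBudget:
    """Central bookkeeping, owned by the pipeline rather than by each stage."""

    def __init__(self, available):
        self.free = dict(available)

    def fits(self, requirements):
        # Unknown resource names are treated as having zero availability.
        return all(self.free.get(name, 0) >= amount
                   for name, amount in requirements.items())

    def acquire(self, requirements):
        # Reserve resources only if the whole request fits.
        if not self.fits(requirements):
            return False
        for name, amount in requirements.items():
            self.free[name] -= amount
        return True

    def release(self, requirements):
        # Give resources back once the stage has finished.
        for name, amount in requirements.items():
            self.free[name] += amount


def configure(context):
    context.resource("cpu", 4)


context = StageContext()
configure(context)
budget = ResourceBudget(flatten_resources([{"cpu": 8}, {"memory": 10}]))
```

Keeping the budget in a single object held by the pipeline matches the idea that the pool should be managed centrally instead of per ParallelMasterContext object.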

ainar commented 1 year ago

I managed that in one of my forked branches: https://github.com/eqasim-org/synpp/compare/develop...ainar:synpp:multiprocessing

I wrapped each stage execution in a multiprocessing Process. Each process is launched as soon as enough resources are available and its dependencies have finished executing.

In my fork, the total amount of resources and the amount of resources needed for each stage still need to be configurable.
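The launch rule described here (start a stage once its dependencies are done and enough resources are free) can be sketched as a pure function. This is not the code from the fork. The function name `ready_stages`, its arguments, and the example stages are assumptions made for illustration, and the greedy alphabetical order is just one possible policy.

```python
def ready_stages(dependencies, requirements, finished, running, free):
    """Return the stages that may be started now, chosen greedily.

    dependencies: stage -> set of stages it depends on
    requirements: stage -> {resource: amount}
    finished, running: sets of stage names
    free: {resource: amount} currently unused
    """
    free = dict(free)  # work on a copy; the caller keeps its own bookkeeping
    startable = []
    for stage in sorted(dependencies):
        if stage in finished or stage in running:
            continue
        if not dependencies[stage] <= finished:
            continue  # at least one dependency has not completed yet
        needs = requirements.get(stage, {})
        if all(free.get(name, 0) >= amount for name, amount in needs.items()):
            # Reserve the resources so later candidates see what is left.
            for name, amount in needs.items():
                free[name] -= amount
            startable.append(stage)
    return startable


# Hypothetical pipeline: "c" depends on "a" and "b".
dependencies = {"a": set(), "b": set(), "c": {"a", "b"}}
requirements = {"a": {"cpu": 4}, "b": {"cpu": 4}, "c": {"cpu": 8}}

first = ready_stages(dependencies, requirements, set(), set(), {"cpu": 8})
second = ready_stages(dependencies, requirements, {"a"}, {"b"}, {"cpu": 4})
third = ready_stages(dependencies, requirements, {"a", "b"}, set(), {"cpu": 8})
```

In a real pipeline, each returned stage would be started in its own process, and the function would be evaluated again whenever a process exits and its resources are released.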