borevitzlab / timestreamlib

DEPRECATED. Please use the current version of the TimeStream tools at https://gitlab.com/appf-anu/pyts2.
GNU General Public License v3.0

Parallelised execution in run_pipeline.py #136

Closed: kdm9 closed this issue 7 years ago

kdm9 commented 9 years ago

Joel,

Could you advise where we should specify that the timestream can be processed in parallel? We need to show the supercomputer guys that most parts execute in parallel, scaling roughly linearly with N threads.

Cheers, Kevin

Joelgranados commented 9 years ago

Hey Kevin.

We can currently parallelize the individual pot segmentation calculations. We don't use this and I have not tested it with real runs, so I don't know whether the speedup scales linearly with the number of threads.

With that said, I want to implement some parallelization at the pipeline level so that several images can be processed at the same time. We have to be a bit careful, as some components might need information from previous runs, but I think it's very "doable". This would be something to think about for the next iteration (next year).

IMO, you can write that each image's calculation could potentially be parallelized, that you estimate the speedup would scale linearly with the number of threads, and that a machine with multiple cores would be essential to develop, test and use this type of architecture.

Don't know if that is helpful at all...
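The linear-scaling estimate holds only if almost none of the run is serial. As a rough check, here is a minimal sketch of Amdahl's law, assuming a fraction `p` of the runtime parallelises perfectly over `n` workers and the rest stays serial (`amdahl_speedup` is a hypothetical helper, not part of timestreamlib):

```python
def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
    """Ideal speedup when `parallel_fraction` of the work scales perfectly
    across `n_workers` and the remainder runs serially (Amdahl's law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if n_workers < 1:
        raise ValueError("n_workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_workers)


if __name__ == "__main__":
    # Compare a fully parallel run with runs that keep some serial work.
    for p in (1.0, 0.95, 0.8):
        speedups = [round(amdahl_speedup(p, n), 1) for n in (1, 4, 16, 64)]
        print(f"p={p}: {speedups}")
```

For example, with p = 0.95 the ideal speedup on 64 workers is only about 15x, so any serial steps (such as ordered, stateful components) cap the scaling well below linear.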

kdm9 commented 9 years ago

So something we discussed a while ago (it may have been before you started with us again) was what we termed chunked parallelism. The idea is that we split the timestream n_threads ways and process consecutive chunks in parallel. We could get around the issue of relying on the previous/next image by starting each chunk a couple of images in, then having a final serial step that fills in the gaps.
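The chunked scheme can be sketched as follows, assuming each image is handled by a step function that takes the state left by the previous image and returns a result plus the new state, and that this state depends only on the last `warmup` images (for example `warmup=1` for a step that compares each image with the one before it). Chunks run from a cold start in parallel, then the head of each later chunk is recomputed serially with the state carried over from the previous chunk. `run_chunked`, `run_chunk`, `split_chunks` and the step signature are hypothetical, not timestreamlib APIs.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence, Tuple

# Hypothetical per-image step: takes the state left by the previous image
# (None at the start of a chunk) and returns (result, new_state).
Step = Callable[[Optional[Any], Any], Tuple[Any, Any]]


def split_chunks(items: Sequence[Any], n_chunks: int) -> List[Sequence[Any]]:
    """Split items into up to n_chunks consecutive, near-equal slices."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def run_chunk(step: Step, chunk: Sequence[Any]) -> Tuple[List[Any], Any]:
    """Process one chunk in order, starting from an empty state."""
    state, results = None, []
    for image in chunk:
        result, state = step(state, image)
        results.append(result)
    return results, state


def run_chunked(step: Step, images: Sequence[Any], n_chunks: int,
                warmup: int, executor: Executor) -> List[Any]:
    chunks = split_chunks(images, n_chunks)
    # Parallel phase: every chunk starts cold, so the first `warmup`
    # results of chunks after the first lack history and may be wrong.
    futures = [executor.submit(run_chunk, step, c) for c in chunks]
    outputs = [f.result() for f in futures]
    # Serial gap-fill: redo the head of each later chunk, carrying the
    # state over from the end of the previous chunk.
    carried = outputs[0][1]
    for chunk, (results, end_state) in zip(chunks[1:], outputs[1:]):
        state = carried
        n_head = min(warmup, len(chunk))
        for j in range(n_head):
            results[j], state = step(state, chunk[j])
        # A chunk redone entirely in serial ends in the serial state;
        # otherwise the worker's own end state is already warmed up.
        carried = state if n_head == len(chunk) else end_state
    return [r for results, _ in outputs for r in results]


def diff_step(prev: Optional[int], x: int) -> Tuple[int, int]:
    """Toy step: each result is the difference from the previous image."""
    return (0 if prev is None else x - prev), x


if __name__ == "__main__":
    frames = [i * i for i in range(12)]
    with ThreadPoolExecutor(max_workers=3) as ex:
        print(run_chunked(diff_step, frames, n_chunks=3, warmup=1, executor=ex))
```

With a `ProcessPoolExecutor`, `step` must be a module-level function so it can be pickled, and the call belongs under `if __name__ == "__main__":`. If a component's state depends on more than the last `warmup` images, results after the head can still differ from a serial run, so `warmup` has to be chosen per component.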

This would be a bit of a pain to implement, but in reality there's little point to the code if it can't be parallelised. With good design decisions, the code can be executed in a highly concurrent manner.

As a first step, let's be able to run the easily parallelised bits in parallel by image. I'll let you suggest the most appropriate place for that; my guess would be run_pipeline.py.
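One way to run the easily parallelised bits by image is to map the components that need no neighbouring images across a worker pool, then run the order-dependent components in a single serial pass. A minimal sketch, assuming the components can be split into stateless and stateful groups; `run_two_stage`, `apply_stateless` and the component signatures are hypothetical and are not the actual run_pipeline.py API:

```python
import functools
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical component signatures: stateless components see one image at a
# time; stateful ones also get a dict they can use to remember earlier images.
Stateless = Callable[[Any], Any]
Stateful = Callable[[Any, Dict[str, Any]], Any]


def apply_stateless(components: Sequence[Stateless], image: Any) -> Any:
    """Run every stateless component on one image, in order."""
    for component in components:
        image = component(image)
    return image


def run_two_stage(images: Sequence[Any], stateless: Sequence[Stateless],
                  stateful: Sequence[Stateful], executor: Executor) -> List[Any]:
    # Stage 1: independent per-image work, fanned out across workers.
    # executor.map yields results in input order, so timestamps stay sorted.
    worker = functools.partial(apply_stateless, list(stateless))
    partial_results = list(executor.map(worker, images))
    # Stage 2: components that need earlier images run serially, in order.
    state: Dict[str, Any] = {}
    outputs = []
    for item in partial_results:
        for component in stateful:
            item = component(item, state)
        outputs.append(item)
    return outputs


def double(x: int) -> int:
    """Toy stateless component."""
    return 2 * x


def running_total(x: int, state: Dict[str, Any]) -> int:
    """Toy stateful component: depends on every earlier image."""
    state["total"] = state.get("total", 0) + x
    return state["total"]


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(run_two_stage([1, 2, 3, 4], [double], [running_total], ex))
        # [2, 6, 12, 20]
```

Swapping in a `ProcessPoolExecutor` gives true CPU parallelism for stage 1, provided the components are module-level (picklable) functions.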

kdm9 commented 9 years ago

So I've started work in the borevitzlab/feat/multiprocessing branch.

The first commit removes all parallelism, and this is probably not ideal. Don't get alarmed though, I did this just to make things simple while we're hacking.

kdm9 commented 9 years ago

Responding to Joel's diff comment:

I expected as much. That commit was my work in progress, to show you what I'm up to. Ideally we will only have one level of parallelism in operation at a time, and to make life simple I removed the lower level stuff whilst we work out the higher level stuff that allows entire pipelines to be parallelised.

We will discuss how to go forward in our meeting next week.
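One way to keep only one level of parallelism active is to let the lower-level step (per-pot segmentation) fan out only when it is handed an executor, and have the image-level loop call it without one. A minimal sketch; `segment_pots`, `process_images`, `segment_one` and `toy_segment` are hypothetical names, not timestreamlib functions:

```python
import functools
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence

# Hypothetical per-pot function: (image, pot) -> segmentation result.
SegmentOne = Callable[[Any, Any], Any]


def segment_pots(image: Any, pots: Sequence[Any], segment_one: SegmentOne,
                 executor: Optional[Executor] = None) -> List[Any]:
    """Inner level: segment every pot in one image.

    Fans out over pots only when handed an executor; otherwise serial.
    """
    per_pot = functools.partial(segment_one, image)
    if executor is None:
        return [per_pot(pot) for pot in pots]
    return list(executor.map(per_pot, pots))


def process_images(images: Sequence[Any], pots: Sequence[Any],
                   segment_one: SegmentOne, executor: Executor) -> List[List[Any]]:
    """Outer level: parallel over images, so each inner call runs serially."""
    per_image = functools.partial(segment_pots, pots=pots,
                                  segment_one=segment_one)
    return list(executor.map(per_image, images))


def toy_segment(image: int, pot: int) -> int:
    """Stand-in for real segmentation: encodes (image, pot) as one number."""
    return image * 10 + pot


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as ex:
        print(process_images([1, 2], [0, 1, 2], toy_segment, ex))
        # [[10, 11, 12], [20, 21, 22]]
```

Running both levels at once multiplies the number of workers (outer × inner) and can oversubscribe the available cores, which is one reason to keep a single level active at a time.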

Joelgranados commented 9 years ago

As I said in my comment: You have not removed the lower level stuff. If that is your aim, please attend to the comments in the commit.

kdm9 commented 9 years ago

Will do, this is a work in progress

kdm9 commented 9 years ago

Note that this is all in an experimental branch, mostly for hacking and general exploration of parallelism, so don't get too upset if I do anything too unruly. I won't be breaking anyone else's code :smile: