fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Options for processing ROIs in parallel within wells #44

Open jluethi opened 1 year ago

jluethi commented 1 year ago

The current vision for ROI processing (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/27) is to run all the ROIs in a well sequentially. This will be a very useful first implementation. Many operations inherently parallelize, and we can run many wells in parallel.

Nevertheless, we may eventually want to parallelize some ROI processing within a well. This becomes hard when ROIs need to write to the same chunk of the zarr array, which would not be safe. But we can think of ways to handle this. I think this roughly goes in the following order:

  1. Run all ROIs sequentially (current plan to implement)
  2. Run ROIs in parallel if they are all saved to independent chunks of the zarr array. Basically, when we have a grid of fields of view, we want to run some processing per field of view at level 0 (e.g. illumination correction), and the fields of view are saved as chunks in the zarr file. If we can verify that this is the case, we can run the tasks in parallel.
  3. Run batches of ROIs that write to independent chunks in parallel. We may have some ROIs that need to write to the same chunk, but most ROIs don't overlap. In that case, we could check which ROIs write to independent zarr chunks and batch them so that ROIs that can be processed independently end up in the same batch. These batches can then be processed in parallel (see the sketch after this list).

We will implement 1 now. 2 should be fairly doable and useful. 3 is more of an option we could eventually pursue; noting it here so we don't forget we have it.
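To make option 3 a bit more concrete, here is a rough Python sketch (not fractal-tasks-core code; the helper names and the assumption that ROIs are given as tuples of slices are mine) of how ROIs could be grouped into batches that never write to the same zarr chunk:

```python
# Hypothetical sketch of strategy 3: group ROIs so that ROIs within one batch
# never write to the same zarr chunk. `chunks_touched` and
# `batch_rois_by_chunks` are illustrative names, not fractal-tasks-core code.
from itertools import product


def chunks_touched(roi_slices, chunk_shape):
    """Return the set of chunk indices that a ROI (tuple of slices) overlaps."""
    ranges = []
    for sl, chunk in zip(roi_slices, chunk_shape):
        first = sl.start // chunk
        last = (sl.stop - 1) // chunk
        ranges.append(range(first, last + 1))
    return set(product(*ranges))


def batch_rois_by_chunks(rois, chunk_shape):
    """Greedily group ROIs into batches whose chunk sets are pairwise disjoint."""
    batches = []  # list of (set_of_chunks, list_of_rois)
    for roi in rois:
        touched = chunks_touched(roi, chunk_shape)
        for used_chunks, members in batches:
            if used_chunks.isdisjoint(touched):
                used_chunks.update(touched)
                members.append(roi)
                break
        else:
            batches.append((set(touched), [roi]))
    return [members for _, members in batches]


# Example: FOV-sized ROIs on an array chunked per FOV; the third ROI straddles
# two chunks and therefore ends up in its own batch.
rois = [
    (slice(0, 1000), slice(0, 1000)),
    (slice(0, 1000), slice(1000, 2000)),
    (slice(500, 1500), slice(0, 1000)),
]
print(batch_rois_by_chunks(rois, chunk_shape=(1000, 1000)))
```

In the special case of option 2 (every ROI sits in its own chunk and no two ROIs share one), this grouping would return a single batch and all ROIs could run in parallel.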

tcompa commented 1 year ago

Following ongoing discussions with @jluethi and @mfranzon (related to #72, #27 and #75), we are now only implementing strategy 1, where all ROIs are computed sequentially (as in: define the computation for a certain ROI, execute it, free up the memory, move on to the next ROI).

This allows us to simplify the function that gets mapped onto the well array, and to make it such that its input and output are both numpy arrays (rather than delayed arrays).
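For reference, a minimal sketch of what strategy 1 boils down to, assuming the ROIs are available as tuples of slices into the well array (`process_roi` and the paths are placeholders, not the actual task code):

```python
# Minimal sketch of strategy 1: load each ROI into memory as numpy, process it,
# write it back, and only then move on to the next ROI.
import numpy as np
import zarr


def process_roi(img: np.ndarray) -> np.ndarray:
    return img  # placeholder for the actual per-ROI computation


def run_sequentially(zarr_path, rois):
    arr = zarr.open(zarr_path, mode="r+")
    for roi_slices in rois:
        region = arr[roi_slices]                # numpy array for this ROI
        arr[roi_slices] = process_roi(region)   # write the result back
        del region                              # free memory before the next ROI
```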

Moving towards strategies 2 or 3 will clearly require a refactor of the relevant tasks, because within the in-progress work (see https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/5b61cd90a469e615acf56e99eaeda21fd70d31ef and #75) each ROI computation is blocking - and nothing else happens until it's over.

jluethi commented 1 year ago

Fully agree, thanks for the summary Tommaso.

When we eventually want to tackle this, we'll have to find a way to call the functions with numpy arrays while somehow remaining delayed in that call. At the moment, converting the dask region to a numpy array forces computation.
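One option we could explore (just a sketch, the names are illustrative): wrap the per-ROI call in dask.delayed and pass the dask-array region as an argument. The region is then materialized into numpy only when the delayed task runs, so the task function still sees numpy arrays but the calls stay lazy until compute time:

```python
# Sketch: keep per-ROI calls lazy while the task function still receives numpy.
import dask
import dask.array as da
import numpy as np


def per_roi_task(img: np.ndarray) -> np.ndarray:
    return img  # placeholder: takes and returns numpy, as in the current tasks


well = da.zeros((4, 2000, 2000), chunks=(1, 1000, 1000))

rois = [
    (slice(0, 4), slice(0, 1000), slice(0, 1000)),
    (slice(0, 4), slice(0, 1000), slice(1000, 2000)),
]

# Build delayed calls without triggering computation; the dask-array regions
# become dependencies in the graph and arrive as numpy inside per_roi_task.
lazy_results = [dask.delayed(per_roi_task)(well[roi]) for roi in rois]

# Nothing has run yet; this executes the ROI tasks, potentially in parallel.
results = dask.compute(*lazy_results)
```

Note that each region still ends up fully in memory inside its task, so per-worker memory would need some care with this approach.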

I think this sequential per-well approach should be fine for quite a while, because we parallelize over the wells. I see 3 scenarios in which we may need to reconsider this trade-off:

  1. We want to process large OME-Zarr images, especially things that aren't in the HCS spec (=> no wells to parallelize over). That's not on our current roadmap, but eventually interesting.
  2. We want to parallelize GPU operations more: the current implementation also means that only one thing runs on a GPU at a time. If we e.g. do segmentation at level 2, we could often fit multiple jobs on a typical GPU at the same time, but the current implementation likely won't allow for that (because sending multiple well jobs to the same GPU can become tricky).
  3. We want to optimize for processing HCS datasets with very few wells (similar to 1) => eventually interesting, but not on the immediate roadmap.

jluethi commented 1 year ago

Another thing to consider: I've started processing the 23-well dataset again, and the parsing to OME-Zarr now seems to take about 10 hours. That looks a bit slower than before. I think the biggest bottleneck is parallel IO performance, so that's not something Fractal can optimize. But given that it seems to have slowed down a bit (I remember this being in the 6-hour range before), there may be some optimization potential.

One thing we could consider: currently we're parsing all the channels sequentially. A potentially easy way to get more parallelization, without having to process multiple ROIs in parallel, would be to process the different channels in parallel for a given FOV.
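As a rough illustration of what that could look like (purely hypothetical names, not existing fractal-tasks-core functions), the per-FOV conversion could fan out over channels with a thread pool, since the work is mostly IO-bound:

```python
# Hypothetical sketch of channel-level parallelism during OME-Zarr parsing:
# for a given FOV, submit one conversion job per channel to a thread pool.
from concurrent.futures import ThreadPoolExecutor


def convert_channel(fov_id: str, channel: int) -> None:
    ...  # read the raw images for this channel/FOV and write them to the zarr


def convert_fov(fov_id: str, channels: list[int], max_workers: int = 4) -> None:
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(convert_channel, fov_id, ch) for ch in channels]
        for fut in futures:
            fut.result()  # re-raise any exception from the workers
```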