Open jluethi opened 2 years ago
Following ongoing discussions with @jluethi and @mfranzon (related to #72, #27 and #75), we are now only implementing strategy 1, where all ROIs are computed sequentially (as in "define the computation for a certain ROI, execute it, free up memory, move on to the next ROI").
This allows us to simplify the function that gets mapped onto the well array, and to make it such that its I/O are both numpy arrays (rather than delayed arrays).
Moving towards strategies 2 or 3 will clearly require a refactor of the relevant tasks, because within the in-progress work (see https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/5b61cd90a469e615acf56e99eaeda21fd70d31ef and #75) each ROI computation is blocking - and nothing else happens until it's over.
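To make the sequential strategy concrete, here is a minimal sketch of the "define, execute, free, move on" loop. It is not the code from the linked commit: `process_roi`, `process_well_sequentially` and the ROI format (tuples of slices) are hypothetical names chosen for illustration, and a plain numpy array stands in for the well array, which in practice would be a dask array.

```python
import numpy as np


def process_roi(img: np.ndarray) -> np.ndarray:
    """Hypothetical per-ROI function: numpy array in, numpy array out."""
    return img - img.min()


def process_well_sequentially(well, rois, func):
    """Apply `func` to one ROI at a time (strategy 1).

    `well` is array-like; `rois` is a list of tuples of slices.
    Each iteration blocks until its ROI is fully processed.
    """
    out = np.array(well)  # copy, so regions outside the ROIs are preserved
    for roi in rois:
        # With a dask array, this conversion is the step that forces computation.
        region = np.asarray(well[roi])
        out[roi] = func(region)
        del region  # release the ROI data before moving on to the next one
    return out


# Toy well with two side-by-side ROIs (assumed layout, for illustration only).
well = np.arange(64).reshape(2, 4, 8)
rois = [
    (slice(None), slice(0, 4), slice(0, 4)),
    (slice(None), slice(0, 4), slice(4, 8)),
]
result = process_well_sequentially(well, rois, process_roi)
```

Because every iteration finishes before the next one starts, no two ROIs are ever in memory or being written at the same time, which is what makes this version simple and also what a move to strategies 2 or 3 would have to change.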
Fully agree, thanks for the summary Tommaso.
When we eventually want to tackle this, we'll have to find a way to call the functions with numpy arrays while somehow keeping the call itself delayed. At the moment, the conversion of the dask region to a numpy array forces computation.
I think this sequential per-well approach should be fine for quite a while, because we parallelize over the wells. I see 3 reasons why we may need to reconsider this trade-off:
Another thing to consider: I've started processing the 23 well dataset again and the parsing to OME-Zarr now seems to take about 10 hours. Looks like that is a bit slower than before. I think the biggest bottleneck is parallel IO performance, so that's not something Fractal can optimize. But given that it seems to have slowed down a bit (I remember this being in the 6 hour range before), there may be a bit of optimization potential.
One thing we could consider: Currently we're parsing all the channels sequentially. A potentially easy way to get more parallelization without having to process multiple ROIs in parallel would be to process the different channels in parallel for a given FOV.
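A rough sketch of what per-channel parallelism for a single FOV could look like, using a thread pool from the standard library. Everything here is an assumption for illustration: `parse_channel` is a hypothetical stand-in for the real image-reading code, and `parse_fov` is not an existing Fractal function. Whether this actually helps would depend on the parsing being IO-bound and on each channel ending up in its own chunks.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def parse_channel(fov_id: int, channel: int) -> np.ndarray:
    """Hypothetical stand-in for reading one channel of one FOV from disk."""
    return np.full((4, 4), fill_value=channel, dtype=np.uint16)


def parse_fov(fov_id: int, channels, max_workers: int = 4) -> np.ndarray:
    """Read all channels of one FOV concurrently and stack them as (C, Y, X)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the order of `channels`,
        # regardless of which read finishes first.
        planes = list(pool.map(lambda ch: parse_channel(fov_id, ch), channels))
    return np.stack(planes)


fov = parse_fov(fov_id=0, channels=[0, 1, 2])
```

FOVs are still handled one at a time in this sketch, so the sequential ROI logic stays untouched; only the reads within one FOV overlap.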
cc @lorenzocerrone on this issue. Will be something that we eventually cover in the OME-Zarr reader/writer class :)
The current vision for ROI processing (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/27) is running all the ROIs in a well sequentially. This will be a very useful first implementation. Many operations inherently parallelize and we can run many wells in parallel.
Nevertheless, we may eventually want to parallelize some ROI processing within a well. This becomes hard when ROIs need to write to the same chunk of the zarr array, which would not be safe. But we can think of ways to handle this. I think this roughly goes in the following order:
We will implement 1 now. 2 should be fairly doable and useful. 3 is more of a potential option we could eventually pursue, noted here so we don't forget we have it.
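The chunk-safety concern above can be checked ahead of time: given the ROI bounds and the chunk shape, one can list which ROIs would write to the same chunk. The sketch below is only an illustration of that idea, not part of the current tasks. The helper names and the ROI format (one `(start, stop)` pair per axis, in array indices, with `stop > start`) are assumptions.

```python
from collections import defaultdict
from itertools import product


def chunks_touched(roi, chunk_shape):
    """Return the set of chunk indices that an ROI overlaps.

    `roi` is a tuple of (start, stop) pairs, one per axis, with stop > start.
    """
    ranges = [
        range(start // size, (stop - 1) // size + 1)
        for (start, stop), size in zip(roi, chunk_shape)
    ]
    return set(product(*ranges))


def find_conflicts(rois, chunk_shape):
    """Map each chunk index to the ROI indices touching it, if more than one."""
    owners = defaultdict(list)
    for i, roi in enumerate(rois):
        for chunk in chunks_touched(roi, chunk_shape):
            owners[chunk].append(i)
    return {chunk: ids for chunk, ids in owners.items() if len(ids) > 1}


chunk_shape = (10, 10)
rois = [
    ((0, 10), (0, 10)),   # aligned with chunk (0, 0)
    ((0, 10), (10, 20)),  # aligned with chunk (0, 1)
    ((5, 15), (5, 15)),   # straddles four chunks
]
conflicts = find_conflicts(rois, chunk_shape)
aligned_only = find_conflicts(rois[:2], chunk_shape)
```

ROIs that never appear in `conflicts` could in principle be processed in parallel without two writers touching the same chunk, while the remaining ones would still need to run sequentially or be coordinated in some other way.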