Closed: fstur closed this 11 months ago.
Attention: 17 lines in your changes are missing coverage. Please review.
Comparison is base (`7594dc6`) 96.04% compared to head (`7346116`) 96.39%.
Hi @fstur,
Thanks for all the effort you have already put into this!
I would like to merge this functionality soon and wanted to ask if you have time to finish the PR or if I should continue working on it?
This pull request has been mentioned on Image.sc Forum. There might be relevant details there:
Hi @tibuch, I should have time to work on it this Thursday or Friday.
Hi @tibuch, I added the tests for `MetaSeriesUtils_dask`, so it should be ready from my side. Maybe have a look at the tests and see if they make sense, or if we need to test for other things as well.
Awesome @fstur! I will have a look on Monday.
Thanks a lot for all the work you put in!
Hi @fstur,
I am looking through the PR and just wanted to confirm the usage pattern.
In the 2D case we would do the following:

```python
raw_data_da = read_FCYX(well_Files, channels, ny, nx, dtype)
well_image = fuse_dask(raw_data_da, positions, fuse_mean)
zarr_well[:] = well_image
```
As far as I can tell `well_image` is a numpy array with all the well data loaded into memory. Is this correct?
Hi @tibuch,
Not quite: `well_image` is still a dask array with nothing yet in memory. It can be passed as-is to the ome-zarr writer: `ome_zarr.writer.write_image(well_image, ...)`. The writer then handles loading the dask-array chunks and writing them to the zarr file. It should never need the entire `well_image` in memory, only one chunk at a time. In the 2D case one chunk is one channel, i.e. `chunks=(1, ny, nx)`, and in the 3D case one plane of one channel, i.e. `chunks=(1, 1, ny, nx)`.
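A minimal sketch of that chunk layout, using only `dask.array` (the reader functions above are the PR's own, so a placeholder array stands in for `raw_data_da`/`well_image` here):

```python
import dask.array as da
import numpy as np

nc, ny, nx = 2, 1024, 1024

# Lazy placeholder: one chunk per channel, nothing materialized yet.
well_image = da.zeros((nc, ny, nx), chunks=(1, ny, nx), dtype=np.uint16)

print(well_image.numblocks)  # one block per channel
```

Each of those blocks is only realized when something (e.g. the ome-zarr writer) asks for it.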
But doesn't this call create a full array (single channel, single plane):

```python
im_fused = np.zeros((len(tiles), ny_tot, nx_tot), dtype=tiles.dtype)
```
And then in `_fuse_xy`:

```python
ims_fused = np.empty(x.shape[1:-2] + (ny_tot, nx_tot), dtype=x.dtype)
for i in np.ndindex(x.shape[1:-2]):
    slice_tuple = (slice(None),) + i  # workaround for slicing like [:, *i]
    ims_fused[i] = assemble_fun(x[slice_tuple], positions)
```
Multiple planes (if available) are combined into a single channel image.
Yes, but the trick is that we only call `_fuse_xy` via `da.map_blocks`. `da.map_blocks` takes a dask array (e.g. `raw_data_da` in your example) and returns another dask array (`well_img` in your example) with `_fuse_xy` applied lazily. So `_fuse_xy` is only called when a chunk of `well_img` is actually loaded (e.g. through the ome-zarr writer).
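A small self-contained illustration of this laziness; `_fuse_stub` is a hypothetical stand-in for `_fuse_xy` that just records when it actually runs (passing `meta=` keeps dask from probing the function at graph-construction time):

```python
import dask.array as da
import numpy as np

calls = []

def _fuse_stub(block):
    # Hypothetical stand-in for _fuse_xy: records each real invocation.
    calls.append(block.shape)
    return block * 2

raw = da.ones((2, 512, 512), chunks=(1, 512, 512), dtype=np.uint16)

# Build the lazy result; _fuse_stub has not run yet.
well_img = raw.map_blocks(_fuse_stub, meta=np.empty((0, 0, 0), dtype=np.uint16))
assert calls == []

# Only computing the array triggers _fuse_stub, once per chunk.
well_img.compute()
assert calls == [(1, 512, 512), (1, 512, 512)]
```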
I wrote `_fuse_xy` in a general way, so that larger chunks (or even the entire well) can also be processed. But the chunk size is defined in `read_FCZYX`, such that only one plane of one channel is read at a time.
I understand how `da.map_blocks` is used to process channels, and also planes, independently. But the individual planes are always full size in YX, which means this works only as long as we can hold a single plane (or stack) of a single channel in memory.
Yes, we need to be able to hold one plane in memory, but not an entire stack, since each chunk should only contain one plane.
Do you think this is acceptable, or should we think about also chunking planes?
I would like to see how difficult it would be to add chunking for planes. If you don't mind, I would add it to this branch and get your feedback.
@fstur there is now another iteration of tile stitching with dask available. It also uses the `da.map_blocks` function, but utilizes the `block_info` argument to assemble a chunk.
The `block_info` dictionary contains information about the position and shape of the current block/chunk inside the output array. This position can be used to retrieve all tiles contributing to that chunk.
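A minimal sketch of how `block_info` exposes the chunk position inside `da.map_blocks` (illustrative only; `block_info[None]` describes the output block):

```python
import dask.array as da
import numpy as np

def mark_chunk(block, block_info=None):
    # block_info[None] describes the output block: among other keys,
    # "chunk-location" gives the block's index in the chunk grid and
    # "array-location" its slice range in the full array.
    row_block, _ = block_info[None]["chunk-location"]
    return np.full(block.shape, row_block, dtype=block.dtype)

x = da.zeros((4, 4), chunks=(2, 4), dtype=np.int32)
y = x.map_blocks(mark_chunk, meta=np.empty((0, 0), dtype=np.int32))
result = y.compute()  # rows 0-1 come from block 0, rows 2-3 from block 1
```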
The `DaskTileStitcher` takes a list of `Tile`s, where each tile knows its position in the final stitched image. From this list a dictionary is created which maps chunk positions to tiles: https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/DaskTileStitcher.py#L40-L59
The stitching is then done with `da.map_blocks`: https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/DaskTileStitcher.py#L97-L103
With the `stitching_utils.assemble_chunk` function: https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/stitching_utils.py#L83-L101
The `assemble_chunk` function knows which chunk it is currently processing and retrieves the tiles that contribute to it. The tiles are then warped (currently a simple xy-translation) and handed over to a tile-fusing function which does the blending.
https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/stitching_utils.py#L51-L80
https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/stitching_utils.py#L10-L29
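A simplified stand-in for that blending step (not the linked implementation): fuse tiles that were already warped into chunk coordinates by averaging their overlapping valid pixels, with boolean masks marking where each tile has data.

```python
import numpy as np

def fuse_mean(warped_tiles, warped_masks):
    """Average the valid pixels of pre-warped tiles into one chunk image."""
    acc = np.zeros(warped_tiles.shape[1:], dtype=np.float64)
    counts = np.zeros(warped_tiles.shape[1:], dtype=np.int64)
    for tile, mask in zip(warped_tiles, warped_masks):
        acc[mask] += tile[mask]
        counts[mask] += 1
    fused = np.zeros_like(acc)
    # Avoid division by zero where no tile contributes.
    np.divide(acc, counts, out=fused, where=counts > 0)
    return fused.astype(warped_tiles.dtype)

tiles = np.array([[[2.0, 2.0], [2.0, 2.0]],
                  [[4.0, 4.0], [4.0, 4.0]]])
masks = np.array([[[True, True], [True, True]],
                  [[True, False], [False, False]]])
fused = fuse_mean(tiles, masks)  # overlap pixel averages to 3.0
```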
The chunks are hard-coded to a z-size of 1, but can have an arbitrary size in yx. Theoretically it should also be possible to have an arbitrary size in z, but I had some trouble getting the transformations and warping in scikit-image to work.
What do you think about this approach?
Looks great! I will have a closer look as soon as I have time. Thanks!
Added `MetaSeriesUtils_dask.py` with functionality to take a dask array and assemble its planes, returning another dask array.