Closed: fstur closed this 11 months ago.
Attention: 17 lines in your changes are missing coverage. Please review.
Comparison is base (`7594dc6`) 96.04% compared to head (`7346116`) 96.39%.
Hi @fstur,
Thanks for all the effort you have already put into this!
I would like to merge this functionality soon and wanted to ask if you have time to finish the PR or if I should continue working on it?
This pull request has been mentioned on Image.sc Forum. There might be relevant details there:
Hi @tibuch, I should have time to work on it this Thursday or Friday.
Hi @tibuch, I added the tests for `MetaSeriesUtils_dask`, so it should be ready from my side. Maybe have a look at the tests and see if they make sense, or if we need to test for other things as well.
Awesome @fstur! I will have a look on Monday.
Thanks a lot for all the work you put in!
Hi @fstur,
I am looking through the PR and just wanted to confirm the usage pattern.
In the 2D case we would do the following:

```python
raw_data_da = read_FCYX(well_Files, channels, ny, nx, dtype)
well_image = fuse_dask(raw_data_da, positions, fuse_mean)
zarr_well[:] = well_image
```
As far as I can tell `well_image` is a numpy array with all the well data loaded into memory. Is this correct?
Hi @tibuch,
Not quite: `well_image` is still a dask array with nothing yet in memory. It can be passed as-is to the ome-zarr writer: `ome_zarr.writer.write_image(well_image, ...)`. The writer then handles loading the dask-array chunks and writing them to the zarr file. It should never need the entire `well_image` in memory, only one chunk at a time. In the 2D case one chunk is one channel, i.e. `chunks=(1, ny, nx)`, and in the 3D case one plane of one channel, i.e. `chunks=(1, 1, ny, nx)`.
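A minimal sketch of that chunk layout, using only `dask.array` (the reader functions above are the PR's own, so a placeholder array stands in for `raw_data_da`/`well_image` here):

```python
import dask.array as da
import numpy as np

nc, ny, nx = 2, 1024, 1024

# Lazy placeholder: one chunk per channel, nothing materialized yet.
well_image = da.zeros((nc, ny, nx), chunks=(1, ny, nx), dtype=np.uint16)

print(well_image.numblocks)  # one block per channel
```

Each of those blocks is only realized when something (e.g. the ome-zarr writer) asks for it.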
But doesn't this call create a full array (single channel, single plane):

```python
im_fused = np.zeros((len(tiles), ny_tot, nx_tot), dtype=tiles.dtype)
```
And then in `_fuse_xy`:

```python
ims_fused = np.empty(x.shape[1:-2] + (ny_tot, nx_tot), dtype=x.dtype)
for i in np.ndindex(x.shape[1:-2]):
    slice_tuple = (slice(None),) + i  # workaround for slicing like [:, *i]
    ims_fused[i] = assemble_fun(x[slice_tuple], positions)
```
Multiple planes (if available) are combined into a single channel image.
Yes, but the trick is that we only call `_fuse_xy` via `da.map_blocks`. `da.map_blocks` takes a dask array (e.g. `raw_data_da` in your example) and returns another dask array (`well_img` in your example) with `_fuse_xy` applied lazily. So `_fuse_xy` is only called when a chunk of `well_img` is actually loaded (e.g. through the ome-zarr writer).
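A small self-contained illustration of this laziness; `_fuse_stub` is a hypothetical stand-in for `_fuse_xy` that just records when it actually runs (passing `meta=` keeps dask from probing the function at graph-construction time):

```python
import dask.array as da
import numpy as np

calls = []

def _fuse_stub(block):
    # Hypothetical stand-in for _fuse_xy: records each real invocation.
    calls.append(block.shape)
    return block * 2

raw = da.ones((2, 512, 512), chunks=(1, 512, 512), dtype=np.uint16)

# Build the lazy result; _fuse_stub has not run yet.
well_img = raw.map_blocks(_fuse_stub, meta=np.empty((0, 0, 0), dtype=np.uint16))
assert calls == []

# Only computing the array triggers _fuse_stub, once per chunk.
well_img.compute()
assert calls == [(1, 512, 512), (1, 512, 512)]
```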
I wrote `_fuse_xy` in a general way, so that larger chunks (or even the entire well) can also be processed. But the chunk size is defined in `read_FCZYX`, such that only one plane of one channel is read at a time.
I understand how `da.map_blocks` is used to process channels, and also planes, independently. But the individual planes are always full size in YX, which means this works only as long as we can hold a single plane (or stack) of a single channel in memory.
Yes, we need to be able to hold one plane in memory, but not an entire stack, since each chunk should only contain one plane.
Do you think this is acceptable, or should we think about also chunking planes?
I would like to see how difficult it would be to add chunking for planes. If you don't mind, I would add it to this branch and get your feedback.
@fstur there is now another iteration of tile stitching with dask available. It also uses the `da.map_blocks` function, but utilizes the `block_info` argument to assemble a chunk.
The `block_info` dictionary contains information about the position and shape of the current block/chunk inside the output array. This position can be used to retrieve all tiles contributing to that chunk.
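A minimal sketch of how `block_info` exposes the chunk position inside `da.map_blocks` (illustrative only; `block_info[None]` describes the output block):

```python
import dask.array as da
import numpy as np

def mark_chunk(block, block_info=None):
    # block_info[None] describes the output block: among other keys,
    # "chunk-location" gives the block's index in the chunk grid and
    # "array-location" its slice range in the full array.
    row_block, _ = block_info[None]["chunk-location"]
    return np.full(block.shape, row_block, dtype=block.dtype)

x = da.zeros((4, 4), chunks=(2, 4), dtype=np.int32)
y = x.map_blocks(mark_chunk, meta=np.empty((0, 0), dtype=np.int32))
result = y.compute()  # rows 0-1 come from block 0, rows 2-3 from block 1
```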
The `DaskTileStitcher` takes a list of `Tile`s, where each tile knows its position in the final stitched image. From this list a dictionary is created which maps chunk positions to tiles: https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/DaskTileStitcher.py#L40-L59
The stitching is then done with `da.map_blocks`: https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/DaskTileStitcher.py#L97-L103
With the `stitching_utils.assemble_chunk` function: https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/stitching_utils.py#L83-L101
The `assemble_chunk` function knows which chunk it is currently processing and retrieves the tiles that contribute to it. The tiles are then warped (currently a simple xy-translation) and handed over to a tile-fusing function which does the blending.
https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/stitching_utils.py#L51-L80
https://github.com/fmi-faim/faim-hcs/blob/e6b2c520338957c9adb47a1e3b539c98a7a6e2d3/src/faim_hcs/stitching/stitching_utils.py#L10-L29
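A simplified stand-in for that blending step (not the linked implementation): fuse tiles that were already warped into chunk coordinates by averaging their overlapping valid pixels, with boolean masks marking where each tile has data.

```python
import numpy as np

def fuse_mean(warped_tiles, warped_masks):
    """Average the valid pixels of pre-warped tiles into one chunk image."""
    acc = np.zeros(warped_tiles.shape[1:], dtype=np.float64)
    counts = np.zeros(warped_tiles.shape[1:], dtype=np.int64)
    for tile, mask in zip(warped_tiles, warped_masks):
        acc[mask] += tile[mask]
        counts[mask] += 1
    fused = np.zeros_like(acc)
    # Avoid division by zero where no tile contributes.
    np.divide(acc, counts, out=fused, where=counts > 0)
    return fused.astype(warped_tiles.dtype)

tiles = np.array([[[2.0, 2.0], [2.0, 2.0]],
                  [[4.0, 4.0], [4.0, 4.0]]])
masks = np.array([[[True, True], [True, True]],
                  [[True, False], [False, False]]])
fused = fuse_mean(tiles, masks)  # overlap pixel averages to 3.0
```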
The chunks are hard-coded to a z-size of 1, but can have an arbitrary size in yx. Theoretically it should also be possible to have an arbitrary size in z, but I had some trouble getting the transformations and warping in scikit-image to work.
What do you think about this approach?
Looks great! I will have a closer look as soon as I have time. Thanks!
Added `MetaSeriesUtils_dask.py` with functionality to take a dask array and assemble its planes, returning another dask array.