Open tcompa opened 1 year ago
Very good overview @tcompa
> The perspective is that we will handle arrays with mixed dimensions, which can be up to 5D (TCZYX) but may also lack some of the intermediate axes (like TCYX)
Actually, arrays can be n-dimensional. We always expect YX to be there. Anything else is optional. There will often be Z (though not always, we'll need to make the 2D only case work as well, see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/124). There often will be multiple channels (those typically can just be looped over) and there may be time information (sometimes to be looped over, i.e. process timepoint by timepoint, e.g. for segmentation. Some other times we'll need to process whole time series at once, e.g. to do tracking). And users may come up with extra dimensions at some point. We don't need to support processing those as long as we don't have clear use cases for them, but in an optimal case, we should fail when we get such OME-Zarr files / it should be easy to adapt a task to them.
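The "loop over optional leading axes" idea could be sketched roughly like this (a minimal numpy illustration, not the fractal-tasks-core API; `iter_yx_planes` is a hypothetical helper), where YX are always the trailing axes and anything in front of them is looped over:

```python
# Hypothetical sketch (not an existing fractal-tasks-core function):
# iterate over the optional leading axes (t, c, z, ...) of an array whose
# trailing axes are always (y, x).
import numpy as np

def iter_yx_planes(array, axis_names):
    """Yield (index, plane) pairs for each YX plane, looping over any
    leading axes (e.g. t, c, z) that happen to be present."""
    assert axis_names[-2:] == ["y", "x"], "y and x must be the last two axes"
    n_leading = array.ndim - 2
    for index in np.ndindex(*array.shape[:n_leading]):
        yield dict(zip(axis_names[:n_leading], index)), array[index]

# A 4D CZYX array and a plain 2D YX array are handled uniformly:
czyx = np.zeros((2, 3, 4, 5))
for index, plane in iter_yx_planes(czyx, ["c", "z", "y", "x"]):
    assert plane.shape == (4, 5)

yx = np.zeros((4, 5))
assert len(list(iter_yx_planes(yx, ["y", "x"]))) == 1
```

For the tracking-style use case mentioned above, one would instead keep the T axis intact and only loop over the remaining leading axes, but the same "named trailing axes" convention applies.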
> - Have some custom handling of the dimensionality in the zarr-creation tasks.
That seems good to me. We can be somewhat conservative in adding dimensions. Let's make sure 2D only (https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/124) and time data (https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/169) can be parsed, but hold off on more complex logic.
> `create_zarr_structure` and `yokogawa_to_zarr` would include more logic
=> Sounds good to me. Let's add complexity where needed for the two issues above. I'll work on small test sets. The 2D is ready, the time one I will need to look into.
> Consistently use named axes in all other tasks.
That seems like a very good approach to stay stable when users start introducing dimensions beyond the specific ones we currently support.
> Make sure that the relevant functions/tasks are capable of handling arrays of different shapes
Let's: a) find a good way to define what input a task can handle, e.g. in its docstring, and b) make sure the tasks then actually run on the different shapes they are supposed to support, with tests that explicitly load them.
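Point (a) could look something like this hedged sketch (names like `check_axes` and `SUPPORTED_LAYOUTS` are purely illustrative, not an existing API): each task declares the axis layouts it supports and fails fast on anything else, which also covers the "fail cleanly on exotic dimensions" case mentioned above:

```python
# Hypothetical sketch: a task declares which axis layouts it supports and
# rejects anything else up front, instead of failing mid-processing.
SUPPORTED_LAYOUTS = {("z", "y", "x"), ("c", "z", "y", "x")}

def check_axes(axis_names):
    """Raise a clear error if this task does not support the given layout."""
    layout = tuple(axis_names)
    if layout not in SUPPORTED_LAYOUTS:
        raise ValueError(
            f"Task supports {sorted(SUPPORTED_LAYOUTS)}, got {layout}"
        )

check_axes(["c", "z", "y", "x"])  # OK
try:
    check_axes(["t", "c", "z", "y", "x"])
except ValueError:
    pass  # 5D input is rejected explicitly instead of failing later
```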
> It could be a bit trickier with dask arrays
Good point. But our current approach should scale for quite a while, I hope. Let's re-assess this if it becomes necessary.
A lot of discussion is ongoing in:
Adding to this issue: work in https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/557/files introduces the functions `get_single_image_ROI` and `get_image_grid_ROIs`, which (in their current versions) require a set of ZYX pixel sizes. These are obtained through the `NgffImageMeta.pixel_sizes_zyx` property, which sets the Z pixel size to 1 if the corresponding axis is missing; for this reason the import-ome-zarr task remains flexible.
In the future, these new functions will also need to be made more flexible (that is, they should not always require the Z pixel size).
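A minimal sketch of the defaulting behaviour described above (simplified on purpose: the real `NgffImageMeta.pixel_sizes_zyx` property parses the NGFF multiscales metadata, while here axis names and scales are passed in directly as plain lists):

```python
# Simplified sketch of the "default Z pixel size to 1" behaviour: read the
# per-axis scale and fall back to 1.0 for Z when the dataset has no z axis.
def pixel_sizes_zyx(axes, scale):
    """axes: list of axis names; scale: matching scale transformation."""
    sizes = dict(zip(axes, scale))
    return [sizes.get("z", 1.0), sizes["y"], sizes["x"]]

# A 2D (CYX) image still yields a usable ZYX triple:
assert pixel_sizes_zyx(["c", "y", "x"], [1.0, 0.325, 0.325]) == [1.0, 0.325, 0.325]
# A 4D (CZYX) image uses the actual Z scale:
assert pixel_sizes_zyx(["c", "z", "y", "x"], [1.0, 2.0, 0.325, 0.325]) == [2.0, 0.325, 0.325]
```

This is what keeps downstream ROI code working on 2D data for now, at the cost of a dummy Z entry; making the ROI functions axis-aware would remove the need for that fallback.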
At the moment all our image arrays are 4D (CZYX) and each one of our label arrays is 3D (ZYX). This property is visible in the `.zarray` files, and in the folder structure. When the dimension along Z is dummy (a single Z plane), we still use the 4D/3D structure, with shape like `(num_channels, 1, num_y, num_x)` or `(1, num_y, num_x)`. Also ROIs are defined in the same way: they are always 3D shapes (defined by 6 numbers), and in some cases the Z part is dummy (starting at 0 and ending at `pixel_size_z`, corresponding to a single pixel).

The perspective is that we will handle arrays with mixed dimensions, which can be up to 5D (TCZYX) but may also lack some of the intermediate axes (like TCYX), see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/149#issuecomment-1289379988:
Broadly speaking, a possible (preliminary!) plan to support this general case would be to:

1. Have some custom handling of the dimensionality in the zarr-creation tasks.
2. Consistently use named axes in all other tasks.
3. Make sure that the relevant functions/tasks are capable of handling arrays of different shapes.
Re: point 1
This means that `create_zarr_structure` and `yokogawa_to_zarr` would include more logic, to choose the right structure of the target zarr array. This may include something like explicit user-provided parameters on the structure one should expect, or inference from the metadata if that's sufficiently robust. As always, the simplest is to have a couple of small test folders with different cases (e.g. CZYX, TCZYX, TCYX, and YX?).

Re: point 2
This may be a bit complex, but the nice advantage is that we would be moving even closer to the OME-NGFF specs. Note that sometimes we already have to specify named axes in the OME-NGFF metadata, e.g. in https://github.com/fractal-analytics-platform/fractal-tasks-core/blob/f85f88032701f06df3ee7ac3ddcf6941540a005f/fractal_tasks_core/napari_workflows_wrapper.py#L204-L215
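For reference, the axes metadata in question follows the NGFF 0.4 conventions (t has type "time", c has type "channel", z/y/x have type "space"). A small sketch of building that structure for an arbitrary set of named axes (`build_axes_metadata` is a hypothetical helper, not the code in the linked file):

```python
# Hypothetical sketch: build the OME-NGFF "axes" metadata entries for a
# given list of axis names, following the NGFF 0.4 axis types.
AXIS_TYPES = {"t": "time", "c": "channel", "z": "space", "y": "space", "x": "space"}

def build_axes_metadata(axis_names, space_unit="micrometer"):
    axes = []
    for name in axis_names:
        axis = {"name": name, "type": AXIS_TYPES[name]}
        if axis["type"] == "space":
            # Spatial axes carry a unit; time/channel axes are left bare here.
            axis["unit"] = space_unit
        axes.append(axis)
    return axes

# CZYX and TCZYX layouts just differ in which entries are present:
assert build_axes_metadata(["c", "z", "y", "x"])[0] == {"name": "c", "type": "channel"}
```

Generating this list from a single source of truth would keep the metadata consistent across tasks, instead of each task hard-coding its own axes block.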
Re: point 3
It should not be too challenging for functions with numpy arrays as inputs/outputs (thanks to broadcasting rules). It could be a bit trickier with dask arrays, but my feeling is that we are currently moving in a direction where dask is mostly used to lazily load arrays and organize the processing of several small parts (note that this could change, e.g. if we push towards in-task ROI parallelization, and then we may need to depend more heavily on dask arrays; to be assessed).
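A tiny example of why broadcasting makes the numpy case easy: an illumination-correction-style division on the trailing (y, x) axes works unchanged for 2D, 3D, or 4D inputs, since a 2D correction profile broadcasts over any leading axes (dummy profile, purely illustrative):

```python
# Sketch: a per-(y, x) correction profile broadcasts over any leading
# t/c/z axes, so the same function handles YX, ZYX, and CZYX inputs.
import numpy as np

profile = np.linspace(1.0, 2.0, 5).reshape(1, 5)  # dummy (1, x) profile

def correct(img):
    # NumPy aligns trailing axes, so this works for any number of leading axes.
    return img / profile

assert correct(np.ones((4, 5))).shape == (4, 5)              # YX
assert correct(np.ones((3, 4, 5))).shape == (3, 4, 5)        # ZYX
assert correct(np.ones((2, 3, 4, 5))).shape == (2, 3, 4, 5)  # CZYX
```

With dask arrays the same expression often works too, but chunk boundaries and rechunking costs are what can make it trickier in practice.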