dimension slicing error

martinschorb commented 5 months ago

Hi @constantinpape

I am trying to implement the slicing for https://github.com/mobie/mobie-utils-python/pull/130

and I am getting stuck in the Downscaling workflow:

File "code/cluster_tools/cluster_tools/copy_volume/copy_volume.py", line 151, in run_impl
    f.require_dataset(self.output_key, shape=out_shape, chunks=chunks,
  File "site-packages/z5py/group.py", line 232, in require_dataset
    return Dataset._require_dataset(self, name, shape, dtype, chunks,
  File "site-packages/z5py/dataset.py", line 155, in _require_dataset
    return cls._create_dataset(group, name, shape, dtype, data=data,
  File "site-packages/z5py/dataset.py", line 198, in _create_dataset
    raise RuntimeError("Chunks %s must have same length as shape %s" % (str(chunks),
RuntimeError: Chunks (3, 64) must have same length as shape (3, 128, 128)

(3, 128, 128) is the shape of my input file. I am trying to extract only one channel. Do I still need to provide chunks as (1, 64, 64)? That would create 3D output data instead of the desired 2D.
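
For illustration, the message reports a rank mismatch: the chunks tuple has two entries while the shape has three. Below is a minimal NumPy-only sketch of that check, together with a ROI that keeps the channel axis as a singleton. The helper `check_chunks` is hypothetical and only mimics the error message; it is not part of z5py or cluster_tools, and the ROI values are assumptions for this example.

```python
import numpy as np


def check_chunks(shape, chunks):
    """Hypothetical helper that mimics the rank check behind the error above."""
    if len(chunks) != len(shape):
        raise RuntimeError(
            "Chunks %s must have same length as shape %s" % (str(chunks), str(shape))
        )
    # Clip each chunk extent so that it never exceeds the dataset extent.
    return tuple(min(c, s) for c, s in zip(chunks, shape))


# Stand-in for the (3, 128, 128) input described in this issue.
data = np.zeros((3, 128, 128), dtype="uint8")

# Assumed ROI selecting channel 0 only. A range slice keeps the axis as a singleton.
roi_begin, roi_end = (0, 0, 0), (1, 128, 128)
roi = tuple(slice(b, e) for b, e in zip(roi_begin, roi_end))
out = data[roi]

# The output is still 3D, so the chunks need three entries as well.
out_chunks = check_chunks(out.shape, (1, 64, 64))
```

With a range-based ROI the output stays 3D, so 3D chunks such as (1, 64, 64) are consistent with it. Getting 2D output would require dropping the singleton axis explicitly.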

That's what causes it to fail here: https://github.com/mobie/mobie-utils-python/actions/runs/7845703615/job/21410747881?pr=130

Here are the configs for the luigi run:

configs.zip

How do I specify this correctly? Or is something wrong in the handling of the ROI parameters? I could not spot anything obvious...

martinschorb commented 5 months ago

roi_begin and roi_end are part of the global config and look OK to me.

martinschorb commented 5 months ago

I think this extends to a more general question of how to convert multi-dimensional stacks properly. I could not spot a parameter for the copy_volume task or in https://github.com/constantinpape/elf/blob/962c10c8c3db87814eaca6c79c5f1a0a6a1f491c/elf/io/image_stack_wrapper.py#L17 that would define the axis order for the output NGFF. Do you always assume that input data (as TIF stack) is zxy? Or do we leave that up to the user and rather ask them to use another converter (like https://github.com/glencoesoftware/bioformats2raw)?

martinschorb commented 5 months ago

Also it is not very clear to me where in the procedure an existing singleton dimension should be "squeezed". I think the most elegant approach would be to keep all dimensions in the resulting NGFF output and have the reader/Viewer take care.

constantinpape commented 5 months ago

Hi @martinschorb, I had a quick look, and indeed the problem is that the singleton dimension is currently not squeezed. We would need to add functionality here to squeeze out the singleton. In principle this could be done, but I am not sure we should go for it.
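
To make "squeezing" concrete, here is a small NumPy sketch of the difference between keeping and dropping the singleton axis. It only illustrates the array semantics on a made-up (c, y, x) stack; it does not show how cluster_tools handles ROIs internally.

```python
import numpy as np

# Small stand-in for a (c, y, x) stack with 3 channels.
data = np.arange(3 * 4 * 4).reshape(3, 4, 4)

# Range slice: the channel axis is kept as a singleton, so the result is 3D.
kept = data[1:2]

# Integer index: the channel axis is dropped, so the result is 2D.
dropped = data[1]

# Explicit squeeze of the singleton axis gives the same 2D result.
squeezed = np.squeeze(kept, axis=0)
```

Here `kept` has three dimensions, while `dropped` and `squeezed` have two and hold the same values.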

> I think the most elegant approach would be to keep all dimensions in the resulting NGFF output and have the reader/Viewer take care.

I agree that this is the most elegant approach, and this should be fully supported by MoBIE already with the channel attribute. Can you use this instead of implementing the channel extraction here?

martinschorb commented 5 months ago

Yes, let's assume the reader/viewer takes care of the squeezing. However, it would need correct information about the dimensions from the OME-Zarr. Where in copy_volume can the axis types/labels be specified? Especially TIF stacks can have zct in different axes and it would be nice to batch convert those using mobie_python as well.

My workaround for channels right now is using the np.array input to mobie.add_image. This can help also with non-standard file extensions etc. But having a less hacky way of specifying axes/dimensions would be great.

constantinpape commented 5 months ago

> Especially TIF stacks can have zct in different axes and it would be nice to batch convert those using mobie_python as well.

In general: you will need to bring the input data to the following order: tczyx (where missing axes can be omitted). I don't want to deal with shuffling around axes here; this will complicate things massively.
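
A minimal sketch of bringing an in-memory array to that order beforehand, assuming the axis labels of the input are known. The helper `to_tczyx` is hypothetical (it is not part of cluster_tools, elf or mobie-utils-python), and the "zcyx" input layout is just an example.

```python
import numpy as np

TARGET_ORDER = "tczyx"


def to_tczyx(data, axes):
    """Hypothetical helper: reorder `data` with axis labels `axes` to tczyx order.

    Axes that are missing from the input are simply omitted from the result.
    """
    axes = axes.lower()
    if len(axes) != data.ndim or len(set(axes)) != len(axes):
        raise ValueError("axes %r does not match a %d-dimensional array" % (axes, data.ndim))
    unknown = set(axes) - set(TARGET_ORDER)
    if unknown:
        raise ValueError("unknown axis labels: %s" % sorted(unknown))
    # Keep only the target axes that are present, in target order.
    new_axes = "".join(a for a in TARGET_ORDER if a in axes)
    # For each output axis, look up its position in the input.
    perm = [axes.index(a) for a in new_axes]
    return np.transpose(data, perm), new_axes


# Example input: 5 z-slices, 2 channels, 64 x 32 pixels, stored as zcyx.
stack = np.zeros((5, 2, 64, 32), dtype="uint8")
reordered, new_axes = to_tczyx(stack, "zcyx")
```

Note that `np.transpose` returns a view, so this is cheap for data that fits in memory; very large inputs would need a different strategy.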

> My workaround for channels right now is using the np.array input to mobie.add_image. This can help also with non-standard file extensions etc.

I think that is a good solution. As said above, I think shuffling around axes in the code here is not a good idea, so it's best if you can take care of this beforehand. (This may be difficult if the input data is very large, but for tifs that is probably not the case.)

What we still need, however, is a way to specify the axis labels for the ome.zarr (since we may have zyx, tcyx, etc.). I am not sure whether this is working correctly right now, but it's relatively easy to add; see #45.
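
As a rough sketch of what such axis labels could look like, the snippet below builds an `axes` list in the style of the OME-NGFF v0.4 multiscales metadata (entries with `name`, `type` and an optional `unit`). The helper `ngff_axes` and the unit values are assumptions for illustration; how cluster_tools actually writes this metadata is not shown here.

```python
# Mapping from single-letter axis labels to NGFF axis types.
AXIS_TYPES = {"t": "time", "c": "channel", "z": "space", "y": "space", "x": "space"}


def ngff_axes(axes, units=None):
    """Hypothetical helper: build an NGFF-style `axes` list from a label string."""
    units = units or {}
    entries = []
    for name in axes:
        if name not in AXIS_TYPES:
            raise ValueError("unknown axis label: %r" % name)
        entry = {"name": name, "type": AXIS_TYPES[name]}
        # The unit is optional, so only add it when one was given.
        if name in units:
            entry["unit"] = units[name]
        entries.append(entry)
    return entries


# The two layouts mentioned above.
axes_zyx = ngff_axes("zyx", units={"z": "micrometer", "y": "micrometer", "x": "micrometer"})
axes_tcyx = ngff_axes("tcyx")
```

The idea is that the label string passed in would match the (already reordered) axis order of the data, so the viewer can interpret singleton and channel axes correctly.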

constantinpape / cluster_tools

dimension slicing error #43