fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Making `coordinateTransformations` flexible for axes & comply with spec (include channels transformation) #420

Closed · jluethi closed this 1 year ago

jluethi commented 1 year ago

This is about whether we're putting the right number of elements into the coordinateTransformations in the OME-Zarr and reading them out flexibly.

I have been working more with the OME-Zarr files from the MD converter and ran into an issue with our MIP conversion: the `convert_ROI_table_to_indices` function in `lib_regions_of_interest` assumes that coordinateTransformations always consist of z, y, x.

Our OME-Zarr files always have 3 entries in the scale transformation; see this example:

```json
{
    "multiscales": [
        {
            "axes": [
                {
                    "name": "c",
                    "type": "channel"
                },
                {
                    "name": "z",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "y",
                    "type": "space",
                    "unit": "micrometer"
                },
                {
                    "name": "x",
                    "type": "space",
                    "unit": "micrometer"
                }
            ],
            "datasets": [
                {
                    "coordinateTransformations": [
                        {
                            "scale": [
                                1.0,
                                0.1625,
                                0.1625
                            ],
                            "type": "scale"
                        }
                    ],
                    "path": "0"
                },
```

This example does have 4 axes though. Thus, according to the spec (both latest and v0.4), we should have a transformation for all 4 axes, i.e. in this case one for c, z, y, x.
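
A minimal sketch of such a consistency check (assuming the layout from the example above; the `.zattrs` path is hypothetical):

```python
# Minimal consistency check: per the OME-NGFF v0.4 spec, every "scale"
# transformation must have one entry per axis.
import json

with open("plate.zarr/B/03/0/.zattrs") as f:  # hypothetical path
    zattrs = json.load(f)

multiscale = zattrs["multiscales"][0]
n_axes = len(multiscale["axes"])  # 4 in the example above (c, z, y, x)
for dataset in multiscale["datasets"]:
    for t in dataset["coordinateTransformations"]:
        if t["type"] == "scale" and len(t["scale"]) != n_axes:
            print(
                f"Level {dataset['path']}: {len(t['scale'])} scale entries, "
                f"expected {n_axes}"
            )
```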

@tcompa Any idea why we have only 3 entries in our coordinate transformations? In some early issues, I found examples where we were looking at 4 entries (e.g. here https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/25). But shortly thereafter, we already had examples with 3: https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/23

As part of handling axes more flexibly:

jluethi commented 1 year ago

An issue that will appear in some tasks while we have this problem:

```
ValueError                                Traceback (most recent call last)
Cell In[8], line 18
     14 full_res_pxl_sizes_zyx = extract_zyx_pixel_sizes(
     15     f"{zarrurl_old}/.zattrs", level=0
     16 )
     17 # Create list of indices for 3D FOVs spanning the entire Z direction
---> 18 list_indices = convert_ROI_table_to_indices(
     19     FOV_ROI_table,
     20     level=0,
     21     coarsening_xy=coarsening_xy,
     22     full_res_pxl_sizes_zyx=full_res_pxl_sizes_zyx,
     23 )

Cell In[5], line 32, in convert_ROI_table_to_indices(ROI, full_res_pxl_sizes_zyx, level, coarsening_xy, cols_xyz_pos, cols_xyz_len, reset_origin)
     30 # Set pyramid-level pixel sizes
     31 print(full_res_pxl_sizes_zyx)
---> 32 pxl_size_z, pxl_size_y, pxl_size_x = full_res_pxl_sizes_zyx
     33 prefactor = coarsening_xy**level
     34 pxl_size_x *= prefactor

ValueError: too many values to unpack (expected 3)
```

A very manual workaround is to use `full_res_pxl_sizes_zyx[-3:]` instead of `full_res_pxl_sizes_zyx` to cut off the additional entries when we load the transformations in the MIP task. But clearly that isn't the way we actually want to go.
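
In code, using the names from the traceback above, the workaround would look roughly like this:

```python
# Manual workaround (sketch): keep only the trailing z, y, x entries before
# passing the pixel sizes on, discarding e.g. a leading channel scale.
list_indices = convert_ROI_table_to_indices(
    FOV_ROI_table,
    level=0,
    coarsening_xy=coarsening_xy,
    full_res_pxl_sizes_zyx=full_res_pxl_sizes_zyx[-3:],
)
```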

jluethi commented 1 year ago

Actually, for the moment I added a workaround to the `extract_zyx_pixel_sizes` function that avoids running into crashes in our tasks: it only returns the last 3 entries of the scale transformation list. This isn't the general solution of actually having correct metadata, but `extract_zyx_pixel_sizes` should only return the z, y, x scaling anyway, not any other scaling.

This is now part of https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/403. All tests still pass. With this workaround, we can process 3D MD data, and our normal examples (e.g. example 01) also still run.
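
For reference, a simplified, hypothetical version of the patched function (the actual implementation in `lib_zattrs_utils.py` differs in its details):

```python
import json

def extract_zyx_pixel_sizes(zattrs_path: str, level: int = 0) -> list[float]:
    """Return the z, y, x pixel sizes for a given pyramid level."""
    with open(zattrs_path) as f:
        zattrs = json.load(f)
    dataset = zattrs["multiscales"][0]["datasets"][level]
    for t in dataset["coordinateTransformations"]:
        if t["type"] == "scale":
            # Workaround: keep only the last three entries (z, y, x),
            # dropping e.g. a leading channel entry.
            return t["scale"][-3:]
    raise ValueError(f"No scale transformation found for level {level}")
```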

jluethi commented 1 year ago

There was a second place where we assumed that coordinateTransformations always consist of z, y, x: the `rescale_datasets` function in `lib_zattrs_utils`. I have now switched this so that it also works with czyx (or tczyx). We should of course make this more general, but this fix ensures our tasks don't break on OME-Zarrs with czyx coordinateTransformations.

tcompa commented 1 year ago

> There was a second place where we assumed that coordinateTransformations always consist of z, y, x: the `rescale_datasets` function in `lib_zattrs_utils`. I have now switched this so that it also works with czyx (or tczyx). We should of course make this more general, but this fix ensures our tasks don't break on OME-Zarrs with czyx coordinateTransformations.

I made this slightly more general with https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/403/commits/63e04d2b4c8ddb204edc478e0faae43f6205e41f, let me know if it makes sense to you.
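
Roughly, such a generalization could look like this (a sketch under the assumption that only the trailing y/x entries need rescaling; not the literal commit):

```python
import copy

def rescale_datasets(
    datasets: list[dict], coarsening_xy: int, reference_level: int
) -> list[dict]:
    """Update scale transformations when a new reference level is chosen."""
    prefactor = coarsening_xy**reference_level
    new_datasets = copy.deepcopy(datasets)
    for dataset in new_datasets:
        for t in dataset["coordinateTransformations"]:
            if t["type"] == "scale":
                # Rescale only the trailing y and x entries; any leading
                # entries (t, c, z) are left untouched, so zyx, czyx and
                # tczyx transformations all work.
                t["scale"][-2] *= prefactor
                t["scale"][-1] *= prefactor
    return new_datasets
```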

jluethi commented 1 year ago

@tcompa This indeed looks like a more general solution. I'll try to test it this afternoon. I think it should work, but there are some edge cases where we were, e.g., relying on the fact that the scale arrays have 3 entries, which is not always the case.

But if this works, it would already be a good step towards dropping this assumption.

tcompa commented 1 year ago

> I think it should work, but there are some edge cases where we were, e.g., relying on the fact that the scale arrays have 3 entries

Shall we take this opportunity to make everything a bit more consistent? For the moment that means always having 4 (CZYX) items in any scale transformation (which will then be superseded by #150).

The most relevant modules dealing with scale transformations are:

  1. `lib_zattrs_utils.py`, which is currently under control (even if the fix in `extract_zyx_pixel_sizes` is clearly a workaround)
  2. The create-ome-zarr tasks, where we are currently writing scale transformations of length 3. I will now switch to always including a 1 for the channel axis in the scale transformation (see the sketch below), since for the moment we always write CZYX arrays.
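
A sketch of the datasets metadata the create-ome-zarr tasks would then write (variable names are illustrative; pixel sizes taken from the example above):

```python
# CZYX scale transformations: a constant 1.0 for the channel axis, followed
# by z, y, x pixel sizes; y and x are coarsened per pyramid level.
pixel_size_z, pixel_size_y, pixel_size_x = 1.0, 0.1625, 0.1625
coarsening_xy = 2
num_levels = 5

datasets = [
    {
        "path": str(level),
        "coordinateTransformations": [
            {
                "type": "scale",
                "scale": [
                    1.0,  # channel axis, never coarsened
                    pixel_size_z,
                    pixel_size_y * coarsening_xy**level,
                    pixel_size_x * coarsening_xy**level,
                ],
            }
        ],
    }
    for level in range(num_levels)
]
```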

I cannot easily find other relevant modules to look at, but I can try to set up a couple more tests to spot possible issues:

```
$ git grep scale fractal_tasks_core | grep -v upscale | grep -v multiscale | grep -v escale | grep scale
fractal_tasks_core/lib_zattrs_utils.py:            if t["type"] == "scale":
fractal_tasks_core/lib_zattrs_utils.py:                pixel_sizes = t["scale"]
fractal_tasks_core/lib_zattrs_utils.py:            f" no scale transformation found for level {level}"
fractal_tasks_core/lib_zattrs_utils.py:    Given a set of datasets (as per OME-NGFF specs), update their "scale"
fractal_tasks_core/lib_zattrs_utils.py:            if t["type"] == "scale":
fractal_tasks_core/lib_zattrs_utils.py:                new_t["scale"][-2] = new_t["scale"][-1] * prefactor
fractal_tasks_core/lib_zattrs_utils.py:                new_t["scale"][-1] = new_t["scale"][-1] * prefactor
fractal_tasks_core/tasks/create_ome_zarr.py:                                    "type": "scale",
fractal_tasks_core/tasks/create_ome_zarr.py:                                    "scale": [
fractal_tasks_core/tasks/create_ome_zarr_multiplex.py:                                    "type": "scale",
fractal_tasks_core/tasks/create_ome_zarr_multiplex.py:                                    "scale": [
```
tcompa commented 1 year ago

> @tcompa Any idea why we have only 3 entries in our coordinate transformations?

I have no idea, TBH.

> The create-ome-zarr tasks, where we are currently writing scale transformations of length 3. I will now switch to always including a 1 for the channel axis in the scale transformation, since for the moment we always write CZYX arrays.

See a0ab7e7

jluethi commented 1 year ago

> Shall we take this opportunity to make everything a bit more consistent?

Yes, that's great! I was a bit hesitant about opening too broad a topic here. But this sounds like we can tackle it in a contained manner, generalize our processing to always create the CZYX coordinate transformations and then test that this does not interfere with our existing workflows.

If this more general way works well for the existing examples we have, I'd be in favor of merging it into the 0.10.0 release candidates. I'll need to update the MD parsing tasks to be 0.10.0 (& server 1.3.0) compatible and then test them with these new changes. But I'd open new issues if we then run into problems there.

tcompa commented 1 year ago

> Yes, that's great! I was a bit hesitant about opening too broad a topic here. But this sounds like we can tackle it in a contained manner, generalize our processing to always create the CZYX coordinate transformations and then test that this does not interfere with our existing workflows.

Even if we don't go too far in flexibility (and for sure we are not going to address #150 now), at least we should be aware of how things are now. I've added a unit test for the `lib_zattrs_utils.py` functions, which is meant as a showcase of how things work now.
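
A hedged sketch of what such a showcase test could look like (the actual test in #403 may differ, and the `rescale_datasets` signature is assumed as in the sketch earlier in this thread):

```python
from fractal_tasks_core.lib_zattrs_utils import rescale_datasets

def test_rescale_datasets_czyx():
    # CZYX scale transformation with a leading 1.0 for the channel axis
    datasets = [
        {
            "path": "0",
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0, 1.0, 0.1625, 0.1625]}
            ],
        }
    ]
    new_datasets = rescale_datasets(
        datasets=datasets, coarsening_xy=2, reference_level=1
    )
    # Only y and x are rescaled, by coarsening_xy**reference_level
    assert new_datasets[0]["coordinateTransformations"][0]["scale"] == [
        1.0,
        1.0,
        0.325,
        0.325,
    ]
```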

> and then test that this does not interfere with our existing workflows.

> I'll need to update the MD parsing tasks to be 0.10.0 (& server 1.3.0) compatible and then test them with these new changes. But I'd open new issues if we then run into problems there.

The CI of #403 is passing (locally), but it's always best to double-check things when running an actual workflow. I have started to update things in https://github.com/fractal-analytics-platform/fractal-demos/pull/54, so that testing the examples (at least example 01) should be relatively smooth. I'll also make frequent fractal-tasks-core prereleases, so that we can speed up testing.

> If this more general way works well for the existing examples we have, I'd be in favor of merging it into the 0.10.0 release candidates.

I still need to review the tasks part of #403, but for the moment I have nothing to add about the transformations part. I'd push for having it in 0.10.0, for sure.

tcompa commented 1 year ago

Note that the Zarr examples on Zenodo also have the wrong number of items in the scale transformation. This is currently addressed in the tests via https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/8657a6f607666112257fd7a63aed851e3af1eded, but at some point we should update those files.

jluethi commented 1 year ago

@tcompa An updated Zenodo dataset is now available here: https://zenodo.org/record/8091756

That should replace the old test dataset and we can get rid of the workarounds in the tests again :)

Do you also need the 2x2 datasets in the tests somewhere? I'm updating this atm, will upload in a bit

tcompa commented 1 year ago

> @tcompa An updated Zenodo dataset is now available here: https://zenodo.org/record/8091756

Thanks!

> That should replace the old test dataset and we can get rid of the workarounds in the tests again :)

Great, done with #454.

> Do you also need the 2x2 datasets in the tests somewhere?

No. The ones we use are:

Then this is all good.

tcompa commented 1 year ago

Is there anything left to do on this issue, or should it all be deferred to #150?

jluethi commented 1 year ago

The rest is deferred to #150; it seems to work decently now :)