fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Refactor `prepare_label_group`? #634

Open tcompa opened 6 months ago

tcompa commented 6 months ago
> Is there a reasonable behavior we'd do if a user doesn't provide zattrs? We couldn't know things like pixel sizes etc.

This is the current function signature:

    def prepare_label_group(
        image_group: zarr.hierarchy.Group,
        label_name: str,
        label_attrs: dict[str, Any],
        overwrite: bool = False,
        logger: Optional[logging.Logger] = None,
    ) -> zarr.group:

Given these arguments, there is no attribute that we can infer directly. This applies not only to things like pixel sizes, but to every attribute.


Things change if we make a further assumption, namely that we have access to the image being labeled (either by assuming that we can go up two levels in the Zarr hierarchy and find it, or through an additional function argument). In that case, we could read the image attributes. With some additional information (mainly the target pyramid level), we could then rebuild the full attributes of the label image, as is currently done e.g. in the cellpose task (or equivalently in the napari-workflows task):

    new_datasets = rescale_datasets(
        datasets=[ds.dict() for ds in ngff_image_meta.datasets],
        coarsening_xy=coarsening_xy,
        reference_level=level,
        remove_channel_axis=True,
    )

    label_attrs = {
        "image-label": {
            "version": __OME_NGFF_VERSION__,
            "source": {"image": "../../"},
        },
        "multiscales": [
            {
                "name": output_label_name,
                "version": __OME_NGFF_VERSION__,
                "axes": [
                    ax.dict()
                    for ax in ngff_image_meta.multiscale.axes
                    if ax.type != "channel"
                ],
                "datasets": new_datasets,
            }
        ],
    }

This option certainly looks interesting, since it would factor out functionality that is already duplicated in two tasks (note, however, that this is not how things are done e.g. in https://github.com/fmi-basel/gliberal-scMultipleX/blob/nar-fractal/src/scmultiplex/fractal/relabel_by_linking_consensus.py#L157-L174).

If we take this latter option, the scope of `prepare_label_group` becomes broader: it would handle both its current feature (mostly overwrite-related checks) and the preparation of Zarr attributes for the new group. We could then decide whether this is split internally into two functions or handled by a single one (renaming the function could then be appropriate).
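To make the "two functions" variant more concrete, here is a rough sketch of what the attribute-building half could look like. It is not existing code: `build_label_attrs` and all of its parameters are hypothetical names, it works on plain dicts rather than on the `ngff_image_meta` model, and the inlined rescaling is a simplified stand-in for `rescale_datasets` (it only touches `scale` transformations and assumes that Y and X are the last two axes).

```python
from copy import deepcopy
from typing import Any


def build_label_attrs(
    image_multiscale: dict[str, Any],
    label_name: str,
    level: int,
    coarsening_xy: int,
    ngff_version: str,
) -> dict[str, Any]:
    """Hypothetical helper: derive label attributes from image attributes."""
    axes = image_multiscale["axes"]
    # Indices of the axes kept for the label image (everything but channels)
    keep = [i for i, ax in enumerate(axes) if ax.get("type") != "channel"]
    # Labels computed at pyramid level `level` are coarser in Y and X
    prefactor = coarsening_xy**level

    new_datasets = []
    for ds in image_multiscale["datasets"]:
        new_ds = deepcopy(ds)  # never modify the image metadata in place
        for transf in new_ds["coordinateTransformations"]:
            # Assumption: only "scale" is handled, other types are copied as-is
            if transf["type"] != "scale":
                continue
            scale = [transf["scale"][i] for i in keep]
            scale[-2] *= prefactor  # Y
            scale[-1] *= prefactor  # X
            transf["scale"] = scale
        new_datasets.append(new_ds)

    return {
        "image-label": {
            "version": ngff_version,
            "source": {"image": "../../"},
        },
        "multiscales": [
            {
                "name": label_name,
                "version": ngff_version,
                "axes": [deepcopy(axes[i]) for i in keep],
                "datasets": new_datasets,
            }
        ],
    }


# Example input: a CZYX image with two pyramid levels (made-up values)
example_multiscale = {
    "axes": [
        {"name": "c", "type": "channel"},
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ],
    "datasets": [
        {
            "path": "0",
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0, 1.0, 0.5, 0.5]}
            ],
        },
        {
            "path": "1",
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0, 1.0, 1.0, 1.0]}
            ],
        },
    ],
}

label_attrs = build_label_attrs(
    example_multiscale,
    label_name="nuclei",
    level=1,
    coarsening_xy=2,
    ngff_version="0.4",
)
```

With a helper like this, `prepare_label_group` could either keep accepting a ready-made `label_attrs` dict (as it does now) or call the helper itself when it is given the image metadata and the target level. The overwrite-related checks would be unchanged in both cases.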

Originally posted by @tcompa in https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/619#issuecomment-1852137074