fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
12 stars 5 forks source link

Review module/function structure for table-related helper functions #616

Closed tcompa closed 7 months ago

tcompa commented 7 months ago

As part of #613, we are actually moving towards complying with the table specs. This may come with a refactor of the core-library functions related to tables. Such refactor is not just a clean-up, but it is needed in view of a future coexistence of tools reading/writing V1 and V2 tables.

The general principle is that all functions which rely on V1 should be marked as such (e.g. in the module or function name), and we can offer general-use functions that do not make any assumption and then defer actual operations to versioned ones. An example is given below, for write_table.

write_table function

The first steps (already part of #613) concern the write_table function, which the closest to a nicely-scoped helper function:

  1. I moved it from lib_write.py to lib_tables.py, since lib_write.py is currently the place for some low-level zarr/anndata operations, mostly extending the original functions with overwrite-related options.
  2. I transformed write_table into a wrapper, which checks the table-specs version and defers the actual operation to _write_table_v1 (the only currently-supported one). In the future, this may conditionally call _write_table_v1 or _write_table_v2, provided they have a compatible I/O interface.
  3. I implemented a (soft) check of compliance with V1 specs, upon writing, as part of _write_table_v1. This is nicely contained in a function marked as v1, so that it won't affect future versions.

Other functions

The API for many other table-related functions is not so precisely defined as for write_table, and the goal of the current issue (and of #613) is not to provide such a set of definitions. Still, we should aim at defining at least the correct modules.

An example of a wrongly-placed function is lib_regions_of_interest.load_region, with call signature

def load_region(
    data_zyx: da.Array,
    region: tuple[slice, slice, slice],
    compute: bool = True,
    return_as_3D: bool = False,
) -> Union[da.Array, np.ndarray]:
    """
    Load a region from a dask array.

    Can handle both 2D and 3D dask arrays as input and return them as is or
    always as a 3D array.
    ...
    """

This function clearly has nothing to do with V1 or V2 or our table specs, and it would rather belong to a module like lib_arrays.py or something like that.

A closer look at lib_regions_of_interest.py

The lib_regions_of_interest.py module is a container of several different functions, and I think we could improve its structure. Here is a first classification of its functions:

Functions that fully rely on table-specs V1

Functions that are not specific to V1:

Note that the V1-related functions are not designed to have a version-agnostic I/O interface, and then I don't think it's useful to pursue the same route as for write_table - for the moment. But I would at least re-organize the modules, since the 15 functions defined in lib_regions_of_interest.py cover too broad a scope.

Specific proposals (which I'll implement in #613, while waiting for a broader discussion):

  1. Move the three non-V1-specific functions out from lib_regions_of_interest.py (is_standard_roi_table should go to lib_tables.py, while I think load_region and convert_indices_to_regions should be part of a new module)
  2. Keep the remaining 12 functions in lib_regions_of_interest.py, without adding a separation between table-creation and table-processing.
  3. Move lib_regions_of_interest.py to lib_regions_of_interest/v1.py, and re-export all functions to lib_regions_of_interest/__init__.py, so that the imports in the tasks remain the same. Note that I also think we should remove lib_, but this is a different issue.

Note about table-specs-versions coexistence

The reason for point 3 has to do with the migration to V2, which will eventually take place. For instance we can say that starting from version 1.1 we default to writing V2 tables from all tasks. We then add a new lib_regions_of_interest/v2.py, and we modify lib_regions_of_interest/__init__.py so that it exports the v2.py functions. The v1.py functions are still importable, but this will require e.g. from lib_regions_of_interest.v1 import prepare_FOV_ROI_table.

This does not yet mean that tasks will work on both V1 and V2 tables, but it's the first step in this direction. The next step will be (as already done for write_table) to implement a conditional check that infers the version of a given table and proceed accordingly by calling the V1 or V2 functions.

tcompa commented 7 months ago

Side comment: also lib_ROI_overlaps has a similar situation, since it is clearly related to specs V1. The difference here is that it can be clearly split into two parts:

  1. Overlap-detection functions, which are fully unrelated to fractal
  2. Wrappers which apply those functions to fractal ROI tables, and which do depend on V1

A possible structure would then be

$ tree  -L 1 lib_regions_of_interest/
lib_regions_of_interest/
├── __init__.py
├── _overlap_detection.py
├── v1_checks.py
├── v1_overlap_detection.py
└── v1.py

0 directories, 5 files

with the overlap-detection function now importable under the same namespace as the other ROI-related functions, as in

# CURRENT VERSION
from lib_regions_of_interest import prepare_FOV_ROI_table
from lib_ROI_overlaps import find_overlaps_in_ROI_indices

# FUTURE VERSION
from lib_regions_of_interest import prepare_FOV_ROI_table
from lib_regions_of_interest import find_overlaps_in_ROI_indices
tcompa commented 7 months ago

Summary of of changed imports:

-from fractal_tasks_core.lib_ROI_overlaps import find_overlaps_in_ROI_indices
+from fractal_tasks_core.lib_regions_of_interest import find_overlaps_in_ROI_indices

-from fractal_tasks_core.lib_ROI_overlaps import get_overlapping_pairs_3D
+from fractal_tasks_core.lib_regions_of_interest import get_overlapping_pairs_3D

-from fractal_tasks_core.lib_write import write_table
+from fractal_tasks_core.lib_tables import write_table