fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
14 stars 6 forks source link

Update `ROI_table_names` argument of `copy_ome_zarr` #448

Open tcompa opened 1 year ago

tcompa commented 1 year ago

Ref:

Python

In principle we'd like to support multiple use cases:

  1. ROI_table_names = ["something", "else"] -> only these ROIs are copied
  2. ROI_table_names = None -> some default ROIs are copied (i.e. "FOV_ROI_table", "well_ROI_table"])
  3. ROI_table_names = [] -> no ROIs are copied

Support for all three cases is already in-place in the Python function, but it's not fractal-web-friendly.

JSON Schema and fractal-web

The relevant bit of the JSON Schema is

{
        "title": "CopyOmeZarr",
        "type": "object",
        "properties": {
          ...
          "ROI_table_names": {
            "title": "Roi Table Names",
            "type": "array",
            "items": {
              "type": "string"
            },
            "description": ...
          }
        },
        "required": [ "input_paths", "output_path", "metadata" ],
        "additionalProperties": false
      }

This means that the following args are identified as valid by a JSON Schema validator:

{ ..., "ROI_table_names": []}
{ ..., "ROI_table_names": ["something", "else"]}
{ ... }

However, we are currently stripping empty objects/arrays from server API calls in fractal-web (ref https://github.com/fractal-analytics-platform/fractal-web/issues/209). This is in principle an arbitrary choice, but we don't want to end up with a a case-by-case heuristics for what should be done - since the current JSON Schema does not provide any guidance here (all three cases above are valid).

Back to Python

The obvious solution would be to set

def copy_ome_zarr(
    ...,
    ROI_table_names : list[str] = ["FOV_ROI_table", "well_ROI_table"]
    ):

but having mutable default values is a bad practice which we should avoid. (For the record: it wouldn't matter in our Fractal-embedded use of tasks, but it would be risky when someone uses tasks as Python functions, therefore we should simply avoid it).

This issue is to find an alternative solution, if any.

jluethi commented 1 year ago

So how would we provide default "lists" of arguments best? Normally, I'd say we provide tuples in such a case, though I'm not sure what the schema / fractal-web implication of introducing tuples would be.

tcompa commented 1 year ago

So how would we provide default "lists" of arguments best? Normally, I'd say we provide tuples in such a case, though I'm not sure what the schema / fractal-web implication of introducing tuples would be.

Using tuples looks like a good idea, but we need the tuple[str, ...] construct, which is not available for old python versions. I can check the actual minimal version required, but that's not really a problem since fractal-tasks-core requires python>=3.9.

For the record, PEP 484 (which appeared for Python 3.5) states:

Tuple, used by listing the element types, for example Tuple[int, int, str]. The empty tuple can be typed as Tuple[()]. Arbitrary-length homogeneous tuples can be expressed using one type and ellipsis, for example Tuple[int, ...]. (The ... here are part of the syntax, a literal ellipsis.)

tcompa commented 1 year ago

Let's see if we hit any unexpected edge case, but the JSON-Schema side of this solution is quite transparent (see the manifest diff in https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/449/commits/55ae926d02490b3d40467109eb58f5ce4db80a27):

          "ROI_table_names": {
            "title": "Roi Table Names",
+            "default": [
+              "FOV_ROI_table",
+              "well_ROI_table"
+            ],
            "type": "array",
            "items": {

Therefore this could provide a nice little workaround, which is simple to implement in Python and simple to document. Let's continue in #439.

tcompa commented 1 year ago

This was closed, but it needs to be re-opened. It is still impossible (in the current fractal-web) to set ROI_table_names=[], because of

If we remove one item from the two-items list, we get only one. If we remove both, we go back to the default value (i.e. both items). Maybe we will have to expose an option on fractal-web, to really set an empty list. Once again (as it is now evident): this is not trivial, due to the interaction of tasks/server/web. To be discussed further.

jluethi commented 1 year ago

Ok, but that's now a different issue to me. The user can see and change what ROI tables are copied. Due to the issue with [], it can't be validated as a correct input though if an empty list is provided.

This doesn't need to be fixed for 0.10.0, let's think about how we want to eventually handle this though.

tcompa commented 1 year ago

Ok, but that's now a different issue to me. The user can see and change what ROI tables are copied. Due to the issue with [], it can't be validated as a correct input though if an empty list is provided.

I agree: viewing the ROI-table names is better than relying on the side effects of an empty list (which is then transformed to null, removed, and finally set to a default value from within the task). It's clearly an improvement!

Still: Fractal as a whole (tasks+server+web) doesn't support passing an empty-list argument, but for this task it could be useful (if I want to play with a minimal OME-Zarr, which has no ROI tables). We can improve the task, and maybe only raise warnings if the required ROI tables do not exist, but the main issue remains.

This doesn't need to be fixed for 0.10.0

That would be too late anyway ;)

tcompa commented 1 year ago

[], it can't be validated as a correct input though if an empty list is provided.

Note that [] is in fact a valid input, and ajv would not complain about it. We only strip it away because we want to enforce some global behavior, but in this case there is no real reason to do it.