fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
14 stars 6 forks source link

Make task-argument type hints compatible with JSON Schemas #399

Closed tcompa closed 1 year ago

tcompa commented 1 year ago

EDIT: As described in the comments below, there are at least three cases to cover/fix/check:

  1. Forbid arg: Optional[str] = "something"
  2. Do not use Union in type hints
  3. Do not user Any in type hints

I think that a task function with signature like

def function(x: Optional[str] = "something"):
    pass

should not be present in our tasks, and that Optional arguments should default to None. This is not based on some solid knowledge, but I'm rather thinking that it may result into unexpected behavior depending on how we end up structuring the fractal-web task-arguments interface.

This is a placeholder issue, so that once we have a more stable fractal-web feature by @rkpasia we can stress-test it with one such example (as part of the "required vs optional arguments" review). If it's well supported, nothing to add. If it's not, then we should make sure (through a test) that we never include it here in this repository.

For the record, this issue happens exactly 0 times in the current code base, so that this is not a big deal.

$ git grep Optional | grep -v typing

cellpose_segmentation.py:    anisotropy: Optional[float] = None,
cellpose_segmentation.py:    label_dtype: Optional[np.dtype] = None,
cellpose_segmentation.py:    wavelength_id: Optional[str] = None,
cellpose_segmentation.py:    channel_label: Optional[str] = None,
cellpose_segmentation.py:    wavelength_id_c2: Optional[str] = None,
cellpose_segmentation.py:    channel_label_c2: Optional[str] = None,
cellpose_segmentation.py:    output_ROI_table: Optional[str] = None,
cellpose_segmentation.py:    output_label_name: Optional[str] = None,  # "organoids"
cellpose_segmentation.py:    anisotropy: Optional[float] = None,
cellpose_segmentation.py:    pretrained_model: Optional[str] = None,
copy_ome_zarr.py:    ROI_table_names: Optional[Sequence[str]] = None,
create_ome_zarr.py:    image_glob_patterns: Optional[list[str]] = None,
create_ome_zarr_multiplex.py:    image_glob_patterns: Optional[list[str]] = None,
illumination_correction.py:    new_component: Optional[str] = None,
tcompa commented 1 year ago

There's actually a more compelling reason to avoid the Optional[str] = "something".

This script

from typing import Optional
from pydantic.decorator import validate_arguments
import json

def fun(
        x: str,
        y: str = "default",
        w: Optional[str] = "default",
        z: Optional[str] = None,
        ):
    pass

print(json.dumps(validate_arguments(fun).model.schema(), indent=2, sort_keys=True))

outputs (with pydantic 1.10.4)

{
  "additionalProperties": false,
  "properties": {
    "args": {
      "items": {},
      "title": "Args",
      "type": "array"
    },
    "kwargs": {
      "title": "Kwargs",
      "type": "object"
    },
    "v__duplicate_kwargs": {
      "items": {
        "type": "string"
      },
      "title": "V  Duplicate Kwargs",
      "type": "array"
    },
    "w": {
      "default": "default",
      "title": "W",
      "type": "string"
    },
    "x": {
      "title": "X",
      "type": "string"
    },
    "y": {
      "default": "default",
      "title": "Y",
      "type": "string"
    },
    "z": {
      "title": "Z",
      "type": "string"
    }
  },
  "required": [
    "x"
  ],
  "title": "Fun",
  "type": "object"
}

where there is no difference between the y and w attributes.

tcompa commented 1 year ago

There are (at least) two other constraints that we need to enforce, to make the argument schemas suitable for fractal-web

  1. We should not use any Union type hint, because it would be quite complex to render the UI for setting them in fractal-web (in principle one could choose out of two types, but it seems very counterintuitive). For the moment we only have metadata_table: Union[Literal["mrf_mlf"], Dict[str, str]] = "mrf_mlf", which we should transform in a way that does not use Union. Some options:

    • (A) Optional[dict[str,str]] = None, and we use None with the same meaning as mrf_mlf
    • (B) Two mutually exclusive arguments metadata_source: Literal["mrf_mlf", "custom_table"] = "mrf_mlf" and metadata_table: Dict[str,str]
    • (C) ...
  2. We should not use any Any type, because that'd be simply impossible to render in the fractal-web UI. Currently we have it for:

The three points described in this issue (two here and one in the first comment) should also be automatically checked in the CI. I re-labeled this issue from "maintenance" to "High Priority", because it's strictly needed to use schemas in fractal-web.

jluethi commented 1 year ago

Makes sense and great if we're also checking for this in the CI.

I think the fractal-tasks-core package can be strict here to make for good usability. On the fractal web side, we should think on whether some of those scenarios can then fall back to the more manual mode arguments, i.e. letting the user specify the argument type as in the currently deployed version (to allow some flexibility during task development). For example: A type hint is Any => the user needs to set the type. A type hint was Union => the user needs to set the type. A type hint was anything else we can't make sense of => the user needs to set the type.

For metadata_table: Option A is good. The Yokogawa doesn't have other metadata types that can be parsed directly. In 99.9% of cases, the default has always been used so far. Thus, the None behavior can correspond to the "mrf_mlf" setting so far. But it can be overwritten with an actual metadata dictionary with specific tables paths in some edge cases.

tcompa commented 1 year ago

On the fractal web side, we should think on whether some of those scenarios can then fall back to the more manual mode arguments, i.e. letting the user specify the argument type as in the currently deployed version (to allow some flexibility during task development). For example: A type hint is Any => the user needs to set the type. A type hint was Union => the user needs to set the type. A type hint was anything else we can't make sense of => the user needs to set the type.

That would be nice, but providing such flexibility while still having schemas seems to be quite a long-term goal.

We are talking about a user who wants to develop their own tasks via fractal, and went all the way to produce a JSON Schema, but who also want the schema to be extremely flexible. This looks quite different from the standard use of a schema (which tends to be quite strict, so that it can be rendered in a clear way).

For instance I'd say that they can use two arguments instead of a Union, since this is really the task development phase. Or, if they really want full flexibility, they can even start with the legacy mode of setting arguments one by one.

This is just to say that it's clearly a nice feature, but I would see it as a very low-priority one.

tcompa commented 1 year ago

Current status:

tcompa commented 1 year ago

As part of #434 (with @ychiucco) we now have some basic checks in-place for Optional and Union.

Check that there are no Any in type hints - through a test

We have no simple non-super-hackish way of doing it, so we are not adding this check.