fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Expand cellpose options #650

Closed · jluethi closed this 4 months ago

jluethi commented 5 months ago

Closes #649 Closes #401

Checklist before merging

github-actions[bot] commented 5 months ago

Coverage report


File                                                Lines missing coverage (new statements)
fractal_tasks_core/tasks/cellpose_segmentation.py   145-146
fractal_tasks_core/tasks/cellpose_transforms.py     181-183, 197-199, 206-210

This report was generated by python-coverage-comment-action

jluethi commented 5 months ago

I tested this new version of the task on the UZH servers, but the Enum part of Fractal web isn't deployed there yet. It also doesn't solve the actual issue we had with sparse images (see comments in #649) => no immediate hurry on this, I'll hopefully go over it again in the coming days.

jluethi commented 5 months ago

I added a more complex set of normalization options now.

Remaining todos:

jluethi commented 5 months ago

Testing in a local deployment: the task works in a local Fractal server, but the normalize entry is not shown in the interface. It's supposed to appear between Flow Threshold & Anisotropy; instead, I get a double line there now:

[Screenshot 2024-02-07 at 09 45 22]

And I'm seeing some validator errors in the console:

[Screenshot 2024-02-07 at 09 46 35]
jluethi commented 5 months ago

Maybe it's an issue with having Optional floats with default value None?

jluethi commented 5 months ago

I updated the manifest creation scripts to include my models, but things still show up weirdly in the interface.

I now looked at the manifest to check whether the issue could be there.

Other external pydantic models are referenced like this:

          "channel": {
            "$ref": "#/definitions/ChannelInputModel",
            "title": "Channel",
            "description": "Primary channel for segmentation; requires either `wavelength_id` (e.g. `A01_C01`) or `label` (e.g. `DAPI`)."
          },

But my new entry is set up like this:

          "normalize": {
            "title": "Normalize",
            "default": {
              "default_normalize": true,
              "lower_percentile": null,
              "upper_percentile": null,
              "lower_bound": null,
              "upper_bound": null
            },
            "allOf": [
              {
                "$ref": "#/definitions/CellposeCustomNormalizer"
              }
            ],
            "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers"
          },

I can see how this could make sense. I don't fully understand the "allOf" references, but maybe that's how this is supposed to be specified?

But I still get the following validator error in the console:

[Screenshot 2024-02-07 at 17 44 38]

@tcompa Do you have any idea how to debug this to figure out why it fails?

tcompa commented 5 months ago

A first sanity check:

This is the current schema:

{
  "title": "CellposeSegmentation",
  "type": "object",
  "properties": {
    "input_paths": {
      "title": "Input Paths",
      "type": "array",
      "items": {
        "type": "string"
      },
      "description": "List of input paths where the image data is stored as OME-Zarrs. Should point to the parent folder containing one or many OME-Zarr files, not the actual OME-Zarr file. Example: `[\"/some/path/\"]`. This task only supports a single input path. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "output_path": {
      "title": "Output Path",
      "type": "string",
      "description": "This parameter is not used by this task. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "component": {
      "title": "Component",
      "type": "string",
      "description": "Path to the OME-Zarr image in the OME-Zarr plate that is processed. Example: `\"some_plate.zarr/B/03/0\"`. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "metadata": {
      "title": "Metadata",
      "type": "object",
      "description": "This parameter is not used by this task. (standard argument for Fractal tasks, managed by Fractal server)."
    },
    "level": {
      "title": "Level",
      "type": "integer",
      "description": "Pyramid level of the image to be segmented. Choose `0` to process at full resolution."
    },
    "channel": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel",
      "description": "Primary channel for segmentation; requires either `wavelength_id` (e.g. `A01_C01`) or `label` (e.g. `DAPI`)."
    },
    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2",
      "description": "Second channel for segmentation (in the same format as `channel`). If specified, cellpose runs in dual channel mode. For dual channel segmentation of cells, the first channel should contain the membrane marker, the second channel should contain the nuclear marker."
    },
    "input_ROI_table": {
      "title": "Input Roi Table",
      "default": "FOV_ROI_table",
      "type": "string",
      "description": "Name of the ROI table over which the task loops to apply Cellpose segmentation. Examples: `FOV_ROI_table` => loop over the field of views, `organoid_ROI_table` => loop over the organoid ROI table (generated by another task), `well_ROI_table` => process the whole well as one image."
    },
    "output_ROI_table": {
      "title": "Output Roi Table",
      "type": "string",
      "description": "If provided, a ROI table with that name is created, which will contain the bounding boxes of the newly segmented labels. ROI tables should have `ROI` in their name."
    },
    "output_label_name": {
      "title": "Output Label Name",
      "type": "string",
      "description": "Name of the output label image (e.g. `\"organoids\"`)."
    },
    "use_masks": {
      "title": "Use Masks",
      "default": true,
      "type": "boolean",
      "description": "If `True`, try to use masked loading and fall back to `use_masks=False` if the ROI table is not suitable. Masked loading is relevant when only a subset of the bounding box should actually be processed (e.g. running within `organoid_ROI_table`)."
    },
    "relabeling": {
      "title": "Relabeling",
      "default": true,
      "type": "boolean",
      "description": "If `True`, apply relabeling so that label values are unique for all objects in the well."
    },
    "diameter_level0": {
      "title": "Diameter Level0",
      "default": 30.0,
      "type": "number",
      "description": "Expected diameter of the objects that should be segmented in pixels at level 0. Initial diameter is rescaled using the `level` that was selected. The rescaled value is passed as the diameter to the `CellposeModel.eval` method."
    },
    "model_type": {
      "title": "Model Type",
      "default": "cyto2",
      "enum": [
        "cyto",
        "nuclei",
        "tissuenet",
        "livecell",
        "cyto2",
        "general",
        "CP",
        "CPx",
        "TN1",
        "TN2",
        "TN3",
        "LC1",
        "LC2",
        "LC3",
        "LC4"
      ],
      "type": "string",
      "description": "Parameter of `CellposeModel` class. Defines which model should be used. Typical choices are `nuclei`, `cyto`, `cyto2`, etc."
    },
    "pretrained_model": {
      "title": "Pretrained Model",
      "type": "string",
      "description": "Parameter of `CellposeModel` class (takes precedence over `model_type`). Allows you to specify the path of a custom trained cellpose model."
    },
    "cellprob_threshold": {
      "title": "Cellprob Threshold",
      "default": 0.0,
      "type": "number",
      "description": "Parameter of `CellposeModel.eval` method. Valid values between -6 to 6. From Cellpose documentation: \"Decrease this threshold if cellpose is not returning as many ROIs as you\u2019d expect. Similarly, increase this threshold if cellpose is returning too ROIs particularly from dim areas.\""
    },
    "flow_threshold": {
      "title": "Flow Threshold",
      "default": 0.4,
      "type": "number",
      "description": "Parameter of `CellposeModel.eval` method. Valid values between 0.0 and 1.0. From Cellpose documentation: \"Increase this threshold if cellpose is not returning as many ROIs as you\u2019d expect. Similarly, decrease this threshold if cellpose is returning too many ill-shaped ROIs.\""
    },
    "normalize": {
      "title": "Normalize",
      "default": {
        "default_normalize": true,
        "lower_percentile": null,
        "upper_percentile": null,
        "lower_bound": null,
        "upper_bound": null
      },
      "allOf": [
        {
          "$ref": "#/definitions/CellposeCustomNormalizer"
        }
      ],
      "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers"
    },
    "anisotropy": {
      "title": "Anisotropy",
      "type": "number",
      "description": "Ratio of the pixel sizes along Z and XY axis (ignored if the image is not three-dimensional). If `None`, it is inferred from the OME-NGFF metadata."
    },
    "min_size": {
      "title": "Min Size",
      "default": 15,
      "type": "integer",
      "description": "Parameter of `CellposeModel` class. Minimum size of the segmented objects (in pixels). Use `-1` to turn off the size filter."
    },
    "augment": {
      "title": "Augment",
      "default": false,
      "type": "boolean",
      "description": "Parameter of `CellposeModel` class. Whether to use cellpose augmentation to tile images with overlap."
    },
    "net_avg": {
      "title": "Net Avg",
      "default": false,
      "type": "boolean",
      "description": "Parameter of `CellposeModel` class. Whether to use cellpose net averaging to run the 4 built-in networks (useful for `nuclei`, `cyto` and `cyto2`, not sure it works for the others)."
    },
    "use_gpu": {
      "title": "Use Gpu",
      "default": true,
      "type": "boolean",
      "description": "If `False`, always use the CPU; if `True`, use the GPU if possible (as defined in `cellpose.core.use_gpu()`) and fall-back to the CPU otherwise."
    },
    "batch_size": {
      "title": "Batch Size",
      "default": 8,
      "type": "integer",
      "description": "number of 224x224 patches to run simultaneously on the GPU (can make smaller or bigger depending on GPU memory usage)"
    },
    "invert": {
      "title": "Invert",
      "default": false,
      "type": "boolean",
      "description": "invert image pixel intensity before running network (if True, image is also normalized)"
    },
    "tile": {
      "title": "Tile",
      "default": true,
      "type": "boolean",
      "description": "tiles image to ensure GPU/CPU memory usage limited (recommended)"
    },
    "tile_overlap": {
      "title": "Tile Overlap",
      "default": 0.1,
      "type": "number",
      "description": "fraction of overlap of tiles when computing flows"
    },
    "resample": {
      "title": "Resample",
      "default": true,
      "type": "boolean",
      "description": "run dynamics at original image size (will be slower but create more accurate boundaries)"
    },
    "interp": {
      "title": "Interp",
      "default": true,
      "type": "boolean",
      "description": "interpolate during 2D dynamics (not available in 3D) (in previous versions it was False, now it defaults to True)"
    },
    "stitch_threshold": {
      "title": "Stitch Threshold",
      "default": 0.0,
      "type": "number",
      "description": "if stitch_threshold>0.0 and not do_3D and equal image sizes, masks are stitched in 3D to return volume segmentation"
    },
    "overwrite": {
      "title": "Overwrite",
      "default": true,
      "type": "boolean",
      "description": "If `True`, overwrite the task output."
    }
  },
  "required": [
    "input_paths",
    "output_path",
    "component",
    "metadata",
    "level",
    "channel"
  ],
  "additionalProperties": false,
  "definitions": {
    "ChannelInputModel": {
      "title": "ChannelInputModel",
      "description": "A channel which is specified by either `wavelength_id` or `label`.",
      "type": "object",
      "properties": {
        "wavelength_id": {
          "title": "Wavelength Id",
          "type": "string",
          "description": "Unique ID for the channel wavelength, e.g. `A01_C01`."
        },
        "label": {
          "title": "Label",
          "type": "string",
          "description": "Name of the channel."
        }
      }
    },
    "CellposeCustomNormalizer": {
      "title": "CellposeCustomNormalizer",
      "description": "Validator to handle different normalization scenarios for Cellpose models",
      "type": "object",
      "properties": {
        "default_normalize": {
          "title": "Default Normalize",
          "default": true,
          "type": "boolean",
          "description": "Whether to use the default Cellpose normalization approach (rescaling the image between the 1st and 99th percentile)"
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "type": "number",
          "description": "Specify a custom lower-bound percentile for rescaling as a float value between 0 and 100. Set to 1 to run the same as default). You can only specify percentils or bounds, not both."
        },
        "upper_percentile": {
          "title": "Upper Percentile",
          "type": "number",
          "description": "Specify a custom upper-bound percentile for rescaling as a float value between 0 and 100. Set to 99 to run the same as default, set to e.g. 99.99 if the default rescaling was too harsh. You can only specify percentils or bounds, not both."
        },
        "lower_bound": {
          "title": "Lower Bound",
          "type": "integer",
          "description": "Explicit lower bound value to rescale the image at. Needs to be an integer, e.g. 100. You can only specify percentils or bounds, not both."
        },
        "upper_bound": {
          "title": "Upper Bound",
          "type": "integer",
          "description": "Explicit upper bound value to rescale the image at. Needs to be an integer, e.g. 2000 You can only specify percentils or bounds, not both."
        }
      }
    }
  }
}

This is how it's rendered in fractal-web: [image]

This is how it's rendered in https://rjsf-team.github.io/react-jsonschema-form/: [image]


Let's note again the difference between the two entries:


    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2",
      "description": "Second channel for segmentation (in the same format as `channel`). If specified, cellpose runs in dual channel mode. For dual channel segmentation of cells, the first channel should contain the membrane marker, the second channel should contain the nuclear marker."
    },

    "normalize": {
      "title": "Normalize",
      "default": {
        "default_normalize": true,
        "lower_percentile": null,
        "upper_percentile": null,
        "lower_bound": null,
        "upper_bound": null
      },
      "allOf": [
        {
          "$ref": "#/definitions/CellposeCustomNormalizer"
        }
      ],
      "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers"
    },

coming from

def cellpose_segmentation(
    *,
    ...
    channel2: Optional[ChannelInputModel] = None,
    ...
    normalize: CellposeCustomNormalizer = CellposeCustomNormalizer(
        default_normalize=True
    ),
...

My first guess is that creating an instance of the normalizer and using it as the default for the argument cannot be transformed into a nice default within the JSON schema. In general, it'd be better to say normalize: Optional[CellposeCustomNormalizer] = None, and then create a default instance from within the task. I haven't tested this yet; I'll dig deeper later.

tcompa commented 5 months ago

As of https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/650/commits/a9185ba9cbda4529260408015eb7da30d55890b8, this is the new rendering: [image]

jluethi commented 4 months ago

My first guess is that creating an instance of the normalizer and using it as the default for the argument cannot be transformed into a nice default within the JSON schema. In general, it'd be better to say normalize: Optional[CellposeCustomNormalizer] = None, and then create a default instance from within the task. I haven't tested this yet; I'll dig deeper later.

Ah, that makes sense! And it gets a default object (with default_normalize = True) in your screenshot above, so that's the behavior I was trying to ensure by setting a model as the default. Thanks for that fix!

I'll test a few more things with it now! :)

jluethi commented 4 months ago

Ok, it works as expected for me now in the interface as well! :) Thanks @tcompa !

On the remaining list:

jluethi commented 4 months ago

I now added the tests for the new functionality. They can always be improved further of course, but should be a decent basis to start off from.

I have added the 0.14.2a2 version to the Pelkmans lab cluster and will run some tests on real-life data with some edge cases next.

tcompa commented 4 months ago

Please don't merge this PR yet, because there is still a glitch on the JSON-schema side. You can already test the current version, as we'll just have to provide a better way of reaching the same functionality.

jluethi commented 4 months ago

Oh, what's the glitch?

tcompa commented 4 months ago

The following JSON schema (a simplified version of the current one)

{
  "title": "CellposeSegmentation",
  "type": "object",
  "properties": {
    "channel": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel"
    },
    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2"
    },
    "normalize": {
      "$ref": "#/definitions/CellposeCustomNormalizer",
      "title": "Normalize"
    }
  },
  "required": [
    "channel"
  ],
  "additionalProperties": false,
  "definitions": {
    "ChannelInputModel": {
      "title": "ChannelInputModel",
      "type": "object",
      "properties": {
        "wavelength_id": {
          "title": "Wavelength Id",
          "type": "string"
        },
        "label": {
          "title": "Label",
          "type": "string"
        }
      }
    },
    "CellposeCustomNormalizer": {
      "title": "CellposeCustomNormalizer",
      "type": "object",
      "properties": {
        "default_normalize": {
          "title": "Default Normalize",
          "default": true,
          "type": "boolean"
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "type": "number"
        }
      }
    }
  }
}

is rendered as follows: [image]

Note the difference:

The consequence of this is that when using fractal-web, we can never run the task with normalize=None. In other words, we never enter this if branch

    # Set default normalizer
    if normalize is None:
        normalize = CellposeCustomNormalizer(default_normalize=True)

which is problematic for at least two reasons:

  1. It's highly counterintuitive, since the call signature says normalize: Optional[CellposeCustomNormalizer] = None
  2. It may be plain wrong, in case we slightly modify the content of that if branch, e.g. to change some default.

One solution is readily available, though. As far as we understand (debugged with @mfranzon), the CellposeCustomNormalizer.default_normalize attribute is redundant. If any of the other four values is set, this is automatically false. But there is no use case where we may want to simultaneously set CellposeCustomNormalizer.default_normalize and e.g. CellposeCustomNormalizer.lower_bound (and that's why the validator has to check this condition, and raise an error if appropriate).

This also suggests a solution: default_normalize should not be an attribute, but rather a property. Something like:

from typing import Optional

from pydantic import BaseModel, Field

class CellposeCustomNormalizer(BaseModel):
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None

    @property
    def default_normalize(self) -> bool:
        if self.lower_percentile is not None:
            return False
        elif self.lower_bound is not None:
            return False
        elif self.upper_percentile is not None:
            return False
        elif self.upper_bound is not None:
            return False
        else:
            return True

I think this will work, but I've not yet tested it.


Higher-level comments:

  1. Once again, we realize that mapping Python call signatures to JSON schemas has plenty of interesting edge cases;
  2. If we want to keep the JSON Schemas "clean" (for any definition of "clean"), we can push a lot of the "custom" logic to places which are not then converted into JSON schemas, e.g. validators and properties.
jluethi commented 4 months ago

The consequence of this is that when using fractal-web, we can never run the task with normalize=None

I can see how this is a general problem. For this task, that's actually fine.

As far as we understand (debugged with @mfranzon), the CellposeCustomNormalizer.default_normalize attribute is redundant. If any of the other four values is set, this is automatically false. But there is no use case where we may want to simultaneously set CellposeCustomNormalizer.default_normalize and e.g. CellposeCustomNormalizer.lower_bound (and that's why the validator has to check this condition, and raise an error if appropriate).

Unfortunately, there is one case: one can set CellposeCustomNormalizer.default_normalize=False and set all the other elements to None. That is supported and has a specific behavior in Cellpose (though not a typically useful one in my tests).

tcompa commented 4 months ago

As discussed in our call:

There are mainly four branches to be covered:

  1. Do not apply any normalization at all;
  2. Apply the default CP normalization;
  3. Apply a user-provided bound-based normalization;
  4. Apply a user-provided percentiles-based normalization.

And the expected default behavior, when there is no user input, is case number 2 (apply default CP normalization).

tcompa commented 4 months ago

Here is a naive, untested example:

from typing import Optional, Literal
from pydantic import BaseModel, Field

class CellposeCustomNormalizer(BaseModel):
    type: Optional[Literal["custom", "no_normalization"]] = None
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None

    @property
    def apply_normalization(self) -> bool:
        # returns False when we do not want to apply any normalization
        # returns True when _some_ normalization has to be applied (CP-default, bounds, or percentiles)
        return self.type != "no_normalization"

    def validator_TBD(self):
        # This should enforce some relevant constraints, like:
        # if type=None, then no other attribute can be set
        # if type="no_normalization", then no other attributes can be set
        # if type="custom", then either the bounds or the percentiles attributes must be set, but not both
        pass

def task(
    ...
    normalize: Optional[CellposeCustomNormalizer] = None,
    ...):

In this version, we'd have

  1. Do not apply any normalization at all; -> type="no_normalization"
  2. Apply the default CP normalization; -> type=None (i.e. no input)
  3. Apply a user-provided bound-based normalization; -> type="custom"
  4. Apply a user-provided percentiles-based normalization. -> type="custom"

Something I'm not happy about: if you modify the default once, you may not be able to get back to the original situation. That is, I don't think that Optional[Literal] translates into an enum with three options (None, custom and no_normalization). I'll iterate on this later.

This preliminary example goes in the direction of coming up with a model with no default attributes - which would then "fix" the JSON-schema glitch. The other option is to actually provide a default in the function call signature, and make sure that it's transformed into a JSON Schema that is easily understood by fractal-web. A third option is to extend fractal-web to handle this "new" kind of JSON schema.

jluethi commented 4 months ago

Very interesting thinking & the direction looks promising, thanks Tommaso!

The other thing to consider: We want to make sure users understand what they are running by default. e.g. "if I don't modify anything here, it will apply default Cellpose normalization"

tcompa commented 4 months ago

I have a better-defined proposal (based on the previous one), which consists of using the model and call signature below. I'll document the Python behavior and the JSON-Schema behavior in two further comments.

Call-signature (simplified) and task function

@validate_arguments
def cellpose_segmentation(
    *,
    # Fractal arguments
    channel: ChannelInputModel,
    channel2: Optional[ChannelInputModel] = None,
    normalize: Optional[CellposeCustomNormalizer] = None,
    normalizeV2: Optional[CellposeCustomNormalizerV2] = None,
) -> dict[str, Any]:

    if normalizeV2 is None:
        normalizeV2 = CellposeCustomNormalizerV2()

Model for argument


from typing import Literal, Optional

from pydantic import BaseModel, Field, root_validator

class CellposeCustomNormalizerV2(BaseModel):
    """
    Validator to handle different normalization scenarios for Cellpose models

    If `type` is unset or set to `default`, Cellpose default normalization is
    used and no other parameters can be specified.
    If `type` is set to `no_normalization`, no normalization is used and no
    other parameters can be specified.
    If `type` is set to `custom`, either percentiles or explicit integer
    bounds can be applied.

    Attributes:
        type:
            One of `default` (Cellpose default normalization), `custom`
            (using the other custom parameters) or `no_normalization`.
        lower_percentile: Specify a custom lower-bound percentile for rescaling
            as a float value between 0 and 100. Set to 1 to run the same as
            default). You can only specify percentiles or bounds, not both.
        upper_percentile: Specify a custom upper-bound percentile for rescaling
            as a float value between 0 and 100. Set to 99 to run the same as
            default, set to e.g. 99.99 if the default rescaling was too harsh.
            You can only specify percentiles or bounds, not both.
        lower_bound: Explicit lower bound value to rescale the image at.
            Needs to be an integer, e.g. 100.
            You can only specify percentiles or bounds, not both.
        upper_bound: Explicit upper bound value to rescale the image at.
            Needs to be an integer, e.g. 2000.
            You can only specify percentiles or bounds, not both.
    """
    type: Optional[Literal["default", "custom", "no_normalization"]] = None
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None

    @root_validator
    def validate_conditions(cls, values):

        # Replace type=None with type="default"
        type = values.get("type")
        if type is None:
            type = "default"
            values["type"] = type

        # Extract values of custom parameters
        lower_percentile = values.get("lower_percentile")
        upper_percentile = values.get("upper_percentile")
        lower_bound = values.get("lower_bound")
        upper_bound = values.get("upper_bound")

        # Verify that custom parameters are only provided when type="custom"
        if type != "custom":
            if lower_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if upper_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if lower_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_bound=}. "
                    "Hint: set type='custom'."
                )
            if upper_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_bound=}. "
                    "Hint: set type='custom'."
                )

        # The only valid options are:
        # 1. Both percentiles are set and both bounds are unset
        # 2. Both bounds are set and both percentiles are unset
        are_percentiles_set = (
            lower_percentile is not None,
            upper_percentile is not None,
            )
        are_bounds_set = (
            lower_bound is not None,
            upper_bound is not None,
            )
        if len(set(are_percentiles_set)) != 1:
            raise ValueError(
                "Both lower_percentile and upper_percentile must be set "
                "together."
                )
        if len(set(are_bounds_set)) != 1:
            raise ValueError(
                "Both lower_bound and upper_bound must be set together"
                )
        if lower_percentile is not None and lower_bound is not None:
            raise ValueError(
                "You cannot set both explicit bounds and percentile bounds "
                "at the same time. Hint: use only one of the two options."
            )

        return values
tcompa commented 4 months ago

Pydantic-model behavior:

>>> from fractal_tasks_core.tasks.cellpose_transforms import CellposeCustomNormalizerV2

>>> CellposeCustomNormalizerV2()
CellposeCustomNormalizerV2(type='default', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="default")
CellposeCustomNormalizerV2(type='default', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(lower_bound=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Type='default' but lower_bound=1. Hint: set type='custom'. (type=value_error)

>>> CellposeCustomNormalizerV2(type="default", lower_bound=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Type='default' but lower_bound=1. Hint: set type='custom'. (type=value_error)

>>> CellposeCustomNormalizerV2(type="no_normalization")
CellposeCustomNormalizerV2(type='no_normalization', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="no_normalization", lower_bound=1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Type='no_normalization' but lower_bound=1. Hint: set type='custom'. (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom")
CellposeCustomNormalizerV2(type='custom', lower_percentile=None, upper_percentile=None, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2, upper_percentile=90)
CellposeCustomNormalizerV2(type='custom', lower_percentile=2.0, upper_percentile=90.0, lower_bound=None, upper_bound=None)

>>> CellposeCustomNormalizerV2(type="custom", lower_bound=1, upper_bound=10)
CellposeCustomNormalizerV2(type='custom', lower_percentile=None, upper_percentile=None, lower_bound=1, upper_bound=10)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Both lower_percentile and upper_percentile must be set together. (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom", lower_bound=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Both lower_bound and upper_bound must be set together (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2, upper_percentile=90, lower_bound=2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  Both lower_bound and upper_bound must be set together (type=value_error)

>>> CellposeCustomNormalizerV2(type="custom", lower_percentile=2, upper_percentile=90, lower_bound=2, upper_bound=123)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pydantic/main.py", line 341, in pydantic.main.BaseModel.__init__
pydantic.error_wrappers.ValidationError: 1 validation error for CellposeCustomNormalizerV2
__root__
  You cannot set both explicit bounds and percentile bounds at the same time. Hint: use only one of the two options. (type=value_error)
tcompa commented 4 months ago

JSON Schema (simplified call signature):

{
  "title": "CellposeSegmentation",
  "type": "object",
  "properties": {
    "channel": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel",
      "description": "Primary channel for segmentation; requires either `wavelength_id` (e.g. `A01_C01`) or `label` (e.g. `DAPI`)."
    },
    "channel2": {
      "$ref": "#/definitions/ChannelInputModel",
      "title": "Channel2",
      "description": "Second channel for segmentation (in the same format as `channel`). If specified, cellpose runs in dual channel mode. For dual channel segmentation of cells, the first channel should contain the membrane marker, the second channel should contain the nuclear marker."
    },
    "normalize": {
      "$ref": "#/definitions/CellposeCustomNormalizer",
      "title": "Normalize",
      "description": "By default, data is normalized so 0.0=1st percentile and 1.0=99th percentile of image intensities in each channel. This automatic normalization can lead to issues when the image to be segmented is very sparse. You can turn off the automated rescaling and either provide your own rescaling percentiles or fixed rescaling upper and lower bound integers"
    },
    "normalizeV2": {
      "$ref": "#/definitions/CellposeCustomNormalizerV2",
      "title": "Normalizev2",
      "description": "By default, data is normalized so 0.0=1st percentile and"
    }
  },
  "required": [
    "channel"
  ],
  "additionalProperties": false,
  "definitions": {
    "ChannelInputModel": {
      "title": "ChannelInputModel",
      "description": "A channel which is specified by either `wavelength_id` or `label`.",
      "type": "object",
      "properties": {
        "wavelength_id": {
          "title": "Wavelength Id",
          "type": "string",
          "description": "Unique ID for the channel wavelength, e.g. `A01_C01`."
        },
        "label": {
          "title": "Label",
          "type": "string",
          "description": "Name of the channel."
        }
      }
    },
    "CellposeCustomNormalizer": {
      "title": "CellposeCustomNormalizer",
      "description": "Validator to handle different normalization scenarios for Cellpose models",
      "type": "object",
      "properties": {
        "default_normalize": {
          "title": "Default Normalize",
          "default": true,
          "type": "boolean",
          "description": "Whether to use the default Cellpose normalization approach (rescaling the image between the 1st and 99th percentile)"
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "type": "number",
          "description": "Specify a custom lower-bound percentile for rescaling as a float value between 0 and 100. Set to 1 to run the same as default). You can only specify percentils or bounds, not both."
        },
        "upper_percentile": {
          "title": "Upper Percentile",
          "type": "number",
          "description": "Specify a custom upper-bound percentile for rescaling as a float value between 0 and 100. Set to 99 to run the same as default, set to e.g. 99.99 if the default rescaling was too harsh. You can only specify percentils or bounds, not both."
        },
        "lower_bound": {
          "title": "Lower Bound",
          "type": "integer",
          "description": "Explicit lower bound value to rescale the image at. Needs to be an integer, e.g. 100. You can only specify percentils or bounds, not both."
        },
        "upper_bound": {
          "title": "Upper Bound",
          "type": "integer",
          "description": "Explicit upper bound value to rescale the image at. Needs to be an integer, e.g. 2000 You can only specify percentils or bounds, not both."
        }
      }
    },
    "CellposeCustomNormalizerV2": {
      "title": "CellposeCustomNormalizerV2",
      "description": "Validator to handle different normalization scenarios for Cellpose models",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "enum": [
            "default",
            "custom",
            "no_normalization"
          ],
          "type": "string",
          "description": "One of `default` (Cellpose default normalization), `custom` (using the other custom parameters) or `no_normalization`."
        },
        "lower_percentile": {
          "title": "Lower Percentile",
          "minimum": 0,
          "maximum": 100,
          "type": "number",
          "description": "Specify a custom lower-bound percentile for rescaling as a float value between 0 and 100. Set to 1 to run the same as default). You can only specify percentiles or bounds, not both."
        },
        "upper_percentile": {
          "title": "Upper Percentile",
          "minimum": 0,
          "maximum": 100,
          "type": "number",
          "description": "Specify a custom upper-bound percentile for rescaling as a float value between 0 and 100. Set to 99 to run the same as default, set to e.g. 99.99 if the default rescaling was too harsh. You can only specify percentiles or bounds, not both."
        },
        "lower_bound": {
          "title": "Lower Bound",
          "type": "integer",
          "description": "Explicit lower bound value to rescale the image at. Needs to be an integer, e.g. 100. You can only specify percentiles or bounds, not both."
        },
        "upper_bound": {
          "title": "Upper Bound",
          "type": "integer",
          "description": "Explicit upper bound value to rescale the image at. Needs to be an integer, e.g. 2000. You can only specify percentiles or bounds, not both."
        }
      }
    }
  }
}

Behavior in the fractal-web sandbox: [image]

Behavior in react-jsonschema-form: [images]

tcompa commented 4 months ago

Something I'm not happy about: if you modify the default once, you may not be able to get back to the original situation. That is, I don't think that Optional[Literal] translates into an enum with three options (None, custom and no_normalization). I'll iterate on this later.

This is still present, but mitigated by the fact that there is now a "default" entry in the enum, so one could easily decide to use that one and provide a non-null value for normalize={"type": "default"}. In my view that's a good-enough trade-off.

tcompa commented 4 months ago

@jluethi I'll let you evaluate if this option fits with the expectation. If so, in principle you should just replace the current Pydantic model with the new one, and apply minor changes to the parts of the task where this object is used.

tcompa commented 4 months ago

Ref for the fact that Pydantic uses allOf:

jluethi commented 4 months ago

I like the logic of the new Enum with the 3 options, and the checks look good. Thanks for revisiting it!

One detail that I'm not 100% happy with yet:

This is still present, but mitigated by the fact that there is now a "default" entry in the enum, so one could easily decide to use that one and provide a non-null value for normalize={"type": "default"}. In my view that's a good-enough trade-off.

The thing that confuses me somewhat: we now avoid setting explicit defaults, in two places:

  1. The parameter of the Cellpose model doesn't have a default, but we set it in an if normalizeV2 is None: normalizeV2 = CellposeCustomNormalizerV2() check
  2. The type doesn't have a default, and we set it the same way (in such an if check).

Now this achieves the desired behavior of running default normalization if the user doesn't set anything. But it's even less transparent than the version before (where we already discussed that the default = None => if-check-in-the-function pattern was suboptimal). If such a setup is necessary to do this without revisiting the whole pydantic schema setup, I'm OK with that and find it an acceptable trade-off. But do we gain anything from

type: Optional[Literal["default", "custom", "no_normalization"]] = None

vs.

type: Literal["default", "custom", "no_normalization"] = "default"

?

Because it worked fine (as far as I could judge) with

default_normalize: bool = True

And that has the benefit that the user interface displays which default is being used in the web. Is there a reason that type couldn't be a Literal with a default (instead of an Optional Literal)?

jluethi commented 4 months ago

Also, neat new validation approaches!

tcompa commented 4 months ago

Briefly:

And that has the benefit that the user interface displays which default is being used in the web. Is there a reason that type couldn't be a Literal with a default (instead of an Optional Literal)?

I fully agree, the best (-> most transparent) option is

type: Literal["default", "custom", "no_normalization"] = "default"

My previous understanding, however, was that this immediately leads to the allOf issue as in your first version (ref https://github.com/pydantic/pydantic/issues/2592), and that it was easier to avoid it. I'm now trying to verify this assumption, and it turns out it's likely wrong. More on this shortly. If it's true that we hit no allOf issue, then I fully agree with switching to the more explicit version.

tcompa commented 4 months ago

Provisional information: the allOf issue only shows up when a default value is present in the call signature:

from typing import Literal, Optional

from devtools import debug
from pydantic import BaseModel
from pydantic.decorator import ValidatedFunction

from fractal_tasks_core.dev.lib_args_schemas import _remove_args_kwargs_properties
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals
from fractal_tasks_core.dev.lib_args_schemas import _validate_function_signature

class Normalizer(BaseModel):
    type: Literal["A", "B"] = "A"

def task_function_1(normalizer: Normalizer):
    pass

def task_function_2(normalizer: Optional[Normalizer] = None):
    pass

def task_function_3(normalizer: Normalizer = Normalizer()):
    pass

# GENERATE JSON SCHEMA
for task_function in [task_function_1, task_function_2, task_function_3]:
    _validate_function_signature(task_function)
    vf = ValidatedFunction(task_function, config=None)
    schema = vf.model.schema()
    schema = _remove_args_kwargs_properties(schema)
    schema = _remove_pydantic_internals(schema)
    debug(schema)

Output:

    schema: {
        'title': 'TaskFunction1',
        'type': 'object',
        'properties': {
            'normalizer': {
                '$ref': '#/definitions/Normalizer',    # <--------- this is OK
            },
        },
        'required': [
            'normalizer',
        ],
        'additionalProperties': False,
        'definitions': {
            'Normalizer': {
                'title': 'Normalizer',
                'type': 'object',
                'properties': {
                    'type': {
                        'title': 'Type',
                        'default': 'A',
                        'enum': ['A', 'B'],
                        'type': 'string',
                    },
                },
            },
        },
    } (dict) len=6

    schema: {
        'title': 'TaskFunction2',
        'type': 'object',
        'properties': {
            'normalizer': {
                '$ref': '#/definitions/Normalizer',    # <--------- this is OK
            },
        },
        'additionalProperties': False,
        'definitions': {
            'Normalizer': {
                'title': 'Normalizer',
                'type': 'object',
                'properties': {
                    'type': {
                        'title': 'Type',
                        'default': 'A',
                        'enum': ['A', 'B'],
                        'type': 'string',
                    },
                },
            },
        },
    } (dict) len=5

    schema: {
        'title': 'TaskFunction3',
        'type': 'object',
        'properties': {
            'normalizer': {
                'title': 'Normalizer',
                'default': {
                    'type': 'A',
                },
                'allOf': [
                    {
                        '$ref': '#/definitions/Normalizer',      # <--------- this is NOT OK
                    },
                ],
            },
        },
        'additionalProperties': False,
        'definitions': {
            'Normalizer': {
                'title': 'Normalizer',
                'type': 'object',
                'properties': {
                    'type': {
                        'title': 'Type',
                        'default': 'A',
                        'enum': ['A', 'B'],
                        'type': 'string',
                    },
                },
            },
        },
    } (dict) len=5
tcompa commented 4 months ago

As discussed with @jluethi: let's go with the option mocked as

from pydantic import BaseModel

class Normalizer(BaseModel):
    type: Literal["A", "B"] = "A"

def task_function_2(normalizer: Optional[Normalizer] = None):
    if normalizer is None:
        normalizer = Normalizer()

with JSON Schema

{
  "title": "TaskFunction2",
  "type": "object",
  "properties": {
    "normalizer": {
      "$ref": "#/definitions/Normalizer"
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Normalizer": {
      "title": "Normalizer",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "default": "A",
          "enum": [
            "A",
            "B"
          ],
          "type": "string"
        }
      }
    }
  }
}

The glitch described in https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/650 is still present:

which is problematic for at least two reasons:

  1. It's highly counterintuitive, since the call signature says normalize: Optional[CellposeCustomNormalizer] = None
  2. It may be plain wrong, in case we slightly modify the content of that if branch, e.g. to change some default.

We accept this trade-off for the moment, and perhaps we'll move towards option 3 after some fractal-web updates (https://github.com/fractal-analytics-platform/fractal-web/issues/413, https://github.com/fractal-analytics-platform/fractal-web/issues/412).

tcompa commented 4 months ago

Option 3 from https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/650#issuecomment-1935965918 is now also supported as of fractal-web 0.9.2 (thanks @zonia3000!). Let's go with that one right away, since it's the "best" option (i.e. it doesn't have any known unintended effects, especially re: possible unexpected behaviors when the argument isn't provided).


The mocked version is then

import json
from typing import Literal

from pydantic import BaseModel
from pydantic.decorator import ValidatedFunction

from fractal_tasks_core.dev.lib_args_schemas import _remove_args_kwargs_properties
from fractal_tasks_core.dev.lib_args_schemas import _remove_pydantic_internals
from fractal_tasks_core.dev.lib_args_schemas import _validate_function_signature

class Normalizer(BaseModel):
    type: Literal["default", "custom"] = "default"

def task_function_3(normalizer: Normalizer = Normalizer()):
    pass

# GENERATE JSON SCHEMA
_validate_function_signature(task_function_3)
vf = ValidatedFunction(task_function_3, config=None)
schema = vf.model.schema()
schema = _remove_args_kwargs_properties(schema)
schema = _remove_pydantic_internals(schema)
print(json.dumps(schema, indent=2))

producing this schema

{
  "title": "TaskFunction3",
  "type": "object",
  "properties": {
    "normalizer": {
      "title": "Normalizer",
      "default": {
        "type": "default"
      },
      "allOf": [
        {
          "$ref": "#/definitions/Normalizer"
        }
      ]
    }
  },
  "additionalProperties": false,
  "definitions": {
    "Normalizer": {
      "title": "Normalizer",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "default": "default",
          "enum": [
            "default",
            "custom"
          ],
          "type": "string"
        }
      }
    }
  }
}

and this UI in fractal-web 0.9.2: [image]


While the actual model (untested!) should look like


from typing import Literal, Optional

from pydantic import BaseModel, Field, root_validator

class CellposeCustomNormalizerV2(BaseModel):
    """
    Validator to handle different normalization scenarios for Cellpose models

    If `type="default"`, then Cellpose default normalization is
    used and no other parameters can be specified.
    If `type="no_normalization"`, then no normalization is used and no
    other parameters can be specified.
    If `type="custom"`, then either percentiles or explicit integer
    bounds can be applied.

    Attributes:
        type:
            One of `default` (Cellpose default normalization), `custom`
            (using the other custom parameters) or `no_normalization`.
        lower_percentile: Specify a custom lower-bound percentile for rescaling
            as a float value between 0 and 100. Set to 1 to run the same as
            default). You can only specify percentiles or bounds, not both.
        upper_percentile: Specify a custom upper-bound percentile for rescaling
            as a float value between 0 and 100. Set to 99 to run the same as
            default, set to e.g. 99.99 if the default rescaling was too harsh.
            You can only specify percentiles or bounds, not both.
        lower_bound: Explicit lower bound value to rescale the image at.
            Needs to be an integer, e.g. 100.
            You can only specify percentiles or bounds, not both.
        upper_bound: Explicit upper bound value to rescale the image at.
            Needs to be an integer, e.g. 2000.
            You can only specify percentiles or bounds, not both.
    """
    type: Optional[Literal["default", "custom", "no_normalization"]] = None
    lower_percentile: Optional[float] = Field(None, ge=0, le=100)
    upper_percentile: Optional[float] = Field(None, ge=0, le=100)
    lower_bound: Optional[int] = None
    upper_bound: Optional[int] = None

    @root_validator
    def validate_conditions(cls, values):

        # Extract values
        type = values.get("type")
        lower_percentile = values.get("lower_percentile")
        upper_percentile = values.get("upper_percentile")
        lower_bound = values.get("lower_bound")
        upper_bound = values.get("upper_bound")

        # Verify that custom parameters are only provided when type="custom"
        if type != "custom":
            if lower_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if upper_percentile is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_percentile=}. "
                    "Hint: set type='custom'."
                    )
            if lower_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {lower_bound=}. "
                    "Hint: set type='custom'."
                )
            if upper_bound is not None:
                raise ValueError(
                    f"Type='{type}' but {upper_bound=}. "
                    "Hint: set type='custom'."
                )

        # The only valid options are:
        # 1. Both percentiles are set and both bounds are unset
        # 2. Both bounds are set and both percentiles are unset
        are_percentiles_set = (
            lower_percentile is not None,
            upper_percentile is not None,
            )
        are_bounds_set = (
            lower_bound is not None,
            upper_bound is not None,
            )
        if len(set(are_percentiles_set)) != 1:
            raise ValueError(
                "Both lower_percentile and upper_percentile must be set "
                "together."
                )
        if len(set(are_bounds_set)) != 1:
            raise ValueError(
                "Both lower_bound and upper_bound must be set together"
                )
        if lower_percentile is not None and lower_bound is not None:
            raise ValueError(
                "You cannot set both explicit bounds and percentile bounds "
                "at the same time. Hint: use only one of the two options."
            )

        return values
jluethi commented 4 months ago

This PR should be ready now. I switched to using Option 3 described above: using CellposeCustomNormalizer() as the default value for the argument. Therefore, this version of the task will require fractal-web 0.9.2 (because of the allOf in the JSON schema).