Closed fstur closed 2 weeks ago
Thanks a lot for the PR @fstur ! Very useful functionality to have indeed.
As discussed today, @lorenzocerrone and I refactored the Cellpose input models to handle the normalization aspects better: Normalization can now be specified when selecting the channel. This needed a bit of reworking to make it work with the Pydantic models afterwards.
I also used this opportunity to move a bit of code out of the task logic into the new CellposeChannel1InputModel
.
Things that remain:
Advanced Cellpose Parameters
model to keep the task reasonably compactUnrelated (cc @tcompa ): I'm suggesting we disabled flake8 E203 checks, because this block has a flake8 vs. black conflict about whether there should be spaces around the colon:
x[channels[0] - 1 : channels[0]] = normalized_img(
x[channels[0] - 1 : channels[0]],
lower_p=normalize.lower_percentile,
upper_p=normalize.upper_percentile,
lower_bound=normalize.lower_bound,
upper_bound=normalize.upper_bound,
)
It's also what made the checks above fail before.
I also added an exception for flake8 W503, which conflicts with black in how to format the cellpose tests. (unrelated: do we want to consider moving to ruff linting?)
Unrelated (cc @tcompa ): I'm suggesting we disabled flake8 E203 [...]
Agreed. This handling of :
also annoys me, on top of often requiring # noqa
comments.
(unrelated: do we want to consider moving to ruff linting?)
Sure, nothing against it. It's not like pre-commit takes very long (time poetry run pre-commit run --all-files
takes ~1.7 seconds on my machine, which includes reordering imports, black, flake8 and bandit), but of course a speedup wouldn't hurt.
Once we have it in this repo, let's also make sure we adopt it in fractal-server and fractal-tasks-core - to keep them consistent.
Hey @fstur
I finished my work on this PR. It now has the channel & normalization input combined, plus the whole normalization logic moved to its own, tests helper function (see _normalize_cellpose_channels
).
I also modified the advanced model inputs into a CellposeModelParams
object that can be passed to the task, so the logic of the task input changes a bit both on the Python side as well as the Fractal web side.
Can you check whether this version now works for you? I'd be happy with merging this into fractal tasks core & then providing the normalization approach for both channels.
@tcompa Can you review this PR?
One of the slightly weird things I'm doing is making channel2
optional in the task signature to avoid it being expanded in Fractal web. Because the whole block is already quite big.
Fractal web always initializes an object for it though, because the Normalizer has default values => I'll open a separate issue to discuss this further.
@lorenzocerrone If you also want to have a look, open for further feedback :) But not a requirement that you review the PR.
One of the slightly weird things I'm doing is making channel2 optional in the task signature to avoid it being expanded in Fractal web
This comment is not relevant any more as of https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/738/commits/b63547f5c0295d812c10d5359053d561da6020c5, right?
This comment is not relevant any more as of https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/b63547f5c0295d812c10d5359053d561da6020c5, right?
Right! Now I made the normalize part of channel have a default. Would be great if you could review if the use of Field with default_factory mode is a good approach for this
Re: channel2
type hint and JSON Schema
I could not find a better way for making channel2
fully optional (as in "it is set to null
in fractal-web
UI unless at lease one of its attribute was set explicitly by the user"). Thus I'd proceed as in the PR: the is_set()
method is quite clear in its meaning, and it doesn't introduce a lot of complexity.
Higher-level: I agree about
Fractal web always initializes an object for it though, because the Normalizer has default values
and this is what blocks us towards a clean version where channel2
is fully optional. Note that this issue is at least partly a consequence of fractal-web
logic: when working with a CLI client we wouldn't have issues in setting a None
default for an Optional
function argument.
I'll open a separate issue to discuss this further.
I think it's worth continuing this discussion (in a different issue, indeed), because it sounds like a possibly frequent pattern. Note that the outcome in principle could also look like "refactor the CellposeCustomNormalizer
model so that it has no default value", since this would likely solve the whole issue.
[continues below, concerning default_factory
]
Re: default_factory
Using arg: Field(default_factory=Model)
rather than arg: Model = Model()
has a difference, but I think it is not relevant for this case. Note that the latter is the one we are currently using in main
for
https://github.com/fractal-analytics-platform/fractal-tasks-core/blob/00cbfdfc062d6c615cef073b29d4628224903fa0/fractal_tasks_core/tasks/cellpose_segmentation.py#L214
The main difference is that the produced JSON Schemas are different. In the example below, I reproduce a minimal example with two function arguments (one always using default_factory
, and the other using the same Model()
option):
Python
from typing import Optional, Literal
from pydantic import BaseModel, Field
class Normalizer(BaseModel):
"""
Description of Normalizer
Attributes:
type: Description of type
time: Description of time
"""
type: Literal["option_a", "option_b"] = "option_a"
class MyModelDefaultFactory(BaseModel):
"""
Description of ChannelModel
Attributes:
normalizer: Description of normalizer.
time: Description of time.
attr: Description of attr1.
"""
normalizer: Normalizer = Field(default_factory=Normalizer)
time: float = Field(default_factory=time.perf_counter)
attr: Optional[str] = None
class MyModelObjectInit(BaseModel):
"""
Description of ChannelModel
Attributes:
normalizer: Description of normalizer.
time: Description of time.
attr: Description of attr1.
"""
normalizer: Normalizer = Normalizer()
time: float = time.perf_counter()
attr: Optional[str] = None
@validate_arguments
def function(
arg_default_factory: MyModelDefaultFactory = Field(default_factory=MyModelDefaultFactory),
arg2: MyModelObjectInit = MyModelObjectInit(),
):
"""
Description of function
Args:
arg_default_factory: Description of arg_default_factory
arg2: Description of arg2
"""
pass
which leads to
{
"title": "Function",
"type": "object",
"properties": {
"arg_default_factory": {
"$ref": "#/definitions/MyModelDefaultFactory",
"title": "Arg_Default_Factory",
"description": "Description of arg_default_factory"
},
"arg2": {
"title": "Arg2",
"default": {
"normalizer": {
"type": "option_a"
},
"time": 24457.131491715,
"attr": null
},
"allOf": [
{
"$ref": "#/definitions/MyModelObjectInit"
}
],
"description": "Description of arg2"
}
},
"additionalProperties": false,
"definitions": {
"Normalizer": {
"title": "Normalizer",
"description": "Description of Normalizer",
"type": "object",
"properties": {
"type": {
"title": "Type",
"default": "option_a",
"enum": [
"option_a",
"option_b"
],
"type": "string"
}
}
},
"MyModelDefaultFactory": {
"title": "MyModelDefaultFactory",
"description": "Description of ChannelModel",
"type": "object",
"properties": {
"normalizer": {
"$ref": "#/definitions/Normalizer",
"title": "Normalizer"
},
"time": {
"title": "Time",
"type": "number"
},
"attr": {
"title": "Attr",
"type": "string"
}
}
},
"MyModelObjectInit": {
"title": "MyModelObjectInit",
"description": "Description of ChannelModel",
"type": "object",
"properties": {
"normalizer": {
"title": "Normalizer",
"default": {
"type": "option_a"
},
"allOf": [
{
"$ref": "#/definitions/Normalizer"
}
]
},
"time": {
"title": "Time",
"default": 24457.131491715,
"type": "number"
},
"attr": {
"title": "Attr",
"type": "string"
}
}
}
}
}
The main difference here is the presence of
"default": {
"normalizer": {
"type": "option_a"
},
"time": 24457.131491715,
"attr": null
},
for arg2
, while it is missing when we use default_factory
.
Part of the explanation is that default_factory
should not be filled upon schema generation (see e.g. https://github.com/pydantic/pydantic/issues/1520), and the time
attribute that I include above helps to show this behavior (it is present when it's called explicitly, but not when using a default_factory=time.perf_counter
).
fractal-web
?In the fractal-web UI, this difference appears as irrelevant. If I don't modify any form field, current data are identical for the two cases (apart from the time
attribute, but that is expected):
fractal-server
?For top-level arguments (that is, this does not concern nested attribute of complex arguments), the presence/absence of a default
in the JSON Schema does matter in fractal-server. If a default
value is present, then it is used as a basis during any insert-worfklowtask
or patch-workflowtask
operation.
Considering the example above:
fractal-web
, there will always be a client-provided value (and therefore the default is not relevant).default_factory
will have no default in the arguments JSON file, and the default will be generated at runtime via the default_factory
.I cannot find any issue with either option, right now, but I wouldn't be surprised if we eventually identify some edge case of one or the other.
[continues with a TL;DR]
TL;DR re: default_factory
default_factory
is that we are not using a mutable function argument (which is a bad practice, but irrelevant for the standard command-line usage of fractal tasks)arg: MyModel = MyModel()
is that the default
field in the JSON Schema is set, making it more transparent.fractal-web
, for my dummy example and also for the current PR.If I were to choose, I would replace
channel2: CellposeChannel2InputModel = Field(default_factory=CellposeChannel2InputModel)
with
channel2: CellposeChannel2InputModel = CellposeChannel2InputModel()
which creates a more transparent JSON Schema that includes
- "$ref": "#/definitions/CellposeChannel2InputModel",
"title": "Channel2",
+ "default": {
+ "wavelength_id": null,
+ "label": null,
+ "normalize": {
+ "type": "default",
+ "lower_percentile": null,
+ "upper_percentile": null,
+ "lower_bound": null,
+ "upper_bound": null
+ }
+ },
+ "allOf": [
+ {
+ "$ref": "#/definitions/CellposeChannel2InputModel"
+ }
+ ],
but this is just personal preference and I have no specific reasons for it.
This closes my feedback on the arguments/schema-related part of the PR.
I can still review the rest of the PR, if it's useful, but that will be a bit later.
Thanks for the review & the reasoning on trade-offs @tcompa . We'll stay with this version for the time being and will evaluate later whether we'll have a strong opinion one way or the other for Fractal tasks.
Thanks @fstur for starting this work, I'm now merging this PR so we have the updated version of the Cellpose task with easier input handling & normalize options for both channels available in main :)
Checklist before merging
CHANGELOG.md
The cellpose_segmentation task currently accepts a custom normalization option. The problem is, that if two channels are used for the cellpose segmentation, both channels are normalized with the same
CellposeCustomNormalizer
, but one might want to normalize them with different parameters. I added an optionalnormalize2
argument tocellpose_segmentation
, and updatedsegement_ROI
accordingly. No tests are implemented yet.