For the record, and assuming we proceed as described above:
If we add a default value to one of the internal variables (e.g. `wavelength_id: str = "something"`), then this is correctly written into the JSON Schema, meaning that it could be available in fractal-web.
Note, however, that this will not be used by fractal-server when populating WorkflowTask.args with the defaults in the JSON Schemas, since only the "first-level" defaults will be read and assigned to args (ref https://github.com/fractal-analytics-platform/fractal-server/issues/639). This looks like very intuitive behavior to me, but let's make sure we all agree.
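To make the "first-level" distinction concrete, here is a minimal sketch (assuming pydantic v1 and made-up TaskArgs/Channel models, not the actual task code): a default on a nested-model attribute does show up in the JSON Schema, but only defaults of the top-level parameters are "first-level".

```python
from pydantic import BaseModel

class Channel(BaseModel):
    wavelength_id: str = "something"  # nested default

class TaskArgs(BaseModel):
    num_levels: int = 5  # first-level default
    channel: Channel = Channel()

schema = TaskArgs.schema()
# First-level default: this is what fractal-server reads into WorkflowTask.args
print(schema["properties"]["num_levels"]["default"])  # 5
# Nested default: present in the schema, but one level deeper, hence ignored
print(schema["definitions"]["Channel"]["properties"]["wavelength_id"]["default"])
# "something"
```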
> This approach has side effects: Pydantic becomes a primary dependency for the tasks (adding to the discussion in https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/375).

I guess that's a price that should be fine to pay, especially if we go the pydantic.v1 route once reasonable?

> We will address attributes via channel.label, rather than channel["label"] (not a big deal, but propagating this into the fractal_tasks_core.lib subpackage could be annoying, because suddenly any other package using our lib will have to depend on pydantic at runtime).

I'd even call this a potential win; channel.label seems like the better way to reference it.

> We won't be able to fully exploit the flexibility of JSON Schemas (e.g. adding mutually exclusive class attributes, or other more complex definitions), or at least not easily. This does not actually depend on the Pydantic option discussed here, but it is related to our automatic creation of schemas.

Pity, that would be neat.

> There may be a point in the future where we can move (or at least copy) some logical checks from the task Python function to the JSON Schemas of its arguments, but we'll have to decide where/how this operation is done (is it done by hand? is the relevant info stored somewhere?)

That would be great to make the tasks a bit simpler, i.e. having less input validation at the start.
> Note, however, that this will not be used by fractal-server when populating WorkflowTask.args with the defaults in the JSON Schemas, since only the "first-level" defaults will be read and assigned to args (ref https://github.com/fractal-analytics-platform/fractal-server/issues/639). This looks like very intuitive behavior to me, but let's make sure we all agree.
Can we briefly discuss this tomorrow? Not sure I fully understand this and I'd be a big fan of the workflow that's downloaded actually containing all the args that were run (though if there's no way to change the defaults on the server-side anymore, the risks here do decrease a bit)
I'm generally OK with moving in this direction, but this also forces us to take a stricter stance on the two following points:

1) Whoever wants to use the task as a Python function (outside Fractal) will have to create objects of the Channel type, e.g. as in

```python
from fractal_tasks_core.channels import Channel
from fractal_tasks_core.tasks.create_ome_zarr import create_ome_zarr

my_channels = [Channel(...something...)]
create_ome_zarr(..., my_channels, ...)
```

2) Since we are now saying that pydantic may become an actual dependency, we can switch from an optional coerce_and_validate=True/False (which is currently part of the fractal_tasks_core.tasks._utils.run_fractal_task function) to always coercing & validating arguments of tasks. Up to now we were keeping pydantic on the run_fractal_task side only, but if it enters the main function definition then I don't see compelling reasons for this any more.
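As a hedged sketch of what "always coercing & validating" could look like (this assumes pydantic v1's validate_arguments decorator; the function/model names follow the thread, but this is not the actual fractal-tasks-core code):

```python
from typing import List, Optional

from pydantic import BaseModel, validate_arguments

class Channel(BaseModel):
    wavelength_id: str
    label: Optional[str] = None

@validate_arguments
def create_ome_zarr(allowed_channels: List[Channel], num_levels: int = 5):
    # Plain dicts passed by the caller are coerced into Channel instances
    return [c.wavelength_id for c in allowed_channels]

# A JSON-like payload (as in WorkflowTask.args) is coerced and validated on call
print(create_ome_zarr(allowed_channels=[{"wavelength_id": "A01_C01"}]))
# ['A01_C01']
```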
> Can we briefly discuss this tomorrow? Not sure I fully understand this and I'd be a big fan of the workflow that's downloaded actually containing all the args that were run (though if there's no way to change the defaults on the server-side anymore, the risks here do decrease a bit)
Sure, let's rediscuss it.
What I'm saying, briefly, is that if there is a default for a specific Channel attribute (e.g. `start=0`), this is irrelevant for fractal-server, since it is not a default value for a function parameter, but a default value for an attribute of the type hint of a function parameter. This makes sense to me: fractal-server will never have to introduce an additional channel, but the user will have to do it (e.g. in fractal-web).
As of the call this morning, we'll proceed with this approach for Channel. Some notes:

- validate_allowed_channel_input won't have to verify that wavelength_id is defined, since it will be a required attribute of the class.
- Should the model name simply be Channel?
- Description of attributes of internal models (e.g. the description of Channel.label) won't propagate to the JSON Schema, for the moment. We can open another issue for that task.
- @jluethi feel free to mention other cases where this approach may be relevant, apart from Channel.
> Should the model name simply be Channel?
Given that the channel contains metadata that we put in the omero metadata and should be quite general, I'm happy with going with Channel here.
> Description of attributes of internal models (e.g. the description of Channel.label) won't propagate to the JSON Schema, for the moment. We can open another issue for that task.
Ok, those would be useful to have as well eventually, let's track that separately.
> @jluethi feel free to mention other cases where this approach may be relevant, apart from Channel.
I will keep an eye on this when reviewing the core tasks' default values. One example that comes to mind: could we make model_type in the cellpose task now be one of models.MODEL_NAMES => a defined list of valid models [maybe even something where we can describe the valid options for a potential drop-down selection, but that's a new feature wish].
Using models.MODEL_NAMES was not possible with the pydantic approach we were using before; to be tested whether it would work now.
I'll see if I find other examples.
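For what it's worth, here is a minimal sketch of how a defined list of valid models could end up in the JSON Schema (assuming pydantic v1; MODEL_NAMES below is a made-up subset standing in for cellpose's models.MODEL_NAMES):

```python
from enum import Enum

from pydantic import BaseModel

MODEL_NAMES = ["cyto", "nuclei", "cyto2"]  # placeholder subset, for illustration

# Build a string enum dynamically from the list of valid model names
ModelType = Enum("ModelType", {name: name for name in MODEL_NAMES}, type=str)

class CellposeArgs(BaseModel):
    model_type: ModelType = ModelType.cyto

# The schema exposes the allowed values, e.g. for a future drop-down selection
print(CellposeArgs.schema()["definitions"]["ModelType"]["enum"])
# ['cyto', 'nuclei', 'cyto2']
```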
For the record, here is the schema for `schema['properties']['omero']['properties']['channels']['items']` from ome-ngff 0.4:
```json
{
  "type": "object",
  "properties": {
    "window": {
      "type": "object",
      "properties": {
        "end": { "type": "number" },
        "max": { "type": "number" },
        "min": { "type": "number" },
        "start": { "type": "number" }
      },
      "required": ["start", "min", "end", "max"]
    },
    "label": { "type": "string" },
    "family": { "type": "string" },
    "color": { "type": "string" },
    "active": { "type": "boolean" }
  },
  "required": ["window", "color"]
}
```
to which we are for sure adding some additional first-level properties like:

```python
wavelength_id: str
coefficient: int = 1
inverted: bool = False
```
There is a (minor) decision to take, concerning the window attribute.
The correct way to go is that we have a nested model, as in

```python
from typing import Optional

from pydantic import BaseModel

class ChannelWindow(BaseModel):
    # Typed as "number" in the OME-NGFF schema
    min: float
    max: float
    start: float
    end: float

class Channel(BaseModel):
    wavelength_id: str
    window: ChannelWindow
    label: Optional[str]
    ...
```
which brings us closer to the ome-ngff schema for omero channels.
This choice, however, will break our current examples, because we introduced some additional first-level attributes like `start` and `end` that we later assigned to `window["start"]` and `window["end"]`.
That is, what we currently have as

```json
{
  ...
  "start": 0,
  ...
}
```

will need to become something like

```json
{
  "window": {"start": 0, ...}
}
```
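For instance (a minimal sketch, assuming the ChannelWindow/Channel models above and pydantic v1), the new nested payload parses directly:

```python
channel = Channel.parse_obj(
    {
        "wavelength_id": "A01_C01",
        "window": {"min": 0, "max": 65535, "start": 110, "end": 800},
    }
)
print(channel.window.start)  # 110.0
```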
I'm in favor of this change, to move our model closer to the specs, but it's not strictly needed - thus I can avoid introducing it if we are annoyed by the breaking change (@jluethi).
Some additional context:
What we are currently doing goes in the same direction as https://github.com/JaneliaSciComp/pydantic-ome-ngff (see the repo for an example), which in my view is very relevant for long-term integration with ome-ngff through formal validation against known schemas (or Pydantic models).
In the current (very preliminary!) status of pydantic-ome-ngff:

> omero and bioformats2raw are not currently supported, but contributions adding support for those models would be welcome.
That said, I think it'll be worth mentioning this example on that repo, at some point, to show our interest and possibly provide additional motivation. To be clear, I don't think we should depend on that package any time soon, but if it is transformed into an "official" encoding of the specs (we are not there yet), then I'd be quite interested in using it.
Some more additional context:
With no need for comments, see this code block from https://github.com/czbiohub/iohub/blob/main/iohub/ngff_meta.py#L231:
```python
class WindowDict(TypedDict):
    """Dictionary of contrast limit settings"""

    start: float
    end: float
    min: float
    max: float


class ChannelMeta(MetaBase):
    """Channel display settings without clear documentation from the NGFF spec.

    https://docs.openmicroscopy.org/omero/5.6.1/developers/Web/WebGateway.html#imgdata
    """

    active: bool = False
    coefficient: float = 1.0
    color: ColorType = "FFFFFF"
    family: str = "linear"
    inverted: bool = False
    label: str = None
    window: WindowDict = None
```
> wavelength_id: str
> coefficient: int = 1
> inverted: bool = False
I'm aware of wavelength_id. We don't really do anything with coefficient (don't even know what this is) and inverted, right?
> I'm in favor of this change, to move our model closer to the specs, but it's not strictly needed - thus I can avoid introducing it if we are annoyed by the breaking change (@jluethi).
Now's the time for such breaking changes :) I'm also in favor of moving closer to their model.
We can't fully adopt their spec for our input though: we will always calculate the correct min & max, never ask the user (they are based on whether the image is uint8 or uint16). Thus, we will have a slightly different input spec than their definition for what we request as inputs. I'm also not foreseeing asking the user for coefficient (whatever that actually does).

Other than that, big fan of moving closer to them and, in the far future, even depending on a package like that!
> I'm aware of wavelength_id. We don't really do anything with coefficient (don't even know what this is) and inverted, right?
We have always been using these two default values (coefficient=1 and inverted=False), since the major channel refactor. They were also present in the first examples we discussed, and they are part of the transitional OME-NGFF omero metadata.
I can take them away, if there is any reason, or leave them as they are.
https://docs.openmicroscopy.org/omero/5.6.1/developers/Web/WebGateway.html#imgdata
> We can't fully adopt their spec for our input though: we will always calculate the correct min & max, never ask the user
Yes, indeed. We will have to set them as optional, and then assign them inside our own logic. But that's a reasonable trade-off, IMO.
This will also be the case for colormap, which we are assigning as part of our logic (and not requiring from the user).
> I can take them away, if there is any reason, or leave them as they are.
I'd leave them hard-coded until we figure out if we want to do something with them.
> Yes, indeed. We will have to set them as optional, and then assign them inside our own logic. But that's a reasonable trade-off, IMO.
Agreed, that's a good trade-off for these things! Same for colormap!
Worth mentioning: as of https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/410/commits/153fb061a2e38b369f1bc3a3af038ce1f838a0a1, we have a test for define_omero_channels (the function that goes from a list of our own custom Channel objects to a standard JSON object to be printed in the zarr metadata), which directly validates the output against the OME-NGFF JSON Schema.

This makes it clear, once again, that introducing a deeper integration of the OME-NGFF schemas will be a very interesting idea for the future.
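For illustration, the check is roughly of this kind (a sketch only; the assumed schema URL points at the official ome/ngff repository, and the actual test code in #410 may differ):

```python
import json
from urllib.request import urlopen

from jsonschema import validate

# Official OME-NGFF 0.4 image schema (which includes the omero/channels part)
SCHEMA_URL = "https://raw.githubusercontent.com/ome/ngff/main/0.4/schemas/image.schema"
schema = json.load(urlopen(SCHEMA_URL))

# Pick out the per-channel sub-schema quoted earlier in this thread
channel_schema = schema["properties"]["omero"]["properties"]["channels"]["items"]

candidate_channel = {
    "color": "00FFFF",
    "label": "DAPI",
    "window": {"min": 0, "max": 65535, "start": 110, "end": 800},
}
validate(instance=candidate_channel, schema=channel_schema)  # raises on failure
```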
Very nice!
Quick question: the channel.window attribute is a required one, in ome-ngff. Should we fail when we try processing an existing zarr file that does not comply (i.e. that doesn't have the window attribute)? Or should we also make window an optional attribute? (I guess the latter is true, or we will be a bit more strict in what we can process...)
As far as inputs to fractal tasks are concerned, the window should be something users can optionally specify. They may specify a start & end (=> affects display of the image). They should likely never change the min & max (based on uint8 & uint16 etc.).

I thought the omero metadata supports not having the start & end specified. If they are not specified, the napari viewer just handles that, as far as I know.

Thus, we only optionally ask for start & end from the user. We always provide a window, but so far this window always included min & max, while start & end were optional.
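A minimal sketch of that split (an assumed encoding, not necessarily the PR's exact model, assuming pydantic v1): start/end are optional user inputs, while min/max are always filled in by our own logic from the image dtype:

```python
from typing import Optional

import numpy as np
from pydantic import BaseModel

class ChannelWindow(BaseModel):
    min: Optional[float] = None    # set by us, from the image dtype
    max: Optional[float] = None    # e.g. 255 for uint8, 65535 for uint16
    start: Optional[float] = None  # optionally user-provided (display range)
    end: Optional[float] = None

def fill_window_from_dtype(window: ChannelWindow, dtype: np.dtype) -> ChannelWindow:
    # Hypothetical helper: min/max come from the dtype, never from the user
    info = np.iinfo(dtype)
    return window.copy(update={"min": float(info.min), "max": float(info.max)})

window = fill_window_from_dtype(ChannelWindow(start=110, end=800), np.dtype("uint16"))
print(window.dict())
# {'min': 0.0, 'max': 65535.0, 'start': 110.0, 'end': 800.0}
```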
Also, the omero metadata is labeled as transitional, see https://ngff.openmicroscopy.org/latest/index.html#omero-md. Thus, I would not expect that all OME-Zarrs we ever process will have omero metadata; I wouldn't be too strict here.
> As far as inputs to fractal tasks are concerned, the window should be something users can optionally specify. They may specify a start & end (=> affects display of the image).
What about an existing zarr? Does it have to have the window specified?
EDIT: the answer was in the second comment...
> Thus, we only optionally ask for start & end from the user. We always provide a window, but so far this window always included min & max, while start & end were optional.
Agreed. This is the current behavior (in the PR) for when we create a zarr ourselves.
> Also, the omero metadata is labeled as transitional, see https://ngff.openmicroscopy.org/latest/index.html#omero-md. Thus, I would not expect that all OME-Zarrs we ever process will have omero metadata; I wouldn't be too strict here.
I'm now merging #410, but another caveat is due (and we can continue in a different issue if it requires a discussion) about how to handle existing OME-Zarrs.
Right now, in #410, I did use the new Channel model in multiple places:

1) in create_ome_zarr and create_ome_zarr_multiplex, when creating a new OME-Zarr;
2) when reading the channels metadata from an existing OME-Zarr.

For case 1, it's OK to be as strict as possible (that is, a user cannot include arbitrary metadata in a newly created OME-Zarr), but this could be an issue for case 2. What should we do if we receive an OME-Zarr created outside Fractal, that includes some non-standard metadata?
With the current version of #410, all these metadata would be discarded upon reading from disk. This is not a severe issue at the moment, because we only read the channels metadata (in case 2) and never write them back to disk. It would become an issue if we have a task that also writes the channel metadata back to disk, because in that case we would be losing some of the original non-standard metadata.
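One possible mitigation, sketched here as an assumption (not what #410 currently does): pydantic can be told to keep unknown keys instead of discarding them, via extra = Extra.allow.

```python
from pydantic import BaseModel, Extra

class Channel(BaseModel):
    wavelength_id: str

    class Config:
        extra = Extra.allow  # keep non-standard keys instead of dropping them

channel = Channel.parse_obj({"wavelength_id": "A01_C01", "custom_key": 42})
print(channel.dict())  # {'wavelength_id': 'A01_C01', 'custom_key': 42}
```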
Good points. There are two areas where this may become relevant:

1) In the context of taking arbitrary external OME-Zarr files as input => how do we parse their metadata correctly? Do our assumptions work in this broader context? https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/351
2) In the context of using the MD converter task => does that still work? I think it should. See https://github.com/jluethi/faim-hcs/tree/fractal-roi-tables
For the record, what was described in https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/386#issuecomment-1582477981 is now tested as part of fractal-demos, with create-ome-zarr arguments that look like:

```json
{
  "allowed_channels": [
    {
      "color": "00FFFF",  // renamed from `colormap` to `color`, as per specs
      "wavelength_id": "A01_C01",
      "label": "DAPI",
      "window": {  // this nesting of `window` is new
        "start": 110,
        "end": 800
      }
    },
    {
      "color": "FF00FF",
      "wavelength_id": "A01_C02",
      "label": "nanog",
      "window": {
        "start": 110,
        "end": 290
      }
    },
    {
      "color": "FFFF00",
      "wavelength_id": "A02_C03",
      "label": "Lamin B1",
      "window": {
        "start": 110,
        "end": 1600
      }
    }
  ],
  "coarsening_xy": 2,
  "num_levels": 5,
  "image_extension": "png"
}
```
Branching from https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/377.

For the moment I could only obtain something interesting via pydantic, while I got nowhere when trying with dataclasses or plain classes. A basic example would be this script, which prints this schema.
Notes:

- We will address attributes via channel.label, rather than channel["label"] (not a big deal, but propagating this into the fractal_tasks_core.lib subpackage could be annoying, because suddenly any other package using our lib will have to depend on pydantic at runtime).