Closed tcompa closed 9 months ago
I realized that it's not so easy to comply with our table specs V1 for feature tables created with the napari-workflows task.
In the specs, we require attributes of this kind:
"type": "feature_table",
"region": { "path": "../labels/label_DAPI" },
"instance_key": "label",
where `region/path` points to the label image on which the measurements are performed.
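As a minimal illustration (the helper function is hypothetical; the attribute names come from the spec snippet above), the required metadata can be built from just the label name:

```python
import json


def feature_table_attrs(label_name: str) -> dict:
    """Build V1 feature-table attributes for a given label image.

    Hypothetical helper; attribute names follow the spec snippet above.
    """
    return {
        "type": "feature_table",
        "region": {"path": f"../labels/{label_name}"},
        "instance_key": "label",
    }


print(json.dumps(feature_table_attrs("label_DAPI")))
```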
This information is not directly available in the napari-workflows-wrapper task, because it's defined as part of the workflow file.
As an example, we may have this workflow:
```yaml
!!python/object:napari_workflows._workflow.Workflow
_tasks:
  regionprops_DAPI: !!python/tuple
  - !!python/name:napari_skimage_regionprops._regionprops.regionprops_table ''
  - dapi_img
  - dapi_label_img
  - true
  - true
  - false
  - false
  - false
  - false
```
combined with the following arguments of the Fractal task:
```python
# Prepare parameters for second napari-workflows task (measurement)
workflow_file = str(testdata_path / "napari_workflows/wf_4.yaml")
input_specs = {
    "dapi_img": {"type": "image", "channel": {"wavelength_id": "A01_C01"}},  # type: ignore # noqa
    "dapi_label_img": {"type": "label", "label_name": "label_DAPI"},  # type: ignore # noqa
}
output_specs = {
    "regionprops_DAPI": {  # type: ignore # noqa
        "type": "dataframe",
        "table_name": "regionprops_DAPI",
    },
}
```
region["path"]
by direct (AKA human) inspectionBy looking at these two snippets, and thanks to our previous context knowledge, we know that the relevant workflow input is dapi_label_img
, while dapi_img
is the intensity image. Then we can (by hand) make a connection between the workflow output regionprops_DAPI
and the workflow input dapi_label_img
. Finally, we can use the input_specs
attribute, and learn that the correct value for region["path"]
is "../labels/label_DAPI"
.
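The manual reasoning above can be sketched in code. This assumes we already know which workflow input is the label image (here `dapi_label_img`, identified by hand); the helper name is hypothetical:

```python
# Hypothetical sketch: once we know which workflow input is the label image,
# region["path"] follows from input_specs alone.
input_specs = {
    "dapi_img": {"type": "image", "channel": {"wavelength_id": "A01_C01"}},
    "dapi_label_img": {"type": "label", "label_name": "label_DAPI"},
}


def region_path_for_input(input_specs: dict, workflow_input: str) -> str:
    spec = input_specs[workflow_input]
    if spec.get("type") != "label":
        raise ValueError(f"{workflow_input!r} is not a label input")
    return f"../labels/{spec['label_name']}"


print(region_path_for_input(input_specs, "dapi_label_img"))
# → ../labels/label_DAPI
```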
region["path"]
We may want to try to mimic the direct inspection in an automated way (even if we may have to handle some complex edge cases), but there is one step where I don't know how to proceed, namely this one:
> we know that the relevant workflow input is `dapi_label_img`

How a task could guess this information is unclear to me.
cc @jluethi
The only reliable way I can see is to ask for the value of `region["path"]` as part of the `output_specs`, when `type="dataframe"`. Then the new task arguments would be:
```python
# Prepare parameters for second napari-workflows task (measurement)
workflow_file = str(testdata_path / "napari_workflows/wf_4.yaml")
input_specs = {
    "dapi_img": {"type": "image", "channel": {"wavelength_id": "A01_C01"}},  # type: ignore # noqa
    "dapi_label_img": {"type": "label", "label_name": "label_DAPI"},  # type: ignore # noqa
}
output_specs = {
    "regionprops_DAPI": {  # type: ignore # noqa
        "type": "dataframe",
        "table_name": "regionprops_DAPI",
        "region_path": "label_DAPI",  # <-------------- new attribute
    },
}
```
and it would be up to the user to provide the correct value.
The other option is that this task does not comply with our specs, at least for the moment. This is a bit unfortunate, because it's our only task that generates feature tables, and then the feature-table specs are not really relevant and we could likely drop them.
We can also modify the specs so that `region` is not required; the task would then comply with them. The downside is that the new feature tables would lack the information required to link back to another table (the one with labels), and would end up being little more than standard tables (e.g. they'd just have an additional `instance_key="label"` attribute and an additional `obs` attribute listing all labels).
This gets even more complicated for the napari workflow case, because the workflow itself may be creating the label image that measurements are made on.
To be pragmatic, I'd say we go with Option A for the moment and include help text for `region_path`: "name of the label image that the feature measurements are based on".
That just gets added to the metadata, so we'd create example OME-Zarrs that are compliant with our spec using our example workflows.
Just for the record, some inference would be possible:

> we know that the relevant workflow input is `dapi_label_img`

Our inputs & outputs have types, so we could try some inference on `"type": "label"` in the input & output specs. But there is the risk that e.g. the workflow loads a label image, modifies it, and then makes measurements. Or that a label image is created by the workflow but not stored as an output (I'd not recommend either, but it's possible in this flexible workflow setup).
There are some fancier scenarios we could consider: we can optionally ask the user to add a `region_path` to the output specs, and otherwise try inference based on `"type": "label"` in the input & output specs (with the priority order being: 1) user-specified, 2) output spec, 3) input spec). I don't think it's currently worth adding that complexity.
For me, the main question is how much time we want to invest into our napari workflow wrapper. Given some uncertainty over the stability and continued investment into napari workflows, I'd limit it for the time being => just go with Option A
> then the feature-table specs are not really relevant and we could likely drop them.
I still think the spec is an overall good idea. And it should get used by the scmultiplex measurements, which are much more straightforward: Based on a label image, create measurements and save them to a table. Those are what I currently use as a default measurement task and we can make them spec compliant.
Plus, this spec will enable interesting downstream functionality in napari plugins to automatically associate measurements to the correct label layer, which would be very useful.
> To be pragmatic, I'd say we go with Option A for the moment and include help text for the `region_path`: "name of the label images that the feature measurements are based on"
Any preference between the following two options?

1. A new `region_path` attribute, with the role we just described. This has the pro of offering flexibility, rather than always enforcing the structure `"../labels/{label_name}"`.
2. A `label_name` attribute, which already exists both in `NapariWorkflowsInput` and `NapariWorkflowsOutput`.

I'd rather go with 2, because it offers an intuitive way of setting that parameter in the most common cases:

- if the label image is created within the same workflow, the two `label_name` attributes (the one for the `label` output and the one for the `dataframe` one) are the same;
- if the label image already exists, the input spec includes a `label_name` which is identical to the one to be used.

Examples (within option 2):
```python
# Label already exists
input_specs = {
    "dapi_img": {
        "type": "image",
        "channel": {
            "wavelength_id": "A01_C01"
        }
    },
    "dapi_label_img": {
        "type": "label",
        "label_name": "label_DAPI"
    },
}
output_specs = {
    "regionprops_DAPI": {
        "type": "dataframe",
        "table_name": "regionprops_DAPI",
        "label_name": "label_DAPI",
    },
}
```
```python
# Label is computed within the same workflow (warning: I did not test this)
input_specs = {
    "input": {
        "type": "image",
        "channel": {
            "wavelength_id": "A01_C01"
        }
    }
}
output_specs = {
    "Result of Expand labels (scikit-image, nsbatwm)": {
        "type": "label",
        "label_name": "label_DAPI",
    },
    "regionprops_DAPI": {
        "type": "dataframe",
        "table_name": "regionprops_DAPI",
        "label_name": "label_DAPI",
    },
}
```
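The two cases above can be told apart programmatically. A sketch (the function name is hypothetical, not the actual task code):

```python
from typing import Optional


def find_label_source(
    label_name: str, input_specs: dict, output_specs: dict
) -> Optional[str]:
    """Return where the label image named label_name comes from:
    'workflow-output' if created within the same workflow,
    'pre-existing' if passed in, None otherwise. Hypothetical sketch.
    """
    for spec in output_specs.values():
        if spec.get("type") == "label" and spec.get("label_name") == label_name:
            return "workflow-output"
    for spec in input_specs.values():
        if spec.get("type") == "label" and spec.get("label_name") == label_name:
            return "pre-existing"
    return None
```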
Option 2 is currently implemented through https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/7043d64a9c0bce799ddf237e4bb42828acd866ce. The relevant part of the task looks like this:
`label_name` sounds good to me :)
When producing feature tables from the napari-workflows-wrapper task, we are currently not complying with our own feature-table specifications, which are being defined as part of #582.
We currently write a measurement table into a Zarr group as:
without including any `table_attrs`. We should start to also use the attributes that were proposed in https://github.com/ome/ngff/pull/64 (e.g. `type`, `region`, `instance_key`).
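As a stdlib-only sketch of what attaching such attributes amounts to, here is a hypothetical helper that writes a group's `.zattrs` file directly (this assumes a Zarr directory store; in practice one would go through the zarr/anndata APIs rather than writing the file by hand):

```python
import json
import pathlib
import tempfile


def write_table_attrs(table_group_dir: str, table_attrs: dict) -> None:
    """Write Zarr group attributes as a .zattrs file (directory-store layout).

    Hypothetical helper for illustration only.
    """
    group = pathlib.Path(table_group_dir)
    group.mkdir(parents=True, exist_ok=True)
    (group / ".zattrs").write_text(json.dumps(table_attrs, indent=2))


with tempfile.TemporaryDirectory() as tmp:
    table_dir = f"{tmp}/tables/regionprops_DAPI"
    write_table_attrs(
        table_dir,
        {
            "type": "feature_table",
            "region": {"path": "../labels/label_DAPI"},
            "instance_key": "label",
        },
    )
    attrs = json.loads((pathlib.Path(table_dir) / ".zattrs").read_text())
    print(attrs["region"]["path"])
```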