fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Adding possible input inference #200

Open jluethi opened 1 year ago

jluethi commented 1 year ago

Reviving this old issue on input inference: we should revisit this in the context of the image list, which could store metadata like available channels per Zarr, available labels & tables, number of pyramid levels if needed, etc. Not something that will be built just yet, but we should consider having the server know a specific handful of defined Models (e.g. a new version of ChannelInputModel) and providing a fancier interface to them based on the image list.
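To make that a bit more concrete, here is a rough sketch of what such a server-known image-list entry could look like (all names here are made up for illustration, not actual fractal-tasks-core models):

```python
from typing import Optional

from pydantic import BaseModel, Field


class ImageListEntry(BaseModel):
    """Hypothetical per-image entry of the image list (field names are made up)."""

    zarr_url: str
    available_channels: list[str] = Field(default_factory=list)
    available_labels: list[str] = Field(default_factory=list)
    available_tables: list[str] = Field(default_factory=list)
    num_pyramid_levels: Optional[int] = None


def channel_choices(entry: ImageListEntry) -> list[str]:
    # The web GUI could call something like this to populate the dropdown for
    # a ChannelInputModel-style argument with only the channels that exist.
    return sorted(entry.available_channels)
```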


Original post: We've had the discussion about checking input types (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/173). While coming up with the user stories for the web GUI, I noticed we'll want to do a second type of input check: what are the possible inputs to a task, given the input dataset & the prior tasks?

Examples:

Such a selection is important for the web GUI to provide the user with relevant options, but it could also be used to validate workflows we run from the command line (it would e.g. catch typos in input strings).

For this to work though, the server needs to have a concept of what the potential inputs would be. Those can come from 2 places: from the OME-Zarr file (when an OME-Zarr file is provided as input to a workflow) or from tasks that come earlier in the workflow (e.g. the channel names from the parsing tasks, to be used in the napari workflow task). If we have both, the available inputs for any task would be the combination of the metadata based on the input file & the inference from prior tasks.
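Just to make that combination explicit, a minimal sketch (the dict-of-sets layout and the function name are assumptions for illustration, not an existing API):

```python
def combine_available_inputs(
    from_zarr: dict[str, set[str]],
    from_prior_tasks: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Union the two sources of possible inputs, per kind ("channels", "labels", ...)."""
    kinds = set(from_zarr) | set(from_prior_tasks)
    return {
        kind: from_zarr.get(kind, set()) | from_prior_tasks.get(kind, set())
        for kind in kinds
    }
```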

OME-Zarr inference case
The OME-Zarr inference won't be very common for our current workflows, but could come in when people use Fractal more flexibly or during multiplexing with more complex workflow setups (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/199, e.g. for the case where a series of workflows runs per resource and an analysis workflow is run later). We probably wouldn't want to parse the OME-Zarr file every time we want to do some inference (reading all the metadata can be slow when there are many wells). Thus, we'd need to cache this (e.g. when a dataset is added) and have a good way to invalidate that cache when the file changes. That would mean adding a metadata section to the dataset that describes the available inputs.
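As a sketch of what the parsing side could look like (assuming the usual NGFF layout, i.e. channel names under the image's omero metadata and label names in the optional labels subgroup; the lru_cache is only a stand-in for a proper server-side cache with invalidation):

```python
from functools import lru_cache

import zarr


@lru_cache(maxsize=None)
def cached_available_inputs(image_zarr_path: str) -> dict[str, tuple[str, ...]]:
    """Read channel and label names once per image; results are memoized."""
    group = zarr.open_group(image_zarr_path, mode="r")

    # Channel names from the "omero" metadata of the image.
    omero = group.attrs.get("omero", {})
    channels = tuple(
        c.get("label", c.get("wavelength_id", ""))
        for c in omero.get("channels", [])
    )

    # Label names from the optional "labels" subgroup.
    labels: tuple[str, ...] = ()
    if "labels" in group:
        labels = tuple(group["labels"].attrs.get("labels", []))

    return {"channels": channels, "labels": labels}
```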

Workflow-based inference
We'll need a way for a task to check which potential inputs are generated by prior tasks. This probably needs to happen dynamically, because we want to know what is available for task 4, which is different from what task 7 will have available. How will we get this information from a task? That may depend on the types of things we need to infer (one possible approach is sketched after the channels & labels paragraph below).


What types of inputs would we need to infer?

Things we could consider inferring:

The two main ones, channels & labels, are both input arguments to tasks (part of the channel list that is provided, and the name provided as output of the segmentation task / of the napari workflow wrapper for the label image). Thus, they are already part of the workflow JSON; we "just" need a clear way of listing them.
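A minimal sketch of such a listing, walking the workflow JSON up to a given task index (the task_list / args / output_label_name / channels keys are assumptions about the JSON layout, not the exact schema):

```python
import json


def inputs_from_prior_tasks(
    workflow_json: str, before_task_index: int
) -> dict[str, set[str]]:
    """Collect channel/label names that tasks before `before_task_index` would produce."""
    workflow = json.loads(workflow_json)
    available: dict[str, set[str]] = {"channels": set(), "labels": set()}

    for task in workflow["task_list"][:before_task_index]:
        args = task.get("args", {})
        # Label image written by e.g. a segmentation task.
        if "output_label_name" in args:
            available["labels"].add(args["output_label_name"])
        # Channels listed as task arguments (e.g. for illumination correction).
        for channel in args.get("channels", []):
            name = channel.get("label") or channel.get("wavelength_id")
            if name:
                available["channels"].add(name)

    return available
```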


Side-note: there is the potential to have a workflow split into multiple parts, see e.g. the multiplexing discussion. In that case, there are 2 options: maybe we can infer potential inputs across workflows, but that seems quite complicated. Alternatively, one could only do the full input inference for the later workflow once the first workflow has ended (=> inference from the OME-Zarr file for e.g. available channels). That should reduce complexity. Maybe we allow users to provide the inputs as strings in this case, but we validate at the beginning of the second workflow whether those inputs will actually be available (see the sketch below).
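Such a check could look roughly like this (just a sketch; it would also cover the command-line typo case mentioned at the top):

```python
import difflib


def validate_requested_inputs(
    requested: dict[str, set[str]],
    available: dict[str, set[str]],
) -> list[str]:
    """Return human-readable errors for requested inputs that are not available."""
    errors = []
    for kind, names in requested.items():
        choices = sorted(available.get(kind, set()))
        for name in names:
            if name not in choices:
                # Suggest close matches so that typos in channel/label strings
                # are caught before the (second) workflow starts running.
                hint = difflib.get_close_matches(name, choices, n=1)
                suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
                errors.append(f"Unknown {kind[:-1]} {name!r}{suggestion}")
    return errors
```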
