fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Adding possible input inference #200

Open jluethi opened 1 year ago

jluethi commented 1 year ago

Reviving this old issue on input inference: we should revisit this in the context of the image list, which could store metadata like available channels per Zarr, available labels & tables, number of pyramid levels if needed, etc. Not something that will be built just yet, but we should consider having the server know a specific handful of defined Models (e.g. a new version of ChannelInputModel) and providing a fancier interface to them based on the image list.
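To make that a bit more concrete, here is a rough sketch of what such a server-known image-list entry could look like (all names here are made up for illustration, not actual fractal-tasks-core models):

```python
from typing import Optional

from pydantic import BaseModel, Field


class ImageListEntry(BaseModel):
    """Hypothetical per-image entry of the image list (field names are made up)."""

    zarr_url: str
    available_channels: list[str] = Field(default_factory=list)
    available_labels: list[str] = Field(default_factory=list)
    available_tables: list[str] = Field(default_factory=list)
    num_pyramid_levels: Optional[int] = None


def channel_choices(entry: ImageListEntry) -> list[str]:
    # The web GUI could call something like this to populate the dropdown for
    # a ChannelInputModel-style argument with only the channels that exist.
    return sorted(entry.available_channels)
```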


Original post: We've had the discussion about checking input types (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/173). While coming up with the user stories for the web GUI, I noticed we'll want to do a second type of input check: what are the possible inputs to a task, given the input dataset & the prior tasks?

Examples:

Such a selection is important for the web GUI to provide the user with relevant options, but it could also be used to validate workflows we run from the command line (it would e.g. catch typos in input strings).

For this to work though, the server needs to have a concept of what the potential inputs would be. Those can come from 2 places: from the OME-Zarr file (when an OME-Zarr file is provided as input to a workflow) or from tasks that come earlier in the workflow (e.g. the channel names from the parsing tasks, to be used in the napari workflow task). If we have both, the available inputs for any task would be the combination of the metadata based on the input file & the inference from prior tasks.
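Just to make that combination explicit, a minimal sketch (the dict-of-sets layout and the function name are assumptions for illustration, not an existing API):

```python
def combine_available_inputs(
    from_zarr: dict[str, set[str]],
    from_prior_tasks: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Union the two sources of possible inputs, per kind ("channels", "labels", ...)."""
    kinds = set(from_zarr) | set(from_prior_tasks)
    return {
        kind: from_zarr.get(kind, set()) | from_prior_tasks.get(kind, set())
        for kind in kinds
    }
```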

OME-Zarr inference case
The OME-Zarr inference won't be very common for our current workflows, but could come in when people use Fractal more flexibly or during multiplexing with more complex workflow setups (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/199, e.g. for the case where a series of workflows runs per resource and an analysis workflow is run later). We probably wouldn't want to parse the OME-Zarr file every time we want to do some inference (reading all the metadata can be slow when there are many wells). Thus, we'd need to cache this (e.g. when a dataset is added) and have a good way to invalidate that cache when the file changes. That would mean adding a metadata section to the dataset that describes the available inputs.
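As a sketch of what the parsing side could look like (assuming the usual NGFF layout, i.e. channel names under the image's omero metadata and label names in the optional labels subgroup; the lru_cache is only a stand-in for a proper server-side cache with invalidation):

```python
from functools import lru_cache

import zarr


@lru_cache(maxsize=None)
def cached_available_inputs(image_zarr_path: str) -> dict[str, tuple[str, ...]]:
    """Read channel and label names once per image; results are memoized."""
    group = zarr.open_group(image_zarr_path, mode="r")

    # Channel names from the "omero" metadata of the image.
    omero = group.attrs.get("omero", {})
    channels = tuple(
        c.get("label", c.get("wavelength_id", ""))
        for c in omero.get("channels", [])
    )

    # Label names from the optional "labels" subgroup.
    labels: tuple[str, ...] = ()
    if "labels" in group:
        labels = tuple(group["labels"].attrs.get("labels", []))

    return {"channels": channels, "labels": labels}
```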

Workflow-based inference
We'll need a way for a task to check which potential inputs are generated by prior tasks. This probably needs to happen dynamically, because we want to know what is available for task 4, which is different from what task 7 will have available. How will we get this information from a task? That may depend on the types of things we need to infer (one possible approach is sketched after the channels & labels paragraph below).


What types of inputs would we need to infer?

Things we could consider inferring:

The two main ones, channels & labels, are both input arguments to tasks (part of the channel list that is provided, and the name provided as output of the segmentation task / of the napari workflow wrapper for the label image). Thus, they are already part of the workflow JSON; we "just" need a clear way of listing them.
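A minimal sketch of such a listing, walking the workflow JSON up to a given task index (the task_list / args / output_label_name / channels keys are assumptions about the JSON layout, not the exact schema):

```python
import json


def inputs_from_prior_tasks(
    workflow_json: str, before_task_index: int
) -> dict[str, set[str]]:
    """Collect channel/label names that tasks before `before_task_index` would produce."""
    workflow = json.loads(workflow_json)
    available: dict[str, set[str]] = {"channels": set(), "labels": set()}

    for task in workflow["task_list"][:before_task_index]:
        args = task.get("args", {})
        # Label image written by e.g. a segmentation task.
        if "output_label_name" in args:
            available["labels"].add(args["output_label_name"])
        # Channels listed as task arguments (e.g. for illumination correction).
        for channel in args.get("channels", []):
            name = channel.get("label") or channel.get("wavelength_id")
            if name:
                available["channels"].add(name)

    return available
```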


Side-note: there is the potential to have a workflow split into multiple parts, see e.g. the multiplexing discussion. In that case, there are 2 options: maybe we can infer potential inputs across workflows, but that seems quite complicated. Alternatively, one could only do the full input inference for the later workflow once the first workflow has ended (=> inference from the OME-Zarr file for e.g. available channels). That should reduce complexity. Maybe we allow users to provide the inputs as strings in this case, but we validate at the beginning of the second workflow whether those inputs will actually be available (see the sketch below).
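Such a check could look roughly like this (just a sketch; it would also cover the command-line typo case mentioned at the top):

```python
import difflib


def validate_requested_inputs(
    requested: dict[str, set[str]],
    available: dict[str, set[str]],
) -> list[str]:
    """Return human-readable errors for requested inputs that are not available."""
    errors = []
    for kind, names in requested.items():
        choices = sorted(available.get(kind, set()))
        for name in names:
            if name not in choices:
                # Suggest close matches so that typos in channel/label strings
                # are caught before the (second) workflow starts running.
                hint = difflib.get_close_matches(name, choices, n=1)
                suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
                errors.append(f"Unknown {kind[:-1]} {name!r}{suggestion}")
    return errors
```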
