Closed tcompa closed 7 months ago
Having this list in a dict with the only key image_list_updates is there so we know whether a task provided an image_list_updates or a parallelization_list I assume?
Rather than the difference between image-list-updates and parallelization-list, returning a top-level dictionary is necessary to distinguish between image_list_updates
, image_list_removals
and filters
. Agreed: when only one of those is set, the rich structure appears cumbersome.
As a by-product, returning a top-level dictionary also fits well with validation of the task metadata output through Pydantic models, but this is not a hard constraint - we can customize this validation logic at will to avoid Pydantic if we have a reason. An example to make this clear i that e.g. for init tasks we could in principle move from
return dict(parallelization_list=[...])
to
return [...]
and this would not be too hard to validate server-side, even if it's not vanilla Pydantic.
What we don't have an alternative for is the case of a parallel task (or a non-parallel standalone one), where we would need to replace
return dict(
image_list_updates=[...],
image_list_removals=[...],
filters={...},
)
with something else.
Ah right, thanks for the details on this!
Conclusion: the image_list_updates stay as they are for good reason. They are also the ones typically provided by task developers.
Whether we should simplify the parallelization_lists: May be nice. Can we still robustly detect the output and validate it server-side? e.g. avoid confusion between an init_task and a non-parallel task output. In general, the parallelization list felt less cumbersome to build in tasks so far, because we're actually putting many items in there. So the dictionary didn't feel like too much red tape there
One thought that came up while working on the Import OME-Zarr task: What's the name of return object that can contain image_list_updates
, image_list_removals
and filters
? I have been thinking about it as image_list_updates
, but that's very confusing ;)
What is the overarching term for this dict that a typical task returns? image_list_changes? Any good ideas for naming here?
One thought that came up while working on the Import OME-Zarr task: What's the name of return object that can contain
image_list_updates
,image_list_removals
andfilters
? I have been thinking about it asimage_list_updates
, but that's very confusing ;)What is the overarching term for this dict that a typical task returns? image_list_changes? Any good ideas for naming here?
Within fractal-server, we typically call it task_output
, and the corresponding Pydantic models are named TaskOutput
and InitTaskOutput
. On the server side, there is no ambiguity between the actual task data output (on-disk) and the task metadata output (the one that goes back to Fractal).
On the tasks side, "metadata" is a bit ambiguous as in "fractal metadata" vs "NGFF metadata". Therefore we could maybe use names that includes "fractal". Verbose options include fractal_metadata
, fractal_metadata_output
, fractal_output
, ..
Whether we should simplify the parallelization_lists: May be nice. Can we still robustly detect the output and validate it server-side? e.g. avoid confusion between an init_task and a non-parallel task output. In general, the parallelization list felt less cumbersome to build in tasks so far, because we're actually putting many items in there. So the dictionary didn't feel like too much red tape there
For the record: this switch (from return {"parallelization_list": [...]}
to return [...]
) is not implemented server-side. Within #671 I am reverting the output metadata of init tasks to the currently-supported structure. We can later decide whether to change the server&tasks behavior to avoid the need of writing "parallelization_list".
Let's go with task_output
as the convention for how we call this
Originally posted by @jluethi in https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/671#issuecomment-2037752213