Closed tcompa closed 1 year ago
Currently, adding tasks by name is only supported if a single version of them is installed. We should add support for adding tasks by name and version. If multiple versions of a task are present and no version is specified, default to the newest version available on the server.
Quoting https://github.com/fractal-analytics-platform/fractal/issues/309#issuecomment-1324792116:
This feature can be included somewhat easily, if we restrict its scope to a subset of all possible tasks - namely the ones coming from a pypi install of fractal-tasks-core package. This would mean that:
This discussion is progressing in https://github.com/fractal-analytics-platform/fractal-server/issues/1#issuecomment-1403501673 (or in a dedicated issue, soon enough), and will affect this issue on the multiple-installed-versions use cases.
Just stumbled across this again, tagging it so that we consider it a priority after the top 3 discussed in our call. Will be very helpful to have different versions of the same task installed, but still accessible on a name-basis. Default to newest version available would be nice here. Let's pick up on this again :)
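One detail worth keeping in mind for the "default to newest version" behavior: comparing version strings as plain text gives the wrong answer once a component reaches two digits. Below is a minimal sketch, assuming versions are purely numeric dotted strings (pre-release tags such as `1.1.0a2` would need a real version parser). The helper names `parse_version` and `newest_version` are hypothetical and not part of the client.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn "1.10.0" into (1, 10, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def newest_version(versions: list[str]) -> str:
    """Return the highest version, comparing numerically rather than as text."""
    return max(versions, key=parse_version)


available = ["0.9.4", "0.10.0", "0.2.1"]
print(max(available))             # plain string comparison picks 0.9.4
print(newest_version(available))  # numeric comparison picks 0.10.0
```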
An example of why this is important: @adrtsc started to add custom tasks. The server accepts tasks with non-unique names. But then, the client-side task adding fails, because the caching can't happen client-side with the non-unique names.
Here's an example error, so that this issue is findable for someone searching for it:
WARNING:root:Cannot write task-list cache if task names are not unique (version-based disambiguation will be added in the future).
Current task list includes: ['Create OME-Zarr structure', 'Convert Yokogawa to OME-Zarr', 'Copy OME-Zarr structure', 'Maximum Intensity Projection', 'Cellpose Segmentation', 'Illumination correction', 'Napari workflows wrapper', 'Create OME-ZARR structure (multiplexing)', 'my custom task 1', 'my custom task custom_task', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_0', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_1', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_2', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_3', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_4', 'Cellpose Segmentation Custom', 'Cellpose Segmentation Custom', 'Cellpose Segmentation Custom', 'Cellpose Segmentation Custom', 'test']
Traceback (most recent call last):
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/_workflow.py", line 101, in workflow_add_task
    task_id = int(task_id_or_name)
ValueError: invalid literal for int() with base 10: 'Create OME-Zarr structure'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/bin/fractal", line 8, in <module>
    sys.exit(run())
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/__main__.py", line 7, in run
    asyncio.run(main())
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/client.py", line 114, in main
    interface = await handle()
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/client.py", line 104, in handle
    interface = await handler(client, **kwargs)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/__init__.py", line 151, in workflow
    iface = await workflow_add_task(client, batch=batch, **kwargs)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/_workflow.py", line 103, in workflow_add_task
    task_id = await get_cached_task_by_name(task_id_or_name, client)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/utils.py", line 26, in get_cached_task_by_name
    with cache_file.open("r") as f:
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/pathlib.py", line 1109, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data/homes/atschan/fractal-demos/examples/08_custom_cellpose_task/.cache/tasks'
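Reading the log above: the warning says the cache is not written when names are not unique, and the later lookup then fails because the cache file does not exist. A minimal sketch of the kind of uniqueness check that would trigger the warning, assuming each task is a dict with a `name` key; `duplicated_names` is a hypothetical helper, not the client's actual code.

```python
from collections import Counter


def duplicated_names(task_list: list[dict]) -> list[str]:
    """Return the task names that occur more than once, sorted alphabetically."""
    counts = Counter(task["name"] for task in task_list)
    return sorted(name for name, count in counts.items() if count > 1)


tasks = [
    {"id": 1, "name": "Cellpose Segmentation"},
    {"id": 2, "name": "Cellpose Segmentation Custom"},
    {"id": 3, "name": "Cellpose Segmentation Custom"},
    {"id": 4, "name": "test"},
]
print(duplicated_names(tasks))
```

In the task list from the warning, 'Cellpose Segmentation Custom' appears four times, which is enough for a name-keyed cache to be ambiguous.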
Also account for extras (see pytorch discussion)?
This issue requires:
or an equivalent update to fractal-server.
Once such an update is available, the task list obtained from an API call would be a list of items with the following details:
some_task["owner"] is None);
a source attribute, which is rich in metadata. The ones we would care about, for now, are the package name, the package version and the task name (or slug).
With this information, we should prepare a set of relevant use cases of how to address a task, e.g.:
Regarding this comment by @jluethi:
An example of why this is important: ... started to add custom tasks. The server accepts tasks with non-unique names. But then, the client-side task adding fails, because the caching can't happen client-side with the non-unique names.
Could we define the expected behavior? This is a custom task, and (in my understanding) we are not planning to add a version to it. How would we want the user to create multiple tasks with the same name and be able to specify which one they'd like to address? In principle we can also add an actual version to each task (custom or common), but then let's review https://github.com/fractal-analytics-platform/fractal-server/issues/702 (especially in its "Re: why a string and not many db columns?" section).
Some updates from recent work on fractal-server:
owner attribute.
version attribute. Note that for automatically-collected tasks this attribute is always set.
Plan of changes:
get_cached_task_by_name from _aux_task_caching.py
_fetch_task_list, only keep a few task attributes: id, name, version, owner, source
_get_task_id into _get_matching_tasks, without any raise
get_task_id_from_cache -- see description below
patch_task function
post_workflowtask
fractal task edit
fractal workflow add-task
test_task_cache_with_non_unique_names, after updating it)
Note: this requires #496.
New interface of task edit:
fractal task edit TASK_ID_OR_NAME --version VERSION [...]
New interface of workflow add-task:
fractal workflow add-task PROJECT_ID WORKFLOW_ID TASK_ID_OR_NAME --version VERSION [...]
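For illustration, the `workflow add-task` interface above could be declared with `argparse` roughly as follows. This is a standalone sketch, not the client's actual parser: `build_parser` and the destination names are hypothetical, and the `[...]` options are left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a toy parser exposing only `workflow add-task`."""
    parser = argparse.ArgumentParser(prog="fractal")
    commands = parser.add_subparsers(dest="cmd", required=True)

    workflow = commands.add_parser("workflow")
    workflow_commands = workflow.add_subparsers(dest="subcmd", required=True)

    add_task = workflow_commands.add_parser("add-task")
    add_task.add_argument("project_id", type=int)
    add_task.add_argument("workflow_id", type=int)
    # Kept as a string: it is either an integer ID or a task name.
    add_task.add_argument("task_id_or_name")
    # Optional; None means "resolve to the latest version".
    add_task.add_argument("--version", default=None)
    return parser


args = build_parser().parse_args(
    ["workflow", "add-task", "1", "2", "Cellpose Segmentation", "--version", "0.9.0"]
)
print(args.task_id_or_name, args.version)
```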
Draft of the get_task_id_from_cache function (discussed with @ychiucco):

def _search_in_task_list(allow_cache_refresh)
    call _get_matching_tasks with NAME (required) and VERSION (optional)
    if 0 results (BAD THING):
        if allow_cache_refresh:
            task_list = refresh_task_cache
            return _search_in_task_list(..., allow_cache_refresh=False)
        else
            fail with very informative message
    if 1 result:
        return its ID
    if N>1 results:
        if version is None:
            set version = max(versions)
            check whether there is only one task left
            if yes:
                return its ID
            if not:
                HANDLE AS "BAD THING" -- SEE ABOVE
        else:
            # e.g. two "test" tasks, with version 0.0.1 but different `owner`s
            HANDLE AS "BAD THING" -- SEE ABOVE

----

get_task_id_from_cache(TASK_ID_OR_NAME, VERSION = None):
    if TASK_ID_OR_NAME is an integer:
        if VERSION:
            raise
        return ID
    else:
        cache_is_up_to_date = False
        if cache is not there:
            refresh task_list and write it to cache
            cache_is_up_to_date = True
        task_id = _search_in_task_list(allow_cache_refresh=True)
        return task_id
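A runnable sketch of the draft, under several assumptions: the cache is an in-memory list (or `None` when missing) instead of a file, `refresh` is a caller-supplied function returning the server task list, versions are numeric dotted strings, and `TaskLookupError` is a hypothetical exception. One deliberate deviation from the draft: the retry is only allowed when the cache was not just refreshed, which is what the otherwise unused `cache_is_up_to_date` flag seems intended for.

```python
from typing import Callable, Optional


class TaskLookupError(ValueError):
    """Raised when a request cannot be resolved to exactly one task."""


def _version_key(version: str) -> tuple[int, ...]:
    # Assumes numeric dotted versions such as "3.1.4".
    return tuple(int(part) for part in version.split("."))


def _search_in_task_list(
    task_list: list[dict],
    name: str,
    version: Optional[str],
    refresh: Callable[[], list[dict]],
    allow_cache_refresh: bool,
) -> int:
    matching = [
        t for t in task_list
        if t["name"] == name and (version is None or t["version"] == version)
    ]
    if len(matching) > 1 and version is None:
        versions = [t["version"] for t in matching]
        if None not in versions:  # otherwise the latest version is undefined
            latest = max(versions, key=_version_key)
            matching = [t for t in matching if t["version"] == latest]
    if len(matching) == 1:
        return matching[0]["id"]
    # "Bad thing": no match, or an ambiguous one. Retry once on fresh data.
    if allow_cache_refresh:
        return _search_in_task_list(refresh(), name, version, refresh, False)
    raise TaskLookupError(
        f"Cannot resolve (name, version)=({name!r}, {version!r}) "
        f"to a single task; {len(matching)} candidates."
    )


def get_task_id_from_cache(
    task_id_or_name: str,
    version: Optional[str],
    cache: Optional[list[dict]],
    refresh: Callable[[], list[dict]],
) -> int:
    try:
        task_id = int(task_id_or_name)
    except ValueError:
        task_id = None
    if task_id is not None:
        if version is not None:
            raise TaskLookupError("A version cannot be combined with a task ID.")
        return task_id
    cache_is_up_to_date = False
    if cache is None:  # stand-in for "cache file is not there"
        cache = refresh()
        cache_is_up_to_date = True
    return _search_in_task_list(
        cache, task_id_or_name, version, refresh,
        allow_cache_refresh=not cache_is_up_to_date,
    )
```

For example, with a server list containing "dummy3" at versions 3.0.0 and 3.1.4, `get_task_id_from_cache("dummy3", None, None, refresh)` resolves to the ID of the 3.1.4 task, while a name with two tasks at the same latest version raises `TaskLookupError`.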
Here are some possible error messages coming from requests which cannot be fulfilled (as in the current state of https://github.com/fractal-analytics-platform/fractal/pull/499):
There is no task with name "dummy0" in the following task list:
ID, Name, Version, Owner, Source
101, "dummy1", 1.0.1, None, a
201, "dummy2", None, None, b
202, "dummy2", 2.0.0, None, c
301, "dummy3", 3.0.0, None, d
302, "dummy3", 3.1.4, None, e
401, "dummy4", 4.0.0, None, f
402, "dummy4", 4.1.1, None, g
401, "dummy4", 4.1.1, None, h
There is no task with (name, version)=("dummy1", 3.1.4) in the following task list:
ID, Name, Version, Owner, Source
101, "dummy1", 1.0.1, None, a
201, "dummy2", None, None, b
202, "dummy2", 2.0.0, None, c
301, "dummy3", 3.0.0, None, d
302, "dummy3", 3.1.4, None, e
401, "dummy4", 4.0.0, None, f
402, "dummy4", 4.1.1, None, g
401, "dummy4", 4.1.1, None, h
Cannot determine the latest version in the following task list:
ID, Name, Version, Owner, Source
201, "dummy2", None, None, b
202, "dummy2", 2.0.0, None, c
Please make your request more specific.
Multiple tasks with latest version (4.1.1) in the following task list:
ID, Name, Version, Owner, Source
402, "dummy4", 4.1.1, None, g
401, "dummy4", 4.1.1, None, h
Please make your request more specific.
Multiple tasks with version 4.1.1 in the following task list:
ID, Name, Version, Owner, Source
402, "dummy4", 4.1.1, None, g
401, "dummy4", 4.1.1, None, h
Please make your request more specific.
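The task tables in these messages can be produced with a few lines of string formatting. This is a sketch assuming each cached task is a dict with the five attributes shown in the header; `format_task_table` and `ambiguity_message` are hypothetical names, not the functions from the PR.

```python
def format_task_table(task_list: list[dict]) -> str:
    """Render tasks as comma-separated rows under a fixed header."""
    header = "ID, Name, Version, Owner, Source"
    rows = [
        f'{t["id"]}, "{t["name"]}", {t["version"]}, {t["owner"]}, {t["source"]}'
        for t in task_list
    ]
    return "\n".join([header, *rows])


def ambiguity_message(version: str, task_list: list[dict]) -> str:
    """Build the 'multiple tasks with version' message for ambiguous requests."""
    return (
        f"Multiple tasks with version {version} in the following task list:\n"
        f"{format_task_table(task_list)}\n"
        "Please make your request more specific."
    )


candidates = [
    {"id": 201, "name": "dummy2", "version": None, "owner": None, "source": "b"},
    {"id": 202, "name": "dummy2", "version": "2.0.0", "owner": None, "source": "c"},
]
print(format_task_table(candidates))
```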
Nice error messages, thanks @tcompa !
Quoting from #309:
See mainly https://github.com/fractal-analytics-platform/fractal/issues/309#issuecomment-1324792116 and following comments.