fractal-analytics-platform / fractal-client

Command-line client for Fractal
https://fractal-analytics-platform.github.io/fractal-client
BSD 3-Clause "New" or "Revised" License

Generalize task cache to also be aware of task source (or version) for some specific tasks #345

Closed: tcompa closed this issue 1 year ago

tcompa commented 1 year ago

Quoting from #309:

A limited-scope version of this feature is available via #341. If there are no name clashes (e.g. if only a single version of the tasks is installed), addressing tasks by name should now also work in fractal workflow add-task. If task names are not unique, the cache is never written and only IDs can be used.

See mainly https://github.com/fractal-analytics-platform/fractal/issues/309#issuecomment-1324792116 and following comments.

jluethi commented 1 year ago

Currently, adding tasks by name is only supported if a single version of them is installed. We should add support for adding tasks by name and version. If multiple versions of a task are present and no version is specified, we should default to the newest version available on the server.
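
A minimal sketch of the "default to the newest version" rule, assuming versions are plain dotted-integer strings such as "0.10.0" (an assumption; real task versions may need a full PEP 440 parser). The helper names are hypothetical. The point it illustrates is that versions should be compared numerically, since plain string comparison ranks "0.9.0" above "0.10.0".

```python
from typing import Optional


def parse_version(version: str) -> tuple:
    """Turn "1.10.0" into (1, 10, 0) so that comparison is numeric."""
    return tuple(int(part) for part in version.split("."))


def newest_version(versions: list) -> Optional[str]:
    """Return the highest version string, or None for an empty list."""
    if not versions:
        return None
    return max(versions, key=parse_version)


installed = ["0.9.0", "0.10.0", "0.2.1"]
print(newest_version(installed))  # numeric comparison
print(max(installed))             # naive string comparison, for contrast
```

The numeric comparison selects "0.10.0", while the naive string comparison selects "0.9.0".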

tcompa commented 1 year ago

Quoting https://github.com/fractal-analytics-platform/fractal/issues/309#issuecomment-1324792116:

This feature can be included somewhat easily, if we restrict its scope to a subset of all possible tasks, namely the ones coming from a PyPI install of the fractal-tasks-core package. This would mean that:

This discussion is progressing in https://github.com/fractal-analytics-platform/fractal-server/issues/1#issuecomment-1403501673 (or in a dedicated issue, soon enough), and will affect this issue on the multiple-installed-versions use cases.

jluethi commented 1 year ago

Just stumbled across this again, tagging it so that we consider it a priority after the top 3 discussed in our call. Will be very helpful to have different versions of the same task installed, but still accessible on a name-basis. Default to newest version available would be nice here. Let's pick up on this again :)

jluethi commented 1 year ago

An example of why this is important: @adrtsc started to add custom tasks. The server accepts tasks with non-unique names. But then, the client-side task adding fails, because the caching can't happen client-side with the non-unique names.

Here's an example error, so that this issue can be found by someone searching for it:

WARNING:root:Cannot write task-list cache if task names are not unique (version-based disambiguation will be added in the future).
Current task list includes: ['Create OME-Zarr structure', 'Convert Yokogawa to OME-Zarr', 'Copy OME-Zarr structure', 'Maximum Intensity Projection', 'Cellpose Segmentation', 'Illumination correction', 'Napari workflows wrapper', 'Create OME-ZARR structure (multiplexing)', 'my custom task 1', 'my custom task custom_task', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_0', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_1', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_2', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_3', 'my custom task 20230306_apex2_validation_protease_3D_fractal_custom_cellpose_4', 'Cellpose Segmentation Custom', 'Cellpose Segmentation Custom', 'Cellpose Segmentation Custom', 'Cellpose Segmentation Custom', 'test']
Traceback (most recent call last):
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/_workflow.py", line 101, in workflow_add_task
    task_id = int(task_id_or_name)
ValueError: invalid literal for int() with base 10: 'Create OME-Zarr structure'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/bin/fractal", line 8, in <module>
    sys.exit(run())
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/__main__.py", line 7, in run
    asyncio.run(main())
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/asyncio/base_events.py", line 642, in run_until_complete
    return future.result()
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/client.py", line 114, in main
    interface = await handle()
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/client.py", line 104, in handle
    interface = await handler(client, **kwargs)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/__init__.py", line 151, in workflow
    iface = await workflow_add_task(client, batch=batch, **kwargs)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/_workflow.py", line 103, in workflow_add_task
    task_id = await get_cached_task_by_name(task_id_or_name, client)
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/site-packages/fractal/cmd/utils.py", line 26, in get_cached_task_by_name
    with cache_file.open("r") as f:
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/pathlib.py", line 1241, in open
    return io.open(self, mode, buffering, encoding, errors, newline,
  File "/data/homes/atschan/.conda/envs/fractal-client-1.1.0a2/lib/python3.9/pathlib.py", line 1109, in _opener
    return self._accessor.open(self, flags, mode)
FileNotFoundError: [Errno 2] No such file or directory: '/data/homes/atschan/fractal-demos/examples/08_custom_cellpose_task/.cache/tasks'
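
The traceback shows two steps: the argument fails to parse as an integer ID, and the fallback lookup by name then crashes because the cache file was never written. A minimal sketch of a more explicit failure mode follows. It is not the client's actual code: `lookup_task_id`, `TaskLookupError` and the JSON name-to-ID cache format are assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path


class TaskLookupError(ValueError):
    """Raised when a task name cannot be resolved to an ID (hypothetical)."""


def lookup_task_id(task_id_or_name: str, cache_file: Path) -> int:
    # An integer argument is already an ID, so no cache is needed.
    try:
        return int(task_id_or_name)
    except ValueError:
        pass
    # The cache is only written when names are unique, so it may be missing.
    if not cache_file.exists():
        raise TaskLookupError(
            f"Cannot resolve task name {task_id_or_name!r}: no task cache at "
            f"{cache_file}. Task names may be non-unique; use the task ID."
        )
    # Assumed cache format: a JSON object mapping task name to task ID.
    with cache_file.open("r") as f:
        name_to_id = json.load(f)
    if task_id_or_name not in name_to_id:
        raise TaskLookupError(f"No task named {task_id_or_name!r} in cache.")
    return name_to_id[task_id_or_name]


with tempfile.TemporaryDirectory() as tmp:
    missing_cache = Path(tmp) / ".cache" / "tasks"
    print(lookup_task_id("12", missing_cache))
    try:
        lookup_task_id("Create OME-Zarr structure", missing_cache)
    except TaskLookupError as err:
        print(err)
```

With this structure the user sees a message that points at the non-unique names, rather than a FileNotFoundError chained to a ValueError.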

jluethi commented 1 year ago

Also account for extras (see pytorch discussion)?

tcompa commented 1 year ago

This issue requires:

or an equivalent update to fractal-server.

Once such an update is available, the task list obtained from an API call would be a list of items with the following details:

  1. A flag encoding whether they are users' tasks or common tasks (e.g. through some_task["owner"] is None);
  2. If they are common tasks, a source attribute which is rich in metadata. The ones we would care about, for now, are the package name, the package version and the task name (or slug).
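
As an illustration of the two points above, here is a small sketch of how the client could model such items. The class and field names are hypothetical, the sample values are made up, and the real fractal-server response may encode `source` differently (e.g. as a single string).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskSource:
    """Metadata of a common task (field names are illustrative)."""
    package: str
    version: str
    task_name: str


@dataclass(frozen=True)
class Task:
    id: int
    name: str
    owner: Optional[str] = None          # None marks a common task
    source: Optional[TaskSource] = None  # only meaningful for common tasks

    @property
    def is_common(self) -> bool:
        return self.owner is None


# Made-up sample data: one common task and one user task.
tasks = [
    Task(
        id=1,
        name="Cellpose Segmentation",
        source=TaskSource("fractal-tasks-core", "1.2.3", "cellpose_segmentation"),
    ),
    Task(id=2, name="Cellpose Segmentation Custom", owner="some-user"),
]
common = [t.name for t in tasks if t.is_common]
print(common)
```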

With this information, we should prepare a set of relevant use cases of how to address a task, e.g.:

Regarding this comment by @jluethi:

An example of why this is important: ... started to add custom tasks. The server accepts tasks with non-unique names. But then, the client-side task adding fails, because the caching can't happen client-side with the non-unique names.

Could we define the expected behavior? This is a custom task, and (in my understanding) we are not planning to add a version to it. How would we want the user to create multiple tasks with the same name and be able to specify which one they'd like to address? In principle we can also add an actual version to each task (custom or common), but then let's review https://github.com/fractal-analytics-platform/fractal-server/issues/702 (especially in its "Re: why a string and not many db columns?" section).

tcompa commented 1 year ago

Some updates from recent work on fractal-server:

Plan of changes:

Note: this requires #496.

New interface of task edit:

fractal task edit TASK_ID_OR_NAME --version VERSION [...]

New interface of workflow add-task:

fractal workflow add-task PROJECT_ID WORKFLOW_ID TASK_ID_OR_NAME --version VERSION [...]
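
For reference, a minimal argparse sketch of the add-task interface above. This is only an illustration of the argument layout; the actual fractal-client parser may be organized differently, and the destination names used here are assumptions.

```python
import argparse

parser = argparse.ArgumentParser(prog="fractal")
subparsers = parser.add_subparsers(dest="command", required=True)

workflow = subparsers.add_parser("workflow")
workflow_sub = workflow.add_subparsers(dest="subcommand", required=True)

add_task = workflow_sub.add_parser("add-task")
add_task.add_argument("project_id", type=int)
add_task.add_argument("workflow_id", type=int)
# Kept as a string: it is either an integer ID or a task name.
add_task.add_argument("task_id_or_name")
# Optional; None means "pick the latest version".
add_task.add_argument("--version", default=None)

args = parser.parse_args(
    ["workflow", "add-task", "1", "2", "Cellpose Segmentation", "--version", "1.2.3"]
)
print(args.task_id_or_name, args.version)
```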

Draft of the get_task_id_from_cache function (discussed with @ychiucco):

def _search_in_task_list(allow_cache_refresh):

    call _get_matching_tasks with NAME (required) and VERSION (optional)

    if 0 results (BAD THING):
        if allow_cache_refresh:
            task_list = refresh_task_cache
            return _search_in_task_list(..., allow_cache_refresh=False)
        else:
            fail with very informative message

    if 1 result:
        return its ID

    if N>1 results:
        if version is None:
            set version = max(versions)
            check whether there is only one task left
            if yes:
                return its ID
            if not:
                HANDLE AS "BAD THING" -- SEE ABOVE
        else:
            # e.g. two "test" tasks, with version 0.0.1 but different `owner`s
            HANDLE AS "BAD THING" -- SEE ABOVE

----

get_task_id_from_cache(TASK_ID_OR_NAME, VERSION = None):
    if TASK_ID_OR_NAME is an integer:
        if VERSION:
            # a version makes no sense together with an ID
            raise
        return ID
    else:
        cache_is_up_to_date = False
        if cache is not there:
            refresh task_list and write it to cache
            cache_is_up_to_date = True

        # no point in refreshing again if the cache was just written
        task_id = _search_in_task_list(allow_cache_refresh=not cache_is_up_to_date)
        return task_id
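
The draft above can be turned into a runnable sketch along these lines. Several things here are assumptions made for illustration: tasks are dicts with "id", "name" and "version" keys, the cache is an in-memory dict rather than a file, `fetch` stands in for the API call that returns the task list, and versions are dotted integers. The sample tasks reuse IDs from the test data in this thread.

```python
class TaskNotResolved(ValueError):
    """The request cannot be mapped to exactly one task."""


def _version_key(version):
    # Assumes dotted-integer versions such as "1.10.0".
    return tuple(int(part) for part in version.split("."))


def _search(name, version, cache, fetch, allow_refresh):
    matches = [
        t for t in cache["tasks"]
        if t["name"] == name and (version is None or t["version"] == version)
    ]
    if len(matches) > 1 and version is None:
        versions = [t["version"] for t in matches]
        # The latest version is only defined if every candidate has one.
        if None not in versions:
            latest = max(versions, key=_version_key)
            matches = [t for t in matches if t["version"] == latest]
    if len(matches) == 1:
        return matches[0]["id"]
    # "Bad thing": zero or ambiguous results. Refresh once, then give up.
    if allow_refresh:
        cache["tasks"] = fetch()
        return _search(name, version, cache, fetch, allow_refresh=False)
    raise TaskNotResolved(
        f"Cannot resolve (name, version)=({name!r}, {version!r}): "
        f"{len(matches)} candidates."
    )


def get_task_id(task_id_or_name, version, cache, fetch):
    try:
        task_id = int(task_id_or_name)
    except ValueError:
        pass
    else:
        if version is not None:
            raise TaskNotResolved("A version cannot be combined with a task ID.")
        return task_id
    cache_is_up_to_date = False
    if cache.get("tasks") is None:
        cache["tasks"] = fetch()
        cache_is_up_to_date = True
    return _search(
        task_id_or_name, version, cache, fetch,
        allow_refresh=not cache_is_up_to_date,
    )


SERVER_TASKS = [
    {"id": 101, "name": "dummy1", "version": "1.0.1"},
    {"id": 201, "name": "dummy2", "version": None},
    {"id": 202, "name": "dummy2", "version": "2.0.0"},
    {"id": 301, "name": "dummy3", "version": "3.0.0"},
    {"id": 302, "name": "dummy3", "version": "3.1.4"},
]


def fetch():
    return list(SERVER_TASKS)


print(get_task_id("dummy3", None, {}, fetch))     # latest of two versions
print(get_task_id("dummy3", "3.0.0", {}, fetch))  # explicit version
```

Treating both "no match" and "ambiguous match" as the same failure path keeps the single-refresh logic in one place, as in the draft.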

tcompa commented 1 year ago

Here are some possible error messages coming from requests which cannot be fulfilled (as in the current state of https://github.com/fractal-analytics-platform/fractal/pull/499):

There is no task with name "dummy0" in the following task list:
  ID, Name, Version, Owner, Source
  101, "dummy1", 1.0.1, None, a
  201, "dummy2", None, None, b
  202, "dummy2", 2.0.0, None, c
  301, "dummy3", 3.0.0, None, d
  302, "dummy3", 3.1.4, None, e
  401, "dummy4", 4.0.0, None, f
  402, "dummy4", 4.1.1, None, g
  401, "dummy4", 4.1.1, None, h

There is no task with (name, version)=("dummy1", 3.1.4) in the following task list:
  ID, Name, Version, Owner, Source
  101, "dummy1", 1.0.1, None, a
  201, "dummy2", None, None, b
  202, "dummy2", 2.0.0, None, c
  301, "dummy3", 3.0.0, None, d
  302, "dummy3", 3.1.4, None, e
  401, "dummy4", 4.0.0, None, f
  402, "dummy4", 4.1.1, None, g
  401, "dummy4", 4.1.1, None, h

Cannot determine the latest version in the following task list:
  ID, Name, Version, Owner, Source
  201, "dummy2", None, None, b
  202, "dummy2", 2.0.0, None, c
Please make your request more specific.

Multiple tasks with latest version (4.1.1) in the following task list:
  ID, Name, Version, Owner, Source
  402, "dummy4", 4.1.1, None, g
  401, "dummy4", 4.1.1, None, h
Please make your request more specific.

Multiple tasks with version 4.1.1 in the following task list:
  ID, Name, Version, Owner, Source
  402, "dummy4", 4.1.1, None, g
  401, "dummy4", 4.1.1, None, h
Please make your request more specific.
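
A short sketch of how messages in this format could be assembled. The function names and the dict keys are hypothetical and not taken from the PR; only the message layout follows the examples above.

```python
def format_task_table(tasks):
    """Render tasks as the comma-separated table used in the messages."""
    header = "  ID, Name, Version, Owner, Source"
    rows = [
        f'  {t["id"]}, "{t["name"]}", {t["version"]}, {t["owner"]}, {t["source"]}'
        for t in tasks
    ]
    return "\n".join([header, *rows])


def ambiguous_version_message(version, tasks):
    return (
        f"Multiple tasks with version {version} in the following task list:\n"
        f"{format_task_table(tasks)}\n"
        "Please make your request more specific."
    )


candidates = [
    {"id": 402, "name": "dummy4", "version": "4.1.1", "owner": None, "source": "g"},
    {"id": 401, "name": "dummy4", "version": "4.1.1", "owner": None, "source": "h"},
]
message = ambiguous_version_message("4.1.1", candidates)
print(message)
```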

jluethi commented 1 year ago

Nice error messages, thanks @tcompa !