Open tcompa opened 2 weeks ago
Now mostly defined in #1831 and #1829.
Now mostly defined in #1831 and #1829.
In view of major updates defined in these issues (and especially in https://github.com/fractal-analytics-platform/fractal-server/issues/1831#issuecomment-2393700802), the preliminary plan is the following:
source
is provided, it is used as the single true information: either a task with that exact source exists or we fail.source
is not provided, then we enter the new-identity-based search for one matching task.The precise meaning of the new identity matching is based on https://github.com/fractal-analytics-platform/fractal-server/issues/1831#issuecomment-2393700802 and on https://github.com/fractal-analytics-platform/fractal-server/issues/1859. I'll later propose a precise priority-based definition in this issue - since we initially ignored the fact that we also may have to disambiguate similar tasks (ref https://github.com/fractal-analytics-platform/fractal-web/issues/582).
source
.source
ssource
(the ones created before the migration) and without source
(the newly-created ones).pkg_name
+ optional version
field) and a individual-task identity (name
).source
or a pkg_name+version+name.The following task specification is invalid, since it has both old and new identity information:
{
"source": "fractal-tasks-core-1.2.3-cellpose",
"pkg_name": "fractal-tasks-core",
"version": "1.2.3",
"name": "cellpose_segmentation"
}
The following task specification is invalid, since it lacks the task name:
{
"pkg_name": "fractal-tasks-core",
"version": "1.2.3"
}
The following spec is OK
{
"source": "fractal-tasks-core-1.2.3-cellpose"
}
and the backend will look for a task with this exact source
(which, for the record, can only have been created before the v2.7 update).
The following task is OK:
{
"pkg_name": "fractal-tasks-core",
"version": "1.2.3",
"name": "cellpose_segmentation"
}
and the backend will look for a task which matches these three properties exactly. There may be more than one such task - see "disambiguation" section below.
The following task is OK:
{
"pkg_name": "fractal-tasks-core",
"name": "cellpose_segmentation"
}
and the backend will look for a task which matches these three properties exactly (more on this below). There may be more than one such task - see "disambiguation" section below.
When given a source
, there may be either zero or one task matching.
When given pkg_name
+version
+name
, or pkg_name
+name
, there may be several matching tasks. Here is a description of how to proceed:
A query for tasks with pkg_name
+version
+name
may return up to 1+N results, where N is the number of user groups such that: (1) I am part of that group, and (2) the group owns one such task.
In order to pick a specific task, the priority logic is as follows:
task_group.user_id = my_user_id
) has highest priority.All
user group follows.ALTERNATIVE OPTION: we could also switch 1 and 2, and give "All" the highest priority.
When exporting a task that was collected automatically by fractal-server, it will always look like
{
"pkg_name": "fractal-tasks-core",
"version": "1.2.3",
"name": "cellpose_segmentation"
}
but what should the backend do when requested to import a task like
{
"pkg_name": "fractal-tasks-core",
"name": "cellpose_segmentation"
}
?
Some options:
version=None
, which means that we won't find any matching task in this case.version=None
as a dummy filter, and rather return the latest available version of the task. This means that by editing the JSON of an exported workflow we can make its import less deterministic but give it more chances of success on different instances (or on instances where some old tasks were later deleted or deactivated).The import-workflow operation works on a per-user basis. It can only succeed if all tasks that are identified are also accessible to the user. If not, it should fail (rather than providing a partially-non-accessible workflow).
active
tasksThe import-workflow operation does not include a check of whether imported tasks are active
. If I am importing an old workflow where I pinned all versions, and in the meanwhile those versions were deactivated on the fractal-server instance, then the imported workflow will appear with multiple warnings and I won't be able to run it as a job.
ALTERNATIVE OPTION: the import fails when no active
matching tasks are found.
Up to now, we discussed task-group identity in multiple v2.7 threads. In order to specify a single task, we also need something more granular, and the obvious candidate here is the task name
.
This introduces some additional requirements for the backend:
name
should be immutableMoving forward, there will be a point in the future where we should discontinue the "import-task-based-on-source" feature. This will invalidate all exported-workflow JSON files created with versions <2.7, thus clearly being a breaking change.
ALTERNATIVE OPTION: we could also switch 1 and 2, and give "All" the highest priority.
I think it's useful that "my personal task" has highest priority. It allows a user to recollect a task for themselves that would have issues in the ALL collection, thus working around some issues without needing admin support.
We consider version=None as a dummy filter, and rather return the latest available version of the task. This means that by editing the JSON of an exported workflow we can make its import less deterministic but give it more chances of success on different instances (or on instances where some old tasks were later deleted or deactivated).
I'd be in favor of that. If one imports a workflow with tasks that don't have their version specified, one gets the newest available versions of those tasks. That may or may not work (depending on whether parameters have changed). But given that we never export workflows that way, I think it's a good trade-off
The import-workflow operation does not include a check of whether imported tasks are active. If I am importing an old workflow where I pinned all versions, and in the meanwhile those versions were deactivated on the fractal-server instance, then the imported workflow will appear with multiple warnings and I won't be able to run it as a job.
As long as we can make those warnings visible to the user (e.g. also show feedback during import that some tasks were deactivated, they may need to update to a newer version), I think that's fine
Up to now, we discussed task-group identity in multiple v2.7 threads. In order to specify a single task, we also need something more granular, and the obvious candidate here is the task name. This introduces some additional requirements for the backend:
A task name should be immutable Upon task-group creation, we must verify that no two tasks with the same name exist in the same group.
Sounds good to me
Moving forward, there will be a point in the future where we should discontinue the "import-task-based-on-source" feature. This will invalidate all exported-workflow JSON files created with versions <2.7, thus clearly being a breaking change.
I'd be quite hesitant of dropping this, even once we don't have any of those tasks active anymore on the server. This breaks importing old workflows, so I'd much prefer to keep this active
Upon task-group creation, we must verify that no two tasks with the same name exist in the same group.
Unique constraint on TaskV2
taskgroup_id
and name
columns.
https://github.com/fractal-analytics-platform/fractal-server/issues/1829 should offer a way to move from the task source to the taskgroup attributes (plus a single task-specific attribute, e.g. its name).
I guess we should maintain