fractal-analytics-platform / fractal-web

Web client for Fractal
https://fractal-analytics-platform.github.io/fractal-web/
BSD 3-Clause "New" or "Revised" License

Expose new endpoint for custom-environment task collection #517

Closed tcompa closed 2 months ago

tcompa commented 3 months ago

This is fully preliminary, as the feature is not yet ready in fractal-server. PR:

When ready, we'll also have to add a warning message:

Collecting tasks with a custom Python environment will use that environment for running the tasks. Be careful about changing this environment, as that may break existing workflows. It is recommended to use custom Python environments only during task development or when something needed for your environment building isn't supported in Fractal server yet. Collect the task with regular Fractal task collection for production setups.

tcompa commented 3 months ago

This is taking shape in https://github.com/fractal-analytics-platform/fractal-server/pull/1607, soon to be included in a (pre-)release. Note that it's a first version, and things may change in the future (especially related to the package_root/package_name interface).


In the "tasks" page (for v2 only), we should include a new tab (next to PyPI, Local, Single task), with name TBD (let's start with "Custom Python env", but maybe we can find a better one).


When selecting this tab, we should display the following warning message (either fully, or in a way that can be opened/collapsed):

Collecting tasks with a custom Python environment will use that environment for running the tasks. Be careful about changing this environment, as that may break existing workflows. It is recommended to use custom Python environments only during task development or when something needed for your environment building isn't supported in Fractal server yet. Collect the task with regular Fractal task collection for production setups.


The form will include the following fields.

Request-body property: python_interpreter Type: Required string (this may be long, since it's an absolute path) Displayed name: Python Interpreter Help message: Absolute path to the Python interpreter to be used for running tasks.

Request-body property: source Type: Required string (not necessarily very long) Displayed name: Source Help message: A common label identifying this package.

Request-body property: manifest Type: The request-body property is an object, but here we should let the user upload or drag'n'drop an on-disk JSON file. This is required. Displayed name: Manifest Help message: Manifest of a Fractal task package (this is typically the content of __FRACTAL_MANIFEST__.json).

Request-body property: version Type: Optional string Displayed name: Version Help message: Optional version of tasks to be collected.

Request-body property: package_name Type: Optional string Displayed name: Package Name Help message: Name of the package, as used in import <package_name>; this is then used to extract the package directory (package_root) via pip show <package_name>.

Request-body property: package_root Type: Optional string (potentially long, since it's an absolute path) Displayed name: Package Folder Help message: The folder where the package is installed. If not provided, it will be extracted via pip show (requires package_name to be set).
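To make the field list above more concrete, here is a minimal sketch of how the request body could be assembled from the form values. The helper name `build_custom_collect_payload` is hypothetical, the absolute-path check assumes POSIX-style paths, and the rule that at least one of package_name and package_root must be set is inferred from the help messages above, not taken from the final fractal-server schema.

```python
import json
from typing import Any, Optional


def build_custom_collect_payload(
    python_interpreter: str,
    source: str,
    manifest_text: str,
    version: Optional[str] = None,
    package_name: Optional[str] = None,
    package_root: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble the request body from the form fields (hypothetical helper)."""
    # Assumption: absolute paths are POSIX-style and start with "/".
    if not python_interpreter.startswith("/"):
        raise ValueError("python_interpreter must be an absolute path")
    if not source:
        raise ValueError("source is required")
    # Assumption: package_root may only be omitted when package_name is set,
    # since the server then looks it up via `pip show <package_name>`.
    if not package_name and not package_root:
        raise ValueError("set package_name or package_root")
    if package_root and not package_root.startswith("/"):
        raise ValueError("package_root must be an absolute path")

    # The user uploads a JSON file; the request body carries it as an object.
    manifest = json.loads(manifest_text)
    if not isinstance(manifest, dict):
        raise ValueError("manifest must be a JSON object")

    payload: dict[str, Any] = {
        "python_interpreter": python_interpreter,
        "source": source,
        "manifest": manifest,
    }
    # Optional fields are left out of the body when empty.
    optional_fields = (
        ("version", version),
        ("package_name", package_name),
        ("package_root", package_root),
    )
    for key, value in optional_fields:
        if value:
            payload[key] = value
    return payload
```

Dropping empty optional fields rather than sending empty strings is a design choice of this sketch; the backend may well accept either.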


The Collect button will then make an API call to /api/v2/task/collect/custom/ (precise URL TBD). This backend endpoint processes the request directly (that is, it doesn't run any background task) and returns only after completion. Since it doesn't return immediately, perhaps we should handle it as we do for "slow" endpoints elsewhere in fractal-web (e.g. with a spinner?).

Success is a 201 response, while there are also a few known 422 branches.

After the endpoint returns, and in case it was successful, the task list in the DB has been updated. If we have an internal logic in the tasks page to decide when to refresh the task-list via an API call, then a successful response of this endpoint should trigger this logic.
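The response handling described above could be sketched roughly as follows. This is only an illustration: `interpret_collect_response` is a hypothetical helper, and the assumption that 422 bodies carry a "detail" field is a guess based on common FastAPI behaviour, not something confirmed for this endpoint.

```python
from typing import Any


def interpret_collect_response(status_code: int, body: Any) -> dict[str, Any]:
    """Map the endpoint's response onto UI actions (hypothetical helper)."""
    if status_code == 201:
        # Success: the task list in the DB has changed, so refresh it.
        return {"ok": True, "refresh_task_list": True, "message": None}
    if status_code == 422:
        # Assumption: the error body is an object with a "detail" field.
        detail = body.get("detail") if isinstance(body, dict) else body
        return {"ok": False, "refresh_task_list": False, "message": str(detail)}
    # Anything else is unexpected and reported generically.
    return {
        "ok": False,
        "refresh_task_list": False,
        "message": f"Unexpected status {status_code}",
    }
```

The spinner would be shown before the call and hidden once this function has a result, whichever branch is taken.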

jluethi commented 3 months ago

For the source help message, let's include an example. e.g.: Help message: A common label identifying this package. For example: pip_remote:fractal_tasks_core:1.1.0:fractal-tasks:py39:cellpose_segmentation

(how is the Python version now formatted in the new source strings with a required Python version?)

For manifest, let's use the same approach as on the sandbox page when providing a manifest.

A question here for @tcompa: How does the manifest specify the path to the relevant executables per task, given that it's no longer in the folder with the package, but provided separately here, and the functions are just available in the environment? Is this via package_root? If so, doesn't this need to be a required property? Or can it be either package_root or package_name?

tcompa commented 3 months ago

[BACKEND-RELATED DISCUSSION]

For the source help message, let's include an example. e.g.: Help message: A common label identifying this package. For example: pip_remote:fractal_tasks_core:1.1.0:fractal-tasks:py39:cellpose_segmentation

In my view, these are users' tasks rather than common tasks, meaning that they are not meant to adopt a standard source syntax. This means that if the user username triggers collection of mypackage and sets source to xxxxx, then the source of their cellpose_segmentation task (in the current version) would look like username:xxxxx:cellpose_segmentation.

The main goals of making a task common are:

  1. Portability across instances -> not relevant for custom Python environments
  2. Task-list sorting, where tasks from the same package are displayed together -> I'd need to review this, but I guess we could introduce some fail-safe version of it also for users' tasks.
  3. Different access control e.g. for editing tasks -> I don't think this applies to the current case, where I expect the task owner to be the user who ran the collection.

That said, we can still suggest a source that is close to the common-tasks one, e.g. for package mypackage we would suggest mypackage:1.1.0 (although writing the version in a free-text attribute is risky if we don't enforce a strict rule on the allowed syntax, since a user could mark the task as version="2.0.0" and then set the source to "mypackage:1.0.0").
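To illustrate the source strings discussed in this comment, here is a small sketch. All three function names are hypothetical, and the consistency check assumes the suggested `package:version` form; since source is free text, it cannot be a general rule.

```python
from typing import Optional


def user_task_source(username: str, source_label: str, task_name: str) -> str:
    """Compose the per-task source of a user's task (current-version format)."""
    return ":".join((username, source_label, task_name))


def suggested_source(package_name: str, version: Optional[str] = None) -> str:
    """Suggest a source label close to the common-tasks one."""
    return f"{package_name}:{version}" if version else package_name


def source_matches_version(source: str, version: Optional[str]) -> bool:
    """Check a `package:version` source against the declared version.

    Assumption: only sources with exactly two colon-separated parts are
    checked; anything else is treated as free text and accepted.
    """
    parts = source.split(":")
    if version is None or len(parts) != 2:
        return True
    return parts[1] == version


print(user_task_source("username", "xxxxx", "cellpose_segmentation"))
# username:xxxxx:cellpose_segmentation
print(source_matches_version("mypackage:1.0.0", "2.0.0"))
# False
```

A check like the last one would catch the mismatch mentioned above, but only if we agree on a syntax for the source in the first place.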

tcompa commented 3 months ago

[BACKEND-RELATED DISCUSSION]

A question here for @tcompa: How does the manifest specify the path to the relevant executables per task, given that it's no longer in the folder with the package, but provided separately here, and the functions are just available in the environment? Is this via package_root? If so, doesn't this need to be a required property? Or can it be either package_root or package_name?

The package is meant to be available within the Python interpreter that the user provides, and then we can discover its path automatically (given the Python interpreter and the package name). This is option 4C, which we opted for in https://github.com/fractal-analytics-platform/fractal-server/issues/1581#issuecomment-2191499630. The fallback is 4A, where the user provides package_root explicitly.
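The discovery step could look roughly like the sketch below. It is not the fractal-server implementation: the function names are hypothetical, and the mapping from the `Location:` line of `pip show` to the package folder (package name with dashes replaced by underscores) is an assumption that will not hold for every package.

```python
import subprocess
from pathlib import Path


def parse_pip_show_location(pip_show_output: str) -> str:
    """Extract the value of the `Location:` line from `pip show` output."""
    for line in pip_show_output.splitlines():
        if line.startswith("Location:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Location line in pip show output")


def discover_package_root(python_interpreter: str, package_name: str) -> Path:
    """Option 4C: find the package folder via the user's interpreter."""
    result = subprocess.run(
        [python_interpreter, "-m", "pip", "show", package_name],
        capture_output=True,
        text=True,
        check=True,
    )
    location = parse_pip_show_location(result.stdout)
    # Assumption: the importable folder is named after the package,
    # with dashes replaced by underscores.
    return Path(location) / package_name.replace("-", "_")


def resolve_executable(package_root: Path, relative_path: str) -> Path:
    """Join a relative module path from the manifest onto package_root."""
    if Path(relative_path).is_absolute():
        raise ValueError("manifest paths are expected to be relative")
    return package_root / relative_path
```

With option 4A the first two functions are skipped, and the user-provided package_root is passed straight to `resolve_executable`.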

The manifest structure is the same as it is in any other package, with relative module paths as in

      "executable_non_parallel": "tasks/cellvoyager_to_ome_zarr_init.py",
      "executable_parallel": "tasks/cellvoyager_to_ome_zarr_compute.py",

tcompa commented 2 months ago

The new endpoint is now available as of fractal-server=2.3.0a0.