fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License

Provide fractal-compatible package template #476

Closed tcompa closed 10 months ago

tcompa commented 1 year ago

(e.g. using https://github.com/pydev-guide/pyrepo-copier)

jluethi commented 12 months ago

We recently created a new package with Fractal tasks using this approach here: https://github.com/MaksHess/abbott

jluethi commented 11 months ago

Options:

Qs: How do we get template updates back into old packages? How easy is it for a user to do all the required setup (to-do list vs. copier questions)?

tcompa commented 11 months ago

Briefly, from the meeting this morning.

The main goal is scaffolding: get a user to adapt their scripts into a Fractal-compatible package as smoothly as possible. A (lower-priority) second goal is life-cycle management, which includes pulling template updates into the project.

Options (see a partial comparison here):

Requirements:

  1. A "dummy" example task, in src/example_task.py
  2. Tests on arguments JSON Schemas (ported from fractal-tasks-core)
  3. Package building tool (https://pypa-build.readthedocs.io) and configuration, possibly only requiring the user to type python -m build or some similar command.
    • This must be based on a pyproject.toml file, which includes the fractal-tasks-core dependency by default
    • This must produce a wheel file that can be imported in fractal-server
    • The manifest file must be part of the wheel file
  4. Info on how to install locally (basically python -m venv myenv; source myenv/bin/activate; pip install -e .)
  5. Info on how to customize the scaffold (this will depend on the chosen template format).
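Requirement 3 (the manifest must be part of the wheel) could be checked with a small helper like the one below. This is only a sketch: the function name `find_in_wheel` is made up for illustration, and the manifest file name used in the usage note is an assumption based on the convention in fractal-tasks-core.

```python
import zipfile
from pathlib import Path


def find_in_wheel(wheel_path: Path, filename: str) -> list[str]:
    """Return the archive members of a wheel whose basename equals `filename`.

    Hypothetical helper, not part of any Fractal package.
    """
    # A wheel is a plain zip archive, so the standard library can inspect it.
    with zipfile.ZipFile(wheel_path) as wheel:
        return [
            name
            for name in wheel.namelist()
            if name.rsplit("/", 1)[-1] == filename
        ]
```

After running `python -m build`, calling `find_in_wheel` on the `.whl` file in `dist/` with `"__FRACTAL_MANIFEST__.json"` (assumed manifest name) should return a non-empty list if the manifest was packaged correctly.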

Non urgent goals:

  1. Publishing to PyPI
  2. Linters, formatters, pre-commit

Non goals (for the moment):

  1. Publishing a conda package
  2. Docker
  3. Development environment handling (e.g. poetry)

tcompa commented 10 months ago

Here is the first version of the template: https://github.com/exactlab/fractal-tasks-template. I will describe some higher-level choices here.

Which scaffolding approach?

Plain template repository

As a first option, we went with a simple template repository, not based on jinja or similar tools.

PROs:

  1. The repository is simpler, and more readable for someone just giving a quick look.
  2. Developing the template is easier, since it does not require constantly switching between the template and an actual instance.

CONs:

  1. It is not straightforward to separate the customization operation from git operations. Actually even the very first operation (forking and cloning) immediately requires git concepts.
  2. The template inherits several tools from fractal-tasks-core, but the code is duplicated. This leads to a maintenance overhead because we may have to replicate changes in both repositories [note: this is shared by all options].

The first CON (customization cannot be separated from git operations) is what made us look elsewhere. Writing a script that automatically makes a bunch of string replacements in a folder tree is easy, but the fact that it also needs to invoke git mv or git add is cumbersome.
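For reference, the "easy" half of that statement looks roughly like the sketch below. The function name `replace_in_tree` and the placeholder strings are hypothetical, and the script deliberately does nothing about git: in a tracked repository every rename and edit would still need a matching git mv or git add, which is the cumbersome part.

```python
from pathlib import Path


def replace_in_tree(root: Path, old: str, new: str) -> tuple[int, int]:
    """Replace `old` with `new` in file contents and in file/folder names.

    Returns (number of files edited, number of paths renamed).
    Hypothetical helper; it does not call git in any way.
    """
    n_edited = 0
    n_renamed = 0

    # Pass 1: rewrite the contents of text files, skipping the .git folder.
    for path in sorted(root.rglob("*")):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # not UTF-8 text (likely binary): leave it untouched
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            n_edited += 1

    # Pass 2: rename the deepest paths first, so that parent folders are
    # renamed only after everything inside them has been handled.
    all_paths = sorted(root.rglob("*"), key=lambda p: len(p.parts), reverse=True)
    for path in all_paths:
        if ".git" in path.relative_to(root).parts:
            continue
        if old in path.name:
            path.rename(path.with_name(path.name.replace(old, new)))
            n_renamed += 1

    return n_edited, n_renamed
```

A call such as `replace_in_tree(Path("."), "my_pkg", "cool_tasks")` would customize a freshly cloned template, but the resulting changes are invisible to git until they are staged by hand.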

Copier

The second try is the one in https://github.com/exactlab/fractal-tasks-template, and it's based on copier.

PROs:

  1. The largest part of a typical use case can be achieved without ever explicitly calling a git command. There is obviously some git in the copier copy gh:... command, but it's abstracted by copier. And if you don't plan to ever run copier update, then everything works without ever typing git init.

CONs:

  1. We are adding one more layer of things that can go wrong in not-so-transparent ways, as we found out a few times during development.
  2. copier is one more "dependency" that can introduce breaking changes in future releases, and the template may require copier-related maintenance activity.
  3. The template inherits several tools from fractal-tasks-core, but the code is duplicated. This leads to a maintenance overhead because we may have to replicate changes in both repositories [note: this is shared by all options].

The last CON (duplicated code) is only fixed by making fractal-tasks-core an instance of fractal-tasks-template, which is not an option for the moment.

The first two CONs above (the extra layer and the copier dependency) can be mitigated e.g. by:

Status of the current fractal-tasks-template copier template

The current fractal-tasks-template is a fork of https://github.com/pydev-guide/pyrepo-copier. Due to the first two copier CONs above (the extra layer and the copier dependency), I tried to only keep a very slim skeleton of the template, meaning that the fork has already diverged quite a lot from upstream.

TBD: should this remain a fork, or should it become an independent repository (while still acknowledging the original pyrepo-copier effort)? We can probably discuss it directly with the pyrepo-copier maintainer. If it's OK, I'd rather move to an independent repository.

A first diff between upstream and our repo is here, but then I kept cleaning up as much as possible. Some relevant changes:

TODO: the use of GitHub Actions and/or code-quality tools should be encouraged, and maybe we should add specific recommendations in the README pointing to valid examples (https://pydev-guide.github.io itself, or also other resources like https://learn.scientific-python.org/development/tutorials/ or https://scikit-hep.org/developer) or directly to the specific tools' documentation.

Fractal-specific aspects

These include:

Note that (as with the duplicated-code CON listed above for both approaches) every item in this list introduces some debt, since it will need to remain in sync with the rest of Fractal:

TL;DR

The current version can still be improved, and most importantly it lacks a full end-to-end real-life test (that is, actually using the tasks within fractal); but we are at a stage where feedback and reviews are already valuable.

tcompa commented 10 months ago

The first version is already available at https://github.com/fractal-analytics-platform/fractal-tasks-template