Update tasks to Fractal 2.0

fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform

https://fractal-analytics-platform.github.io/fractal-tasks-core/

BSD 3-Clause "New" or "Revised" License

Update tasks to Fractal 2.0 #671

Closed jluethi closed 5 months ago

jluethi commented 6 months ago

Closes #669 Closes #682 Closes #324 Closes #415 Closes #299 Closes #523 Closes #535 Closes #686 Closes #674 Closes #691

Discussion

To what degree should we give deprecation warnings rather than just changing some of the fractal-tasks-core functions? In general, public functions should stay stable, but we probably have some functions that shouldn't be public. One example is utils.get_table_path_dict, which I'm now updating to follow the new behavior. Others are the zarr_url vs. zarrurl naming (e.g. in pyramid building) and similar functions. I would expect this to become the 1.0 release of fractal-tasks-core. Some other libraries depend on it, but not too many yet. Going forward, we need to take care not to break any public functions, so this may be a good time to make some of those functions private.
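For the functions that do stay public, one possible transition path for the zarrurl / zarr_url rename is a keyword shim that emits a DeprecationWarning. The sketch below only illustrates that pattern; `build_pyramid_example` and its body are hypothetical and not the actual fractal-tasks-core function.

```python
import warnings
from typing import Optional


def build_pyramid_example(
    zarr_url: Optional[str] = None,
    *,
    zarrurl: Optional[str] = None,  # old spelling, kept only for the transition
) -> str:
    """Hypothetical public function that accepts both keyword spellings."""
    if zarrurl is not None:
        if zarr_url is not None:
            raise TypeError("Pass either zarr_url or zarrurl, not both.")
        warnings.warn(
            "`zarrurl` is deprecated, use `zarr_url` instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller
        )
        zarr_url = zarrurl
    if zarr_url is None:
        raise TypeError("Missing required argument: zarr_url")
    # Placeholder for the real work; here we only normalize the URL.
    return zarr_url.rstrip("/")
```

Callers using the old keyword keep working for a release or two, while their test suites surface the warning.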

To review

[x] Using zarr_url instead of path (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/670)
[ ] Task output for apply_registration_to_image:
- [ ] How are the image_list_updates structured?
- [ ] Do image_list_updates now always provide the new image types, or are the types still applied based on filters set in the manifest?
  - Edited images: when a zarr_url is provided that is already in the image list, start from its previous properties and update them with the manifest-based new filters (=> new types).
  - Should manifest types be applied to input zarr_urls by default, even if no image_list_update output is provided?
[x] Task output in compound tasks: Can tasks provide custom Pydantic models as output? E.g. for the Cellvoyager converter: can it produce InitArgsCellVoyager objects as output, or just dicts that can be cast to that model in the input of the compute task?
- We can use the models to validate the dict we're building in the init task, but still output a dict from the init task.
- The potential complexity argues for handling it in the task (=> more control for the task developer).
- => We stay with dict output for init tasks (in init_args).

General recommendation: Create a dict, optionally validate it with a model
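A minimal sketch of this recommendation, under stated assumptions: the init task builds a plain dict, checks it against a model, and still returns the dict. To keep the example dependency-free, a dataclass stands in for the Pydantic model; `InitArgsExample`, its fields, and `make_parallelization_item` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class InitArgsExample:
    """Hypothetical init_args model (dataclass stand-in for a Pydantic model)."""

    image_dir: str
    acquisition: Optional[int] = None

    def __post_init__(self) -> None:
        if not isinstance(self.image_dir, str) or not self.image_dir:
            raise ValueError("image_dir must be a non-empty string")
        if self.acquisition is not None and not isinstance(self.acquisition, int):
            raise ValueError("acquisition must be an int or None")


def make_parallelization_item(
    zarr_url: str, init_args: Dict[str, Any]
) -> Dict[str, Any]:
    """Validate init_args against the model, but still return plain dicts."""
    InitArgsExample(**init_args)  # raises on unknown keys or invalid values
    return dict(zarr_url=zarr_url, init_args=init_args)
```

The model instance is thrown away after validation, so the task output stays a JSON-friendly dict.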

[ ] In the image list outputs, can tasks provide dictionary items with value None? [For the Cellvoyager converter output: can I always provide the acquisition as an attribute even if it's None?]
- That scenario doesn't matter, since it was just init_args.
- If a compute task output contains None, unsetting would make sense: a task should be able to unset an attribute by providing "my_attr": None. Currently this is an error.

Handling on the task side: avoid putting None values into the output (e.g. via exclude_none / exclude_unset when exporting a model). => How do we handle the complexities in InitArgs? In general, try to keep InitArgs simple.

In general, the fewer hidden behaviors, the better! We need to document any hidden behaviors and any cases where the same thing can be done in two ways.
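As long as None values in a task output are an error, the task-side handling could be as small as filtering them out before returning. This is a sketch with a hypothetical helper name and example attributes; with a Pydantic model, exporting with `exclude_none=True` has a similar effect.

```python
from typing import Any, Dict


def drop_none_values(attributes: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `attributes` without keys whose value is None."""
    return {key: value for key, value in attributes.items() if value is not None}


# The acquisition is unknown here, so it is simply left out of the output.
attributes = drop_none_values(dict(plate="plate.zarr", well="B03", acquisition=None))
```

Only None is dropped; falsy values such as `0` or `False` are kept, which matters for attributes like an acquisition index of 0.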

[ ] In the MIP task, I provide both a new type ("is_3D" = False) as well as an origin that points to images with is_3D=True. Validate in the runner that the new image then has is_3D=False on the server once the task has run.
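To make the expected outcome of the MIP item concrete, here is a toy model of how an update could be resolved. It is not fractal-server code: the function name and the precedence order (origin types, then manifest types, then task-provided types) are assumptions made for illustration, and that order is exactly what the open questions above need to settle.

```python
from typing import Dict, List, Optional


def apply_update_example(
    image_list: List[dict],
    update: dict,
    manifest_types: Optional[Dict[str, bool]] = None,
) -> List[dict]:
    """Toy model: inherit the origin's types, then apply manifest and task types."""
    by_url = {image["zarr_url"]: image for image in image_list}
    origin = by_url.get(update.get("origin"), {})
    types = dict(origin.get("types", {}))  # inherited from the origin image
    types.update(manifest_types or {})     # types declared in the manifest (assumed)
    types.update(update.get("types", {}))  # types set explicitly by the task
    by_url[update["zarr_url"]] = dict(zarr_url=update["zarr_url"], types=types)
    return list(by_url.values())


images = [dict(zarr_url="/tmp/plate.zarr/B/03/0", types=dict(is_3D=True))]
update = dict(
    zarr_url="/tmp/plate_mip.zarr/B/03/0",
    origin="/tmp/plate.zarr/B/03/0",
    types=dict(is_3D=False),
)
result = apply_update_example(images, update)
```

Under these assumptions the projected image ends up with is_3D=False while the original 3D image is left untouched.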

Cleanup

[ ] Can fractal_tasks_core.utils.get_parameters_from_metadata be deprecated?
[ ] Build pyramid uses zarrurl vs. zarr_url
[ ] Remove old create_ome_zarr & yokogawa_to_ome_zarr functions

Checklist before merging

[ ] I added an appropriate entry to CHANGELOG.md

github-actions[bot] commented 6 months ago

Coverage report

| File | Statements | Missing | Coverage | Coverage (new stmts) | Lines missing |
| --- | --- | --- | --- | --- | --- |
| fractal_tasks_core/channels.py | 188 | 1 | 99.46% (187/188) | no new statements | |
| fractal_tasks_core/utils.py | 68 | 2 (was 1) | 97.05% (66/68), was 98.52% | 100% (3/3) | |
| fractal_tasks_core/ngff/specs.py | 145 (was 122) | 0 | 100% (145/145) | 100% (29/29) | |
| fractal_tasks_core/ngff/zarr_utils.py | 61 (was 47) | 10 (was 3) | 83.6% (51/61), was 93.61% | 50% (7/14) | 98-104, 108-113 |
| fractal_tasks_core/tables/v1.py | 120 (was 94) | 6 (was 0) | 95% (114/120), was 100% | 76.92% (20/26) | 290, 317-324 |
| fractal_tasks_core/tasks/_registration_utils.py | 73 (new file) | 2 | 97.26% (71/73) | 97.26% (71/73) | 62, 209 |
| fractal_tasks_core/tasks/_utils.py | 29 | 5 | 82.75% (24/29) | 100% (4/4) | |
| fractal_tasks_core/tasks/apply_registration_to_image.py | 113 (was 119) | 16 (was 18) | 85.84% (97/113), was 84.87% | 95.45% (21/22) | 163 |
| fractal_tasks_core/tasks/calculate_registration_image_based.py | 71 (was 98) | 6 | 91.54% (65/71), was 93.87% | 100% (11/11) | |
| fractal_tasks_core/tasks/cellpose_segmentation.py | 202 (was 210) | 23 (was 24) | 88.61% (179/202), was 88.57% | 91.66% (11/12) | 369 |
| fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_compute.py | 86 (new file) | 5 | 94.18% (81/86) | 94.73% (18/19) | 208 |
| fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init.py | 163 (new file) | 25 | 84.66% (138/163) | 100% (21/21) | |
| fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init_multiplex.py | 197 (new file) | 20 | 89.84% (177/197) | 96.66% (29/30) | 139 |
| fractal_tasks_core/tasks/copy_ome_zarr_hcs_plate.py | 108 (new file) | 3 | 97.22% (105/108) | 97.22% (105/108) | 231, 295-297 |
| fractal_tasks_core/tasks/find_registration_consensus.py | 48 (new file) | 4 | 91.66% (44/48) | 91.66% (44/48) | 106, 118, 166-168 |
| fractal_tasks_core/tasks/illumination_correction.py | 91 (was 105) | 10 (was 14) | 89.01% (81/91), was 86.66% | 83.33% (10/12) | 146, 273 |
| fractal_tasks_core/tasks/image_based_registration_hcs_init.py | 22 (new file) | 2 | 90.9% (20/22) | 90.9% (20/22) | 92-94 |
| fractal_tasks_core/tasks/import_ome_zarr.py | 97 (was 93) | 10 | 89.69% (87/97), was 89.24% | 95.23% (20/21) | 212 |
| fractal_tasks_core/tasks/init_group_by_well_for_multiplexing.py | 23 (new file) | 3 | 86.95% (20/23) | 86.95% (20/23) | 61, 86-88 |
| fractal_tasks_core/tasks/io_models.py | 52 (new file) | 0 | 100% (52/52) | 100% (60/60) | |
| fractal_tasks_core/tasks/maximum_intensity_projection.py | 64 (was 48) | 6 (was 3) | 90.62% (58/64), was 93.75% | 87.09% (27/31) | 149-162 |
| fractal_tasks_core/tasks/napari_workflows_wrapper.py | 236 (was 244) | 19 (was 20) | 91.94% (217/236), was 91.8% | 100% (8/8) | |
| Project Total | | | | | |

This report was generated by python-coverage-comment-action

jluethi commented 6 months ago

An interesting observation: Using the new zarr_url scheme, we can drop Path as an import from many of these tasks => promising for the direction of having cloud-compatible tasks! :)
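As an illustration of why dropping Path helps, a parent-URL lookup can be done with plain string operations, which behave the same for local paths and for URLs with a scheme (pathlib would collapse the double slash in something like `s3://bucket/...`). The helper name is hypothetical, and the sketch assumes the usual HCS layout where an image lives one level below its well.

```python
def well_url_from_image_url(zarr_url: str) -> str:
    """Return the parent (well) URL of an image zarr_url using string operations only."""
    return zarr_url.rstrip("/").rsplit("/", 1)[0]
```

For example, `well_url_from_image_url("s3://bucket/plate.zarr/B/03/0/")` keeps the `s3://` prefix intact and returns the `B/03` well URL.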

jluethi commented 6 months ago

Should we rename the type "3D" to "is_3D" or something similar? That would avoid having to switch to a different dictionary approach.

jluethi commented 6 months ago

I rewrote the Copy OME-Zarr task from scratch and changed the work distribution between copy & MIP parts => now more happens in the compute (MIP part). The copy only sets up the plate & well with the correct metadata. But in the new version, it's now explicitly specific for HCS plates, validates HCS metadata and writes the plate just with the wells that were passed with the zarr_urls (=> should open up for making this copy task an easy route for subset testing).

Having to handle the metadata of the OME-Zarrs led me to introduce a few new Plate spec pydantic classes & copy over some of my helper functions from the navigator project (see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/678). It will be good to review which parts of those we want in fractal-tasks-core.

One thing I'd like to review is whether this is the intended & simplest image_list_update we can provide:

    image_list_update_dict = dict(
        image_list_updates=[
            dict(
                zarr_url=zarr_url,
                origin=init_args.origin_url,
                types=dict(
                    is_3D=False,
                ),
            )
        ]
    )

Having to provide a dict containing a list of dicts seems kind of cumbersome. I understand why it needs to be a list of dicts, as some tasks will provide updates to multiple images. I assume the list is wrapped in a dict with the single key image_list_updates so that we know whether a task provided image_list_updates or a parallelization_list?
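If the nesting stays as it is, a pair of small helpers could hide most of it on the task side. This is only a sketch: `image_update` and `task_output` are hypothetical names, not part of fractal-tasks-core, and the output shape simply mirrors the snippet above.

```python
from typing import Any, Dict, List, Optional


def image_update(
    zarr_url: str,
    origin: Optional[str] = None,
    types: Optional[Dict[str, bool]] = None,
) -> Dict[str, Any]:
    """Build a single image update, leaving out keys that were not provided."""
    update: Dict[str, Any] = dict(zarr_url=zarr_url)
    if origin is not None:
        update["origin"] = origin
    if types:
        update["types"] = dict(types)
    return update


def task_output(*updates: Dict[str, Any]) -> Dict[str, List[Dict[str, Any]]]:
    """Wrap any number of image updates in the image_list_updates dict."""
    return dict(image_list_updates=list(updates))
```

A task updating one image would then return `task_output(image_update(zarr_url, origin=origin_url, types=dict(is_3D=False)))`, and a task updating several images passes several updates.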

jluethi commented 6 months ago

@tcompa I need to build the new manifest tomorrow. Besides that, I think the tasks-core package would be mostly ready for examples 01 & 02.

Importing & multiplexing registration are still TBD, as well as a long list of tests & points above of course :)

jluethi commented 6 months ago

@tcompa I now added the draft task list. Maybe I'll find some time this afternoon to work on import & registration some more. Otherwise, this should be ready for testing manifest building & then also for running example 01 as far as I can say. (I'm sure we'll discover some bugs in the process, but the tasks follow the new API and the tests are updated for them at least)

jluethi commented 5 months ago

Closes #674

jluethi commented 5 months ago

I made the Find Registration Consensus a compound task now to provide an example of parallelizing over a well, which may be needed in some of Adrian's tasks quite often
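A rough sketch of what "parallelizing over a well" can look like in the init part of a compound task: the image URLs are grouped by their parent well, and one parallelization item is emitted per well. The function name, the `zarr_url_list` key, and the exact item shape are assumptions for illustration, not the actual task code.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_well_example(zarr_urls: List[str]) -> Dict[str, List[dict]]:
    """Hypothetical init step: one parallelization item per well."""
    wells: Dict[str, List[str]] = defaultdict(list)
    for zarr_url in zarr_urls:
        # The well is assumed to be the parent of the image URL (HCS layout).
        well_url = zarr_url.rstrip("/").rsplit("/", 1)[0]
        wells[well_url].append(zarr_url)
    parallelization_list = [
        dict(
            zarr_url=sorted(urls)[0],  # first image of the well as reference
            init_args=dict(zarr_url_list=sorted(urls)),
        )
        for _, urls in sorted(wells.items())
    ]
    return dict(parallelization_list=parallelization_list)
```

Each compute job then receives one reference image plus the full list of images in the same well through its init_args.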

jluethi commented 5 months ago

@tcompa I have an initial version of all the tasks ready now. The MIP task uses the old approach to get attributes again. The tests are passing, even one that runs cellpose after MIP, but I'll be curious to see if example 01 runs. I'll try to test that tomorrow morning :)

There are a few remaining parts on activating testing functionality, could you have a look at those?

[x] Update & reenable manifest validation in Github CI
[x] Update & reenable test_valid_args_schemas
[x] Update test_task_interface in test_valid_task_interface once new Manifest is built

Other than that, the updates to make manifest creation slightly more flexible can either be part of this PR or a future PR. And I've created new issues for all the future goals in #669.

tcompa commented 5 months ago

> There are a few remaining parts on activating testing functionality, could you have a look at those?
>
> - Update & reenable manifest validation in Github CI
> - Update & reenable test_valid_args_schemas
> - Update test_task_interface in test_valid_task_interface once new Manifest is built

These are now covered, as of #689.

tcompa commented 5 months ago

> I'll be curious to see if example 01 runs. I'll try to test that tomorrow morning

The cellpose task now runs.

Note that this test is currently a bit cumbersome, until we merge into main and make a pre-release.

One way is to use fractal-containers.

Another way is to run poetry build from the root directory of the fractal-tasks-core repo (within the V2_tasks branch), so that a wheel file is generated in the dist folder (and can be used for local-wheel task collection in fractal-web/server).

tcompa commented 5 months ago

Also napari-workflows runs, in fractal-demos (I did not check the output).

jluethi commented 5 months ago

Awesome!

> Another way is to run poetry build from the root directory of the fractal-tasks-core repo (within the V2_tasks branch), so that a wheel file is generated in the dist folder (and can be used for local-wheel task collection in fractal-web/server).

Yes, also just setting up tests locally with server 2.0.0a4 and building the task package locally with python -m build :)

jluethi commented 5 months ago

I can confirm that I can also run example V1! =D

From my side, ready to merge after your review then. And I'll start exploring other examples and will open new issues where they occur :)

tcompa commented 5 months ago

Note: I pushed 34faa92e35f717a4a0060a2192ed31b201676125, with these changes:

Improve plate-related ngff models:

* Add `AcquisitionInPlate.description`;
* Make `AcquisitionInPlate.id` required;
* Add `ColumnInPlate`;
* Add `RowInPlate`.

tcompa commented 5 months ago

While reviewing the copy_ome_zarr_hcs_plate task, I found some parts (like the metadata generation) a bit hard to understand. As a minor fix, I started using the load helper functions (including a new load_NgffPlateMeta one) and added a few comments - see this commit.

In the future this task would benefit from yet another review.

jluethi commented 5 months ago

Thanks a lot for the review & the fixes @tcompa !