Closed: jluethi closed this PR 1 year ago.
The coverage rate went from 89.94% to 89.91% :arrow_down:. The branch rate is 83%. 92.92% of new lines are covered.
While developing the Apply registration to ROI tables task, I now noticed the following:
Having a reset-origin removes the ability to use ROI tables to shift cycles relative to each other without modifying the data: even if we correctly modify the ROI starting position to shift it by the necessary amount, reset-origin then undoes that shift again.
See issue here: https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/339
I'll think some more about whether, in that context, handling non-zero origins via these resets is still the correct decision.
One way around this for the time being: we reset the origins of the ROI tables before applying the translations. I'll investigate this as a solution while we consider options in #339.
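As a rough numpy sketch of that workaround (the arrays and names here are illustrative, not the actual task code): resetting the origin first makes the table 0-based, after which the registration translation can be applied without a later reset undoing it.

```python
import numpy as np

# Illustrative ROI start positions (z, y, x) in physical units, with a
# non-zero origin, plus a translation computed by registration.
roi_starts = np.array([[0.0, 10.0, 10.0]])   # one ROI, origin at y = x = 10
translation = np.array([0.0, 2.5, -1.5])     # shift found by registration

# Step 1: reset the origin so the ROI table is 0-based.
roi_starts_reset = roi_starts - roi_starts.min(axis=0)

# Step 2: apply the registration translation to the 0-based ROI.
# A later reset-origin can no longer undo the shift, because the origin
# was already zero before the translation was applied.
roi_starts_registered = roi_starts_reset + translation
```

Note that a translation can still push a coordinate negative (as in the x value here), which is the scenario behind the extra checks on negative ROI coordinates discussed in this thread.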
I added a first version of applying the registration to ROI tables now. It needs some polish (some doc building is failing). I also have some tests that need to be polished and added. I also need some extra checks on whether certain alignment scenarios could produce negative ROI coordinates (when they should now be 0-based) and figure out how best to fix that.
Next steps:
I now tested all 3 new tasks successfully. They are run in the following order:
1. Calculate 2D registration (image-based): image-level (meaningless on the reference cycle itself)
2. Apply Registration to ROI Tables: well-level
3. Apply Registration to Image: image-level
Here are some aligned outputs vs. non-aligned inputs:
Inputs:
Outputs:
It still works on some dummy 3D datasets => the algorithms handle 2D & 3D data. I'm not sure how well the Calculate 2D registration (image-based) task will perform on more complex 3D registration problems though, both memory-wise and in finding the correct transformation. Also, I'm using an np.squeeze to get it to 2D, so it may just be an accident that it still found relevant registration on those dimensions. But at least the downstream logic of the tasks works with 3D.
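For intuition, here is a numpy-only sketch of the phase-correlation idea behind skimage's phase_cross_correlation, including the np.squeeze step that drops dummy dimensions (a conceptual illustration, not the task's actual code):

```python
import numpy as np

# Conceptual sketch of FFT-based phase correlation (the core idea behind
# skimage's phase_cross_correlation); identifiers here are illustrative.
rng = np.random.default_rng(0)
ref = rng.random((64, 64))
mov = np.roll(ref, (3, -5), axis=(0, 1))  # "cycle" shifted by (3, -5)

# As in the task, np.squeeze drops dummy axes, e.g. (1, 64, 64) -> (64, 64).
mov = np.squeeze(mov[np.newaxis, ...])

# Normalized cross-power spectrum; its inverse FFT peaks at the shift.
cross_power = np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))
cross_power /= np.abs(cross_power)
corr = np.fft.ifft2(cross_power).real
peak = np.unravel_index(np.argmax(corr), corr.shape)

# Map the peak to a signed shift; this is the translation that moves
# `mov` back onto `ref` (the negation of the roll applied above).
shift = [int(p) if p <= s // 2 else int(p) - s for p, s in zip(peak, corr.shape)]
print(shift)  # → [-3, 5]
```

The squeeze only removes singleton axes, so a genuine 3D stack would stay 3D here; that is why the downstream logic still works on 3D data.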
Here's an example workflow based on example 03. I think it will make sense to update example 03 with this once we have this task released. workflow-export-Workflow multiplex-3-1692632697640.txt
What remains to be done:
Apply Registration to Image
Future improvements
Calculate 2D registration
With https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487/commits/a1527ae8d2331e3d77c2d1756efd3553dd6d762d, I added tests/tasks/test_registration.py
(@jluethi, feel free to rename/refactor as you prefer).
By modifying
def test_multiplexing_registration(
zenodo_images_multiplex_shifted: list[str],
):
debug(zenodo_images_multiplex_shifted)
you should be able to write a test workflow that creates an OME-Zarr and then calls any of the registration-related tasks.
The zenodo_images_multiplex_shifted fixture only runs if the tests/data/fake_multiplex_shifted folder does not exist (and it takes about 6 seconds to run). What it does is (1) copy the tests/data/fake_multiplex folder, and (2) modify all four images of cycle2.
The new cycle2 images look like this:
and they are prepared in _shift_image (which is also the function to modify if we want e.g. to replace noise with zeros, or if we want to change the XY shifts).
NOTE: I tend to think that it is correct to set mode="I" in Image.fromarray(...), but I've not tested that the images are then loaded correctly when converting to zarr. I only checked that the file sizes are reasonably similar to the original ones, and that img.show() (in Pillow) shows what I expect. If anything weird happens during zarr conversion, this may be the actual issue.
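A quick way to probe that concern is a round-trip through Pillow: build a 32-bit image with mode="I" and check the pixel values come back unchanged (a minimal sketch, not part of the test suite):

```python
import numpy as np
from PIL import Image

# Round-trip a 32-bit signed-integer array through Pillow's "I" mode and
# check that the pixel values survive unchanged.
arr = (np.arange(16, dtype=np.int32) * 1000).reshape(4, 4)
img = Image.fromarray(arr, mode="I")  # "I" = 32-bit signed integer pixels
round_trip = np.array(img)

assert round_trip.dtype == np.int32
assert np.array_equal(round_trip, arr)
```

If this round-trip holds but zarr conversion still misbehaves, the problem would be downstream of Pillow.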
Thanks Tommaso, will try to set up some tests with this.
Also, a note to myself: they mostly seem to run as expected, but sometimes Apply Registration to Image runs into the following error:
TASK ERROR:Task id: 7 (Apply Registration to Image), e.workflow_task_order=6
TRACEBACK:
2023-08-23 14:11:24,452; INFO; START apply_registration_to_image task
Traceback (most recent call last):
File "/Users/joel/Library/CloudStorage/Dropbox/Joel/BioVisionCenter/Code/fractal/fractal-demos/examples/server/FRACTAL_TASKS_DIR/.fractal/fractal_tasks_core0.10.0/venv/lib/python3.9/site-packages/fractal_tasks_core/tasks/apply_registration_to_image.py", line 353, in <module>
run_fractal_task(
File "/Users/joel/Library/CloudStorage/Dropbox/Joel/BioVisionCenter/Code/fractal/fractal-demos/examples/server/FRACTAL_TASKS_DIR/.fractal/fractal_tasks_core0.10.0/venv/lib/python3.9/site-packages/fractal_tasks_core/tasks/_utils.py", line 79, in run_fractal_task
metadata_update = task_function(**pars)
File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
from contextlib import _GeneratorContextManager
File "pydantic/decorator.py", line 134, in pydantic.decorator.ValidatedFunction.call
File "pydantic/decorator.py", line 206, in pydantic.decorator.ValidatedFunction.execute
File "/Users/joel/Library/CloudStorage/Dropbox/Joel/BioVisionCenter/Code/fractal/fractal-demos/examples/server/FRACTAL_TASKS_DIR/.fractal/fractal_tasks_core0.10.0/venv/lib/python3.9/site-packages/fractal_tasks_core/tasks/apply_registration_to_image.py", line 200, in apply_registration_to_image
old_table_group = zarr.open_group(
File "/Users/joel/Library/CloudStorage/Dropbox/Joel/BioVisionCenter/Code/fractal/fractal-demos/examples/server/FRACTAL_TASKS_DIR/.fractal/fractal_tasks_core0.10.0/venv/lib/python3.9/site-packages/zarr/hierarchy.py", line 1532, in open_group
raise GroupNotFoundError(path)
zarr.errors.GroupNotFoundError: group not found at path ''
The error occurred for cycles 1 & 2 (but not 0) of one well, but not the other well.
@tcompa I now added tests based on your data provider. Thanks a lot, looks to be working smoothly! :)
GitHub doesn't run the CI anymore on the PR though. Do we need to resolve the merge conflicts for that to trigger again?
Remaining things to test in tasks:
Apply registration to image
I figured out what triggered the error above: I was storing the output locally in the Dropbox folder, which uses a new FileProvider API. Apparently, that API gets confused if I delete a folder (the output folder) and then recreate it before the deletion has synced to the server 🙈 All running as intended now, with some added logs.
GitHub doesn't run the CI anymore on the PR though. Do we need to resolve the merge conflicts for that to trigger again?
Yes, the CI was blocked due to a conflict. I fixed it.
@tcompa I changed the padding to use a constant value. Otherwise, the shift that the registration calculates is not always 100% robust, but can vary slightly. With the constant value, it looks to be more robust.
Also, I added cellpose to that test workflow, because I need to test that label images are copied over by the apply_registration_to_image task. Unfortunately, this adds over a minute to the test runtime. Do we have good ways of just adding mock label images to the Zarrs we use in other tests? It runs now, but it's not the most efficient.
Do we have good ways of just adding mock label images to Zarrs that we use in any other tests? It seems to run now, but not the most efficient.
You can copy what I did in test_workflows_cellpose_segmentation.py.
First, define a function like patched_segment_ROI, which produces a fake label array. Then you should include monkeypatch in the test function parameters, and use it like
monkeypatch.setattr(
"fractal_tasks_core.tasks.cellpose_segmentation.segment_ROI",
patched_segment_ROI,
)
In this way, the segment_ROI function of cellpose_segmentation.py is (temporarily) replaced by the patched one, which runs much faster (and is also fully reproducible).
If you need more realistic labels, the other option is to increase level.
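A minimal sketch of that idea (the function body is illustrative, and the real segment_ROI takes more arguments): the patched function just fabricates a deterministic label array instead of running cellpose.

```python
import numpy as np

# Illustrative stand-in for the patched_segment_ROI idea: return a
# deterministic fake label image instead of running cellpose.
def patched_segment_ROI(img: np.ndarray) -> np.ndarray:
    """Return a fake label image with the same shape as the input."""
    labels = np.zeros(img.shape, dtype=np.uint32)
    labels[..., : img.shape[-1] // 2] = 1  # one "object" in the left half
    return labels

# In the test, pytest's monkeypatch swaps in the fake function, e.g.:
# monkeypatch.setattr(
#     "fractal_tasks_core.tasks.cellpose_segmentation.segment_ROI",
#     patched_segment_ROI,
# )
fake = patched_segment_ROI(np.zeros((1, 8, 8)))
print(fake.shape, fake.max())  # → (1, 8, 8) 1
```

Because the fake labels are deterministic, downstream assertions on label counts and shapes become fully reproducible.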
Thanks, this monkeypatch approach works great and is very reproducible. It also reduced runtime a lot, even compared to level 4 cellpose.
I have a side observation I'll briefly document here; it's not related to the PR and not a Fractal bug as far as I can tell, but it confused me a bit when analyzing the test data: in the napari version I was using (0.5.0a2.dev268+g4ba0e6ac), the intensity image and the labels don't overlap perfectly when viewed in napari if the two are at different resolutions. It isn't that noticeable for a level 0 image vs. level 1 labels:
But it stands out when using level 4 labels:
The actual shapes are correct; it just looks like the display is shifted by one or a few pixels when displaying low-res labels on high-res images. It's unrelated to whether registration is applied or not; I think it's a more general napari thing. But I went digging a bit until I realized our arrays were all fine.
@tcompa The PR would now be ready for review. I think it covers the basic functionality we need.
Some ideas for future improvements:
- Calculate 2D registration: not reducing to a 2D image anymore, maybe using SimpleITK with Max in an advanced version of this task

Polishing:
- Unit tests for helper functions
I'm now marking this as done.
As of https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487/commits/a912c65515156fee51ebb0b327526f4e3e95edc9, the coverage of lib_regions_of_interest.py is 91%, when based only on test_unit_ROIs.py (that is, without including additional coverage that may come from other tests).
All the functions introduced/modified by this PR (is_ROI_table_valid, are_ROI_table_columns_valid, convert_indices_to_regions, reset_origin) are 100%-covered by unit tests.
Polishing:
- Unit tests for helper functions

I'm now marking this as done.
As of a912c65, the coverage of lib_regions_of_interest.py is 91%, when based only on test_unit_ROIs.py (that is, without including additional coverage that may come from other tests). All the functions introduced/modified by this PR (is_ROI_table_valid, are_ROI_table_columns_valid, convert_indices_to_regions, reset_origin) are 100%-covered by unit tests.
I just realized that some helper functions are part of the task Python modules, and I was not including these ones in tests/coverage. Thus I unmarked the unit-test requirement, as I need to dig a bit deeper.
Question (@jluethi): is it worth moving some of these functions to a lib_registration.py (or similar) module in the core library? Or is each function only relevant for its own task?
Self-reminder for something else I should review (from https://github.com/fractal-analytics-platform/fractal-tasks-core/actions/runs/6038282653/job/16384333401?pr=487):
tests/tasks/test_registration.py::test_multiplexing_registration
/home/runner/work/fractal-tasks-core/fractal-tasks-core/fractal_tasks_core/tasks/apply_registration_to_ROI_table.py:235: RuntimeWarning: invalid value encountered in maximum
np.maximum(max_df.values, table.values)
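That warning typically points at NaNs flowing into the element-wise maximum. A small numpy illustration of the behavior (np.fmax would ignore single NaNs instead of propagating them):

```python
import numpy as np

# NaNs propagate through np.maximum (and can trigger the "invalid value
# encountered in maximum" RuntimeWarning seen above); np.fmax instead
# ignores a NaN when the other operand is a valid number.
a = np.array([1.0, np.nan, 3.0])
b = np.array([2.0, 2.0, np.nan])

with np.errstate(invalid="ignore"):  # silence the warning for this demo
    propagated = np.maximum(a, b)    # NaN wherever either input is NaN
ignored = np.fmax(a, b)              # NaN only where both inputs are NaN

print(propagated)  # [ 2. nan nan]
print(ignored)     # [2. 2. 3.]
```

So the thing to review is where NaNs enter max_df or table in apply_registration_to_ROI_table.py, and whether np.fmax semantics would be appropriate there.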
I now went through the entire PR once, and it mostly looks good to me!
My current changes can be reviewed with git diff c653f2ed099cd2916ee48cf2b1277413d77c60d6, and they mostly focus on:
Here are some first remarks; it'd be nice if @jluethi can chime in (either through commits or through comments that I can implement).
Where should helper functions be? They are currently within the task modules, but some of them look quite general (e.g. get_acquisition_paths, currently in apply_registration_to_ROI_tables.py).
Moving these functions to the core library has some pros: we can make sure that they are always properly tested, and we can easily make them more consistent with the other core-library functions.
In some cases, however, moving them to the core library would require a lot of effort towards generalization, which is not worth it (e.g. for write_registered_zarr, which is quite specifically related to the way it's used in the apply_registration_to_image.py task).
I propose to move at least some of the functions, e.g. get_table_path_dict and get_acquisition_paths.
As per https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487#issue-1845308806, I am expecting the calculate-registration task to only consider 2D images. Where can I see that it isn't performing 3D registration? The np.squeeze calls on the phase_cross_correlation arguments are not meant to remove non-dummy dimensions. I'm missing something here.
I found the names of apply_registration_to_roi_table (helper function) and apply_registration_to_ROI_table (task function) very confusing. I now renamed them to apply_registration_to_single_ROI_table (helper function) and apply_registration_to_ROI_tables (task function), but I'm happy with any other version that keeps them reasonably different.
It'd be nice to always specify the parallelization level somewhere in the task docstring (on top of having it in the manifest).
Minor issue in apply_registration_to_ROI_tables.py: https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487/files#r1312686801.
Minor issue in calculate_2D_registration_image_based.py: https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487/files#r1311606441.
Minor issue in get_acquisition_paths: https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487/files/043e63d7bb4cca421a0d978d6328c53e0b9208e8#r1312680204.
Other (even less relevant) code comments here in the PR.
I added a couple of FIXMEs for missing docstrings (I included the ones that were obvious, but some were less trivial) or other small changes.
I'll be mostly waiting for feedback on this PR, but later I still want to review a few issues:
- Reviewing the origin-reset strategy
- Reviewing whether we now have sufficient testing of helper functions (e.g. write_registered_zarr)

Thanks a lot for the review @tcompa !
I addressed the smaller comments. On the larger points:
- Where should helper functions be?

I moved get_acquisition_paths, get_table_path_dict and is_standard_roi_table to lib_zarr_utils (the first two, as they are generic for Zarrs) and lib_regions_of_interest (as it's ROI-specific). It makes sense to have them in the core library (and to refactor as needed to have them there). Feel free to move them to another library file if that fits better.
My impression of the other helper functions is that they'd be too specific to their tasks, so I would not put them into the core library, at least not as of now.
Question (@jluethi): is it worth moving some of these functions to a lib_registration.py (or similar) module in the core library? Or is each function only relevant for its own task?
I think the other helper functions are task specific. I've not had to import any in another task so far. But happy to move them once that would come up.
- As per https://github.com/fractal-analytics-platform/fractal-tasks-core/pull/487#issue-1845308806, I am expecting the calculate-registration task to only consider 2D images. Where can I see that it isn't performing 3D registration? The np.squeeze functions called on the phase_cross_correlation arguments are not meant to remove non-dummy dimensions. I'm missing something here.
Ah, true! Yes, everything else is written with 3D in mind and I've tested it with some 3D images (our tiny test set), so it should be working for 3D in general. But I forgot that squeeze works that way and thought it may be running in 2D mode because of that. It turns out it's not a 2D-specific implementation. I wouldn't have the highest confidence that this algorithm will be robust on complex 3D registration cases, but I've now updated the task name to reflect that it's not limited to 2D.
- I found the names of apply_registration_to_roi_table (helper function) and apply_registration_to_ROI_table (task function) very confusing.
Great, thanks for renaming!
- It'd be nice to always specify the parallelization level somewhere in the task docstring (on top of having it in the manifest).
Good idea! I think such information would fit well in the main body of the docstrings. As well as other limitations like e.g. only processing 2D or 3D data or similar things.
Does an entry like Parallelization level: image work? It seems to get parsed correctly into the doc_info.
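For reference, a minimal sketch of that docstring convention (the task name and signature are illustrative):

```python
# Sketch of the proposed docstring convention: stating the
# parallelization level in the docstring body makes it show up in
# whatever doc_info is parsed from the docstring.
def apply_registration_to_image(*, input_paths, component):
    """
    Apply pre-calculated registration to an OME-Zarr image.

    Parallelization level: image
    """

print("Parallelization level: image" in apply_registration_to_image.__doc__)  # → True
```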
5-7: Minor issues
I can't make sense of those links. Were they something tagged as FIXME? Or what were the issues?
- I added a couple of FIXMEs
I addressed all the ones I found. I think this PR contains no more newly introduced FIXMEs.
While fixing some FIXMEs, I changed the handling of axes and added a get_axes_names function to the main library. That should make it more robust to figure out which case we're handling (e.g. zyx vs. cyx for 3D arrays). I plan to use this more to implement support for 2D registration from the MD once I have some test data (probably after this PR is merged), see https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/506
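A hedged sketch of the get_axes_names idea, assuming it reads axis names from NGFF multiscales metadata (the actual signature in the library may differ):

```python
# Sketch (not the library's actual code): read the axis names declared
# in NGFF multiscales metadata to tell e.g. zyx apart from cyx arrays.
def get_axes_names(multiscale_attrs: dict) -> list:
    """Return the axis names of the first NGFF multiscales entry."""
    axes = multiscale_attrs["multiscales"][0]["axes"]
    return [ax["name"] for ax in axes]

# Example NGFF-style .zattrs content for a czyx image:
attrs = {
    "multiscales": [{
        "axes": [
            {"name": "c", "type": "channel"},
            {"name": "z", "type": "space"},
            {"name": "y", "type": "space"},
            {"name": "x", "type": "space"},
        ],
    }]
}
print(get_axes_names(attrs))  # → ['c', 'z', 'y', 'x']
```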
Reviewing the origin-reset strategy
Would be great to check this. In my testing, it was robust, i.e. it would reset the origin before saving, thus producing 0, 0, 0 origins for the registered tables. That may be a good strategy in general, and then all our origins become simpler.
Reviewing whether we now have sufficient testing of helper functions (I think so, apart from some very task-specific ones like write_registered_zarr).
Would be great if you could have a look whether coverage is ok now. I moved some tests to the main tests (when the functions were moved) and added a few more small tests for the new functions I added.
write_registered_zarr gets some basic testing through the test of the task, and given that it's really part of how the task runs, maybe that's enough for the moment?
In summary: I think the issues were addressed now and the PR would be ready to merge after another look at the origin-reset strategy and maybe points 5-7 above. Future work on better MD support will come later.
In summary: I think the issues were addressed now and the PR would be ready to merge after another look at the origin-reset strategy and maybe points 5-7 above.
Great, thanks for all the updates!
Points 5-7 were links to some PR-code comments, which apparently are not robust when they get "resolved". Anyway, they are now all fixed.
I'll review origin&coverage and then this is ready also on my side.
As discussed this morning with @jluethi:
I'm starting the work on a registration task with this PR. See overview here: https://github.com/fractal-analytics-platform/fractal-tasks-core/issues/39
The goal is to have basic tasks that perform the necessary functionality in fractal-tasks-core and then have more complex tasks with heavier dependencies in other packages (some will fit well into scmultiplex, maybe some 3D image-based registration task will be its own package).
The registration will be split up into multiple tasks. The following parts are in scope for this first PR:
1. Calculate registration
Tasks that calculate the transformation needed for registration and store this information in the OME-Zarr. This runs for each cycle of each well (parallelization on the image level)
2. Applying registration to ROI tables
One way we can use registration is by adapting the ROI tables. That way, we don't need to change any of the image data, but all our ROI-based loading and ROI-based visualization should work well with it. Needs to read in tables from multiple cycles, thus runs on a per-well level
3. Applying registration to images
Actually changing the image data based on the transformations that were calculated. The benefits of doing this separately from step 1:
Downside is having to load some image data twice.
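To make step 2 concrete, here is an illustrative numpy sketch (names are made up, not the task's code) of shifting one ROI per cycle and keeping the overlap, so that every cycle loads the same physical region:

```python
import numpy as np

# Illustrative sketch of applying registration to ROI tables: shift a
# ROI by each cycle's registration translation, then keep only the
# overlap so every cycle loads the same physical region.
def registered_overlap(roi_starts, roi_ends, translations):
    """Intersect a ROI across cycles after applying per-cycle shifts."""
    starts = np.asarray(roi_starts) + np.asarray(translations)
    ends = np.asarray(roi_ends) + np.asarray(translations)
    # Overlap = max of shifted starts, min of shifted ends (per axis).
    return starts.max(axis=0), ends.min(axis=0)

starts = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]        # 2 cycles, z/y/x
ends = [[1.0, 100.0, 100.0], [1.0, 100.0, 100.0]]
shifts = [[0.0, 0.0, 0.0], [0.0, 2.0, -3.0]]       # cycle 2 is shifted
lo, hi = registered_overlap(starts, ends, shifts)
# lo = [0, 2, 0], hi = [1, 100, 97]: the common region shrinks in y and x.
```

Only the tables change; the image data stays untouched, which is exactly the appeal of this approach for ROI-based loading and visualization.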
Out of scope for this PR, but worth investigating: finding a better way to store transformations in the OME-Zarr, either in line with the upcoming transformation spec for OME-NGFF (still a bit underdefined and still developing) or by adopting the same method the SpatialData people are using (see e.g. https://spatialdata.scverse.org/en/latest/design_doc.html) [that should eventually make its way into OME-NGFF. I couldn't figure out how they save it to the on-disk OME-Zarr for now, but will investigate more]
TODOs:
Polishing:
Open questions: Interaction with 2D-to-3D workflows: if we align in 2D, does that somehow link back to the 3D data? Gets tricky! Certainly not in the initial version.