fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
14 stars 6 forks source link

675 well metadata updates #710

Closed jluethi closed 5 months ago

jluethi commented 5 months ago

Closes #675

Currently fails tests due to #707 . Will also need to address #707 to handle this.

Based on changes in #709

Checklist before merging

github-actions[bot] commented 5 months ago

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  fractal_tasks_core/ngff
  specs.py
  fractal_tasks_core/tasks
  _zarr_utils.py
  apply_registration_to_image.py
Project Total  

This report was generated by python-coverage-comment-action

jluethi commented 5 months ago

I now added a heuristic that handles finding the correct reference zarr url. Plus the apply registration to image now correctly sets the well metadata using our new _update_well_metadata helper function.

jluethi commented 5 months ago

@tcompa Whenever you have a chance to review, it's ready from my side

tcompa commented 5 months ago

The current logic of _get_matching_ref_cycle_path_heuristic doesn't work, as two branches out of four (the second and third) look unreachable. Let's review the expected behavior:

HEURISTIC_CASES = [
    (["0", "0_illum_corr"], "1_illum_corr", "0_illum_corr"),  # this works, in the current version
    (["0", "0_illum_corr"], "1", "0"),  # this works, in the current version
    (["0", "0_illum_corr", "0_registered"], "1_illum_corr", "0_illum_corr"),  # this is identical to the first line, in the current version
    (["0", "1", "2", "3"], "0_cycle2", "0"),  # this works, in the current version
    (["0", "1", "2", "3"], "1_cycle2", "1"),  # I am adding this one now, but it would fail in the current version
]

@pytest.mark.parametrize("path_list, path, expected_match", HEURISTIC_CASES)
def test_get_matching_ref_cycle_path_heuristic(
    path_list: list[str], path: str, expected_match: str
):
    match = _get_matching_ref_cycle_path_heuristic(
        path_list=path_list, path=path
    )
    assert match == expected_match
tcompa commented 5 months ago

Apart from defining the expected behaviors for _get_matching_ref_cycle_path_heuristic, we also need to determine what is assumed about path_list. A quick fix is to replace for p in path_list with for p in sorted(path_list), but it's TBD whether other order-related edge cases may appear.

The general solution for the ordering problem is to qualify the matching scenarios in a descending-priority order, so that we can make a pass of the list for each one of them.

jluethi commented 5 months ago

Thanks for spotting the edge-cases of the _get_matching_ref_cycle_path_heuristic. We should not rely on OME-Zarr filenames in general. But I don't think we'd have general & consistent metadata needed in the OME-Zarr to reliably get this info. e.g. we don't know based on OME-Zarr metadata whether an image has been illumination corrected.

As we don't see a clear other choice for the moment, we get the _get_matching_ref_cycle_path_heuristic. If we can simplify it, all the better!

It's main scope should be handling this case; (["0", "0_illum_corr", "0_registered"], "1_illum_corr", "0_illum_corr")

If it can also handle the case of having some versions of FOVs saved as separate OME-Zarrs (e.g. your new proposed test case (["0", "1", "2", "3"], "1_cycle2", "1")), that's valuable. But we don't have a clear definition of how names would be handled in all those cases, so we shouldn't get too complex there.

(["0", "0_illum_corr", "0_registered"], "1_illum_corr", "0_illum_corr"),  # this is identical to the first line, in the current version

The purpose here is to test that it handles multiple different suffices & selects the right one.

tcompa commented 5 months ago

I was a bit confused, but now I understand that the test case

["0", "1", "2", "3"], "1_cycle2" --> "1"

is only relevant for when FOVs are not fused, which is not currently in scope (ref #707). For the record, this can be covered with the following logic

    # First matching rule: a path with the same suffix
    for p in sorted_path_list:
        # Split the list path into base and suffix
        p_base, p_suffix = _split_base_suffix(p)
        # If suffices match, it's the match.
        if p_suffix == suffix:
            return p
    # Second matching rule: a path with the same base
    for p in sorted_path_list:
        # Split the list path into base and suffix
        p_base, p_suffix = _split_base_suffix(p)
        # If suffices match, it's the match.
        if p_base == base:  # and maybe some other condition?
            return p
    ...

but I cannot easily understand if this will introduce false positives. Thus I'd rather not include it in the test cases (or, if we do, we should also find the appropriate additional test cases).

tcompa commented 5 months ago

Without adding any currently-unsupported match scenario, I introduced some other minor changes (which I can revert if needed) ti make the function more robust:

  1. I dropped unreachable branches, without changing tests.
  2. I am enforcing sorting of path_list, so that ["0", "1", "2", "3"], "0_cycle2" --> "0" and ["1", "0", "2", "3"], "0_cycle2" --> "0" both work.
  3. I am excluding path from path_list (if present), so that ["0", "1", "2", "3"], "3" --> "0".
jluethi commented 5 months ago

This sounds good to me and I'm fine to not expand on it more for this PR, let's return when we have actual example data of this needed.

I reviewed the changes and they look good to me. Let's merge! :)