fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform
https://fractal-analytics-platform.github.io/fractal-tasks-core/
BSD 3-Clause "New" or "Revised" License
11 stars 5 forks source link

1536 well plates cellvoyager #715

Closed jluethi closed 2 months ago

jluethi commented 2 months ago

WIP PR to add 1536 well support to the cellvoyager converter.

First hurdle:

Next steps:

Checklist before merging

github-actions[bot] commented 2 months ago

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  fractal_tasks_core/cellvoyager
  metadata.py
  wells.py
  fractal_tasks_core/tasks
  cellvoyager_to_ome_zarr_init.py
  cellvoyager_to_ome_zarr_init_multiplex.py
Project Total  

This report was generated by python-coverage-comment-action

jluethi commented 2 months ago

The conversion is working at least for the single well with synthetic data with the new filename structure. There is one issue with Yokogawa's well name convention though: It's not technically compliant with the NGFF spec: For the column name definition in the plate:

The name MUST contain only alphanumeric characters

Now neither our converter nor the napari-ome-zarr plugin appears to have an issue with this. But plate validation using the official schemas fails.

We could just accept that, we could change the well ids we're getting for 1536 well plates (e.g. to A01a1 instead of A01.a1) or we could start a spec discussion on that. Let's see how relevant those 1536 well plates will be first.

jluethi commented 2 months ago

I now went with a logic where we rename the wells for 1536 well plates from A01.a1 to Aa011 (split in row Aa & column 011). This makes it compatible with the NGFF spec on well names & should improve how it's displayed in a viewer like napari. Will test a full plate conversion next to verify :)

jluethi commented 2 months ago

@tcompa This PR is now ready for review. It's not meant to be part of the 1.0.0 release & the deployment on Monday morning, but should be ready for review whenever there is time afterwards. I can run our demo without having this in main by just installing the tasks from a local wheel.


Content-wise: I have found a way now to make the wells display in the correct arrangement and have well names in the OME-Zarr that are consistent with the OME-NGFF spec. Now the test for plate metadata also passes against the spec. The approach was: Convert well names like A01.a1 into row AA, col 011. This normalization of well names wasn't fully trivial, as we still need the well name as it appears in the filenames for filtering. But using 2 new helper functions in cellvoyager.wells (with accompanying unit tests), I think I have a decent solution for this.

Overall, the changes to the converter inits are minimal and no changes to the compute is required. The converter init now just call the 2 helper functions to process the well names and those function have handling for the 2 cases we care about. There also needed to be adaptations to the metadata parsing to convert the integer positions from the XML into the different well names found in the filenames.

I have not implemented in-depth checks of the content of the well names we're getting (e.g. getting well X77 would be impossible in any setup we know, but we'd still process it). I'd keep it that way unless we see a strong reason to become more restrictive.

tcompa commented 2 months ago

Minor comments about get_filename_well_id:

jluethi commented 2 months ago

I think https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/544627f3ea1d5a08d1a085e3bf42577253b51ffc is needed, to avoid a potential IndexError I also added https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/928688f88db7bc39b6fd80acdfb7649e1c26a547, but I can revert it if we prefer to also support wells like A1 (that is, with a single-digit column).

They both make sense to me. We don't support wells like A1, so having that explicitly excluded and tested for makes sense :)

tcompa commented 2 months ago

I also slightly refactored generate_row_col_split (see e467ad546ae16573a6c36df65b3c14313a0e7d3d), to be a bit more robust. I guess the reason for only checking the first well was related to performance, but the following example suggests that this is under control (~2000 of the "long" well IDs parsed in ~0.001 seconds):

import time
from fractal_tasks_core.cellvoyager.wells import generate_row_col_split

well_ids = []
for row_base in ["A", "B", "C", "D", "E", "F", "G", "H", "I"]:
    for col_base in ["01", "02", "03", "04", "05", "06"]:
        for row_suff in ["a", "b", "c", "d", "e", "f"]:
            for col_suff in ["1", "2", "3", "4", "5", "6"]:
                well_ids.append(f"{row_base}{col_base}.{row_suff}{col_suff}")

start = time.perf_counter()
output = generate_row_col_split(well_ids)
end = time.perf_counter()

print(f"{len(well_ids)=}")
print(f"{well_ids[:2]=}")
print(f"{output[:2]=}")
print(f"Elapsed time: {end - start:e} s")
len(well_ids)=1944
well_ids[:2]=['A01.a1', 'A01.a2']
output[:2]=[('Aa', '011'), ('Aa', '012')]
Elapsed time: 1.165204e-03 s
jluethi commented 2 months ago

That makes sense, useful to be on the save side here!

jluethi commented 2 months ago

Context on that one: We get integer positions and need to convert them to letters like A, B, C. 65 is capital A. 97 is lower-case a. The indices start with 1, but for the rounding of bases, we need 0-based indexing (-1, then +65 instead of just +64).

It's tested in the test_create_well_ids_96_well function.

jluethi commented 2 months ago

Can you merge & release a 1.0.1 with it then? Then I'd start testing this at FMI later this afternoon.