1536 well plates cellvoyager

fractal-analytics-platform / fractal-tasks-core

Main tasks for the Fractal analytics platform

https://fractal-analytics-platform.github.io/fractal-tasks-core/

BSD 3-Clause "New" or "Revised" License

11 stars 5 forks source link

1536 well plates cellvoyager #715

Closed jluethi closed 2 months ago

jluethi commented 2 months ago

WIP PR to add 1536 well support to the cellvoyager converter.

First hurdle:

Wells are not formatted as Letter + 2 digits, but as e.g. A01.a1. 4x4 wells of the 1536 well plate are matched to the 96 well plate location and then get suffixes of .a1 - .d4 to describe their sub-position
The filename parsing was already robust for it, the metadata parsing needed an update (now included with this PR)

Next steps:

[x] Create a small, synthetic test dataset: based on our existing test data (add hand-crafted metadata files & rename the images)
[x] Convert synthetic data: Can our well logic handle well_ids that are 6 characters instead of 3? How do we separate well & column now? e.g. row A, column 01.a1? Or row A01., column a1?
[x] Try converting some real 1536 well plate data
[x] Once we manage a conversion: Figure out whether the napari plugin can show 1536 well plates. Maybe it works out of the box, but probably not in the correct well arrangement layout? Could that be fixed by just tuning the metadata? How does the napari plugin choose its layout?
[x] Do we rely on the B03 style well_id structure & the typical paths in our other tasks? Eventually try an actual workflow with e.g. Cellpose, napari workflows, illum_corr etc. Also check registration tasks
[x] Also implement this for the multiplexing converter

Checklist before merging

[x] I added an appropriate entry to CHANGELOG.md

github-actions[bot] commented 2 months ago

Coverage report

Click to see where and how coverage changed

File	Statements	Missing	Coverage	Coverage (new stmts)
fractal_tasks_core/cellvoyager
metadata.py	$This PR adds 15 to the number of statements in fractal_tasks_core/cellvoyager/metadata.py, taking it from 99 to 114.$	$This PR doesn't change the number of statements missing coverage in fractal_tasks_core/cellvoyager/metadata.py, which is 7.$	$This PR adds 0.93 percentage points to the coverage rate in fractal_tasks_core/cellvoyager/metadata.py, taking it from 92.92% (92/99) to 93.85% (107/114).$	$In this PR, 21 new statements are added to fractal_tasks_core/cellvoyager/metadata.py, 21 of which are covered (100%).$
wells.py	$This PR adds 19 statements to fractal_tasks_core/cellvoyager/wells.py. The file did not seem to exist on the base branch.$	$This PR adds 0 statements missing coverage to fractal_tasks_core/cellvoyager/wells.py. The file did not seem to exist on the base branch.$	$The coverage rate of fractal_tasks_core/cellvoyager/wells.py is 100% (19/19). The file did not seem to exist on the base branch.$	$In this PR, 19 new statements are added to fractal_tasks_core/cellvoyager/wells.py, 19 of which are covered (100%).$
fractal_tasks_core/tasks
cellvoyager_to_ome_zarr_init.py	$This PR adds 2 to the number of statements in fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init.py, taking it from 163 to 165.$	$This PR doesn't change the number of statements missing coverage in fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init.py, which is 25.$	$This PR adds 0.19 percentage points to the coverage rate in fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init.py, taking it from 84.66% (138/163) to 84.84% (140/165).$	$In this PR, 4 new statements are added to fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init.py, 4 of which are covered (100%).$
cellvoyager_to_ome_zarr_init_multiplex.py	$This PR adds 2 to the number of statements in fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init_multiplex.py, taking it from 197 to 199.$	$This PR doesn't change the number of statements missing coverage in fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init_multiplex.py, which is 20.$	$This PR adds 0.10 percentage points to the coverage rate in fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init_multiplex.py, taking it from 89.84% (177/197) to 89.94% (179/199).$	$In this PR, 4 new statements are added to fractal_tasks_core/tasks/cellvoyager_to_ome_zarr_init_multiplex.py, 4 of which are covered (100%).$
Project Total

This report was generated by python-coverage-comment-action

jluethi commented 2 months ago

The conversion is working at least for the single well with synthetic data with the new filename structure. There is one issue with Yokogawa's well name convention though: It's not technically compliant with the NGFF spec: For the column name definition in the plate:

The name MUST contain only alphanumeric characters

Now neither our converter nor the napari-ome-zarr plugin appears to have an issue with this. But plate validation using the official schemas fails.

We could just accept that, we could change the well ids we're getting for 1536 well plates (e.g. to A01a1 instead of A01.a1) or we could start a spec discussion on that. Let's see how relevant those 1536 well plates will be first.

jluethi commented 2 months ago

I now went with a logic where we rename the wells for 1536 well plates from A01.a1 to Aa011 (split in row Aa & column 011). This makes it compatible with the NGFF spec on well names & should improve how it's displayed in a viewer like napari. Will test a full plate conversion next to verify :)

jluethi commented 2 months ago

@tcompa This PR is now ready for review. It's not meant to be part of the 1.0.0 release & the deployment on Monday morning, but should be ready for review whenever there is time afterwards. I can run our demo without having this in main by just installing the tasks from a local wheel.

Content-wise: I have found a way now to make the wells display in the correct arrangement and have well names in the OME-Zarr that are consistent with the OME-NGFF spec. Now the test for plate metadata also passes against the spec. The approach was: Convert well names like A01.a1 into row AA, col 011. This normalization of well names wasn't fully trivial, as we still need the well name as it appears in the filenames for filtering. But using 2 new helper functions in cellvoyager.wells (with accompanying unit tests), I think I have a decent solution for this.

Overall, the changes to the converter inits are minimal and no changes to the compute is required. The converter init now just call the 2 helper functions to process the well names and those function have handling for the 2 cases we care about. There also needed to be adaptations to the metadata parsing to convert the integer positions from the XML into the different well names found in the filenames.

I have not implemented in-depth checks of the content of the well names we're getting (e.g. getting well X77 would be impossible in any setup we know, but we'd still process it). I'd keep it that way unless we see a strong reason to become more restrictive.

tcompa commented 2 months ago

Minor comments about get_filename_well_id:

I think 544627f3ea1d5a08d1a085e3bf42577253b51ffc is needed, to avoid a potential IndexError
I also added 928688f88db7bc39b6fd80acdfb7649e1c26a547, but I can revert it if we prefer to also support wells like A1 (that is, with a single-digit column).

jluethi commented 2 months ago

I think https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/544627f3ea1d5a08d1a085e3bf42577253b51ffc is needed, to avoid a potential IndexError I also added https://github.com/fractal-analytics-platform/fractal-tasks-core/commit/928688f88db7bc39b6fd80acdfb7649e1c26a547, but I can revert it if we prefer to also support wells like A1 (that is, with a single-digit column).

They both make sense to me. We don't support wells like A1, so having that explicitly excluded and tested for makes sense :)

tcompa commented 2 months ago

I also slightly refactored generate_row_col_split (see e467ad546ae16573a6c36df65b3c14313a0e7d3d), to be a bit more robust. I guess the reason for only checking the first well was related to performance, but the following example suggests that this is under control (~2000 of the "long" well IDs parsed in ~0.001 seconds):

import time
from fractal_tasks_core.cellvoyager.wells import generate_row_col_split

well_ids = []
for row_base in ["A", "B", "C", "D", "E", "F", "G", "H", "I"]:
    for col_base in ["01", "02", "03", "04", "05", "06"]:
        for row_suff in ["a", "b", "c", "d", "e", "f"]:
            for col_suff in ["1", "2", "3", "4", "5", "6"]:
                well_ids.append(f"{row_base}{col_base}.{row_suff}{col_suff}")

start = time.perf_counter()
output = generate_row_col_split(well_ids)
end = time.perf_counter()

print(f"{len(well_ids)=}")
print(f"{well_ids[:2]=}")
print(f"{output[:2]=}")
print(f"Elapsed time: {end - start:e} s")

len(well_ids)=1944
well_ids[:2]=['A01.a1', 'A01.a2']
output[:2]=[('Aa', '011'), ('Aa', '012')]
Elapsed time: 1.165204e-03 s

jluethi commented 2 months ago

That makes sense, useful to be on the save side here!

jluethi commented 2 months ago

Context on that one: We get integer positions and need to convert them to letters like A, B, C. 65 is capital A. 97 is lower-case a. The indices start with 1, but for the rounding of bases, we need 0-based indexing (-1, then +65 instead of just +64).

It's tested in the test_create_well_ids_96_well function.

jluethi commented 2 months ago

Can you merge & release a 1.0.1 with it then? Then I'd start testing this at FMI later this afternoon.