czbiohub-sf / iohub

Pythonic and parallelizable I/O for N-dimensional imaging data with OME metadata
https://czbiohub-sf.github.io/iohub/
BSD 3-Clause "New" or "Revised" License

`get_zarr` returns incorrect data from 3D + 2D MM datasets #82

Closed talonchandler closed 1 year ago

talonchandler commented 1 year ago

Here's a minimal demonstration of the issue:

>>> from iohub.reader import imread
>>> reader = imread('/hpc/projects/compmicro/rawdata/other/NIC QLIPP/2022_12_01 PtK cells/FOV1_1')

>>> reader.get_zarr(0)[0,-1,:,500,500] # get the Z profile over the last channel
array([28992, 48564, 22054,     0,     0,     0,     0,     0, 11040,
       19283, 20553,  8202,  8224, 21536, 30821, 10356, 11824, 11317,
       12320,  8236,   631, 10535,  8202,  8224, 15904, 15934, 28704,
       29804, 29486, 28520, 10359,  8233,  8992, 25632, 25455, 25972,
       29811,  8250, 21291, 18763,   224], dtype=uint16)

>>> reader.get_zarr(0)[0,-1,:,500,500] # repeated call gives a different result!
array([21184, 48542, 22054,     0,     0,     0,     0,     0,  2570,
       25939,  8293, 27713, 28531, 11530, 11565, 11565, 11565,  2605,
       27750, 24943,   631, 28528, 25975,  8306,  8250, 28528, 25975,
        8306, 30054, 25454, 26996, 28271, 29728, 24936,  8308, 29296,
       28015, 29807, 29541, 26912,   224], dtype=uint16)

>>> reader.get_array(0)[0,-1,:,500,500] # get_array gives the expected result
array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0, 631,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0], dtype=uint16)

ieivanov commented 1 year ago

A bit more context here: this dataset has several 3D channels (ZYX) and one 2D channel, i.e. a single z-plane image that the metadata places at index z = n_slices//2. You get this layout if you uncheck "acquire z-stack" (or something similar) for a given channel in the MDA.
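To make that layout concrete, here is a minimal sketch of which (channel, z) coordinates actually have an image in such an acquisition. The helper `plane_coords` and the channel names are hypothetical and not part of iohub; the 41 slices match the length of the profiles printed above.

```python
def plane_coords(n_slices, channels_3d, channels_2d):
    """Return the set of (channel, z) pairs that have an acquired image.

    Hypothetical helper for illustration only; not an iohub function.
    """
    mid = n_slices // 2  # 2D channels are placed at the middle z index
    coords = {(c, z) for c in channels_3d for z in range(n_slices)}
    coords |= {(c, mid) for c in channels_2d}
    return coords


n_slices = 41
coords = plane_coords(
    n_slices,
    channels_3d=["pol_a", "pol_b"],  # assumed names for the 3D channels
    channels_2d=["single_plane"],    # assumed name for the 2D channel
)

# Every other z index of the 2D channel has no image behind it, so a reader
# has to decide what to return for those planes.
missing = [z for z in range(n_slices) if ("single_plane", z) not in coords]
```

With these assumptions the 2D channel has exactly one plane, at z = 20, which is where the value 631 appears in all three profiles above.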

ziw-liu commented 1 year ago

It's likely because the array is initialized as 'empty' instead of 'zeros': https://github.com/czbiohub/iohub/blob/e96e3ccdca4d2396a358c847a90e314c7dfc7665/iohub/multipagetiff.py#L269-L273 And the Zarr docs note:

The contents of an empty Zarr array are not defined. On attempting to retrieve data from an empty Zarr array, any values may be returned, and these are not guaranteed to be stable from one access to the next.

A note to our future selves that being deterministic is the important thing. Also, filling Zarr arrays with zeros (fill_value: 0) doesn't write any chunks with zarr>=2.11, so the need for zarr.empty() should be rare now.