`get_zarr` returns incorrect data from 3D + 2D MM datasets

talonchandler commented 1 year ago

Here's a minimal demonstration of the issue

>>> from iohub.reader import imread
>>> reader = imread('/hpc/projects/compmicro/rawdata/other/NIC QLIPP/2022_12_01 PtK cells/FOV1_1')

>>> reader.get_zarr(0)[0,-1,:,500,500] # get the Z profile over the last channel
array([28992, 48564, 22054,     0,     0,     0,     0,     0, 11040,
       19283, 20553,  8202,  8224, 21536, 30821, 10356, 11824, 11317,
       12320,  8236,   631, 10535,  8202,  8224, 15904, 15934, 28704,
       29804, 29486, 28520, 10359,  8233,  8992, 25632, 25455, 25972,
       29811,  8250, 21291, 18763,   224], dtype=uint16)

>>> reader.get_zarr(0)[0,-1,:,500,500] # repeated call gives a different result!
array([21184, 48542, 22054,     0,     0,     0,     0,     0,  2570,
       25939,  8293, 27713, 28531, 11530, 11565, 11565, 11565,  2605,
       27750, 24943,   631, 28528, 25975,  8306,  8250, 28528, 25975,
        8306, 30054, 25454, 26996, 28271, 29728, 24936,  8308, 29296,
       28015, 29807, 29541, 26912,   224], dtype=uint16)

>>> reader.get_array(0)[0,-1,:,500,500] # get_array gives the expected result
array([  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0, 631,   0,   0,   0,   0,   0,
         0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         0,   0], dtype=uint16)

Notes

This is a 3D + 2D dataset from MM. Only the 2D channel fails on repeated get_zarr calls.
To see the correct result, use get_array, open the dataset in MM, or open it with napari's built-in reader.
I tested this issue on the converter branch with the new imread function, and I also confirmed that it was inherited from the WaveorderReader.

Symptoms:

The recorder view uses get_zarr (currently the WaveorderReader version, which has the same behaviour) and behaves erratically when you scroll through Z in napari.
Tagging @ieivanov because this issue affects the spindle data from the NIC. This issue is not blocking (scripts can use get_array for now), but it is a major inconvenience because it breaks many of the napari viewing conveniences that we're used to.

ieivanov commented 1 year ago

A bit more context here - this dataset has several 3D channels (ZYX) and one 2D channel, i.e. one image of a single z-plane placed at z=n_slices//2 index in the metadata. You can get that if you uncheck "acquire z-stack" (or something similar) for a given channel in the MDA.
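To make the layout concrete, here is a rough NumPy sketch of what the 2D channel should look like once it is read back correctly. The channel count, image size, and pixel values are made up for illustration; only the 41-slice depth and the value 631 are taken from the output above. It is not how iohub builds the array.

```python
import numpy as np

# Hypothetical TCZYX shape: Z = 41 matches the profile length in the report,
# the channel count and image size are placeholders.
T, C, Z, Y, X = 1, 3, 41, 8, 8

data = np.zeros((T, C, Z, Y, X), dtype=np.uint16)

# The 3D channels have a real image at every z index (dummy value here).
data[:, :-1] = 100

# The 2D channel (last one) has a single plane, stored at z = n_slices // 2.
single_plane = np.full((Y, X), 631, dtype=np.uint16)
data[0, -1, Z // 2] = single_plane

# Z profile through the 2D channel: zero everywhere except the middle slice.
profile = data[0, -1, :, 4, 4]
```

With this layout the only non-zero entry of `profile` sits at index 20, which is where 631 appears in the get_array output above.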

ziw-liu commented 1 year ago

It's likely because the array is initialized as 'empty' instead of 'zeros': https://github.com/czbiohub/iohub/blob/e96e3ccdca4d2396a358c847a90e314c7dfc7665/iohub/multipagetiff.py#L269-L273 The Zarr docs note:

The contents of an empty Zarr array are not defined. On attempting to retrieve data from an empty Zarr array, any values may be returned, and these are not guaranteed to be stable from one access to the next.

A note to our future selves that being deterministic is the important thing. Also, filling Zarr arrays with zeros (fill_value: 0) doesn't write any chunks with zarr>=2.11, so the need for zarr.empty() should be rare now.

czbiohub-sf / iohub

`get_zarr` returns incorrect data from 3D + 2D MM datasets #82