dask / dask-image

Distributed image processing
http://image.dask.org/en/latest/
BSD 3-Clause "New" or "Revised" License
210 stars 47 forks source link

Multi-series and multi-channel nd2 files are loaded incompletely without error/warning #364

Open StefanHMueller opened 6 months ago

StefanHMueller commented 6 months ago

Issue

I use the imread function to load nd2 files recorded on Nikon microscopes that contain multiple series (i.e. multiple fields of view are imaged simultaneously) and multiple color channels as well as multiple frames. The imread function just returns all frames at the first series and the first color channel, i.e. a 5D image with x=512, y=512, c=2, t=5, v=3 is reduced to a dask array with shape (5,512,512) without error or warning. Test data in .nd2 format to reproduce can be found here: https://github.com/Single-molecule-Biophysics-UOW/TIRF_tools/blob/main/test_data/multi_series_multiC_002.nd2

from dask_image import imread
import pims
path = r'Z:\In_vitro_replication\Stefan\test\test/multi_series_multiC_001.nd2'
im_dask = imread.imread(path)
im_pims = pims.open(path)

print('dask:',im_dask.shape)
print('pims:',im_pims)

This returns:

dask: (5, 512, 512)
pims: <FramesSequenceND>
Axes: 5
Axis 'x' size: 512
Axis 'y' size: 512
Axis 'c' size: 2
Axis 't' size: 5
Axis 'v' size: 3
Pixel Datatype: <class 'numpy.float64'>

This behavior depends on which reader pims uses internally (options are nd2reader or bioformats), but either way dask_image does not handle the output correctly.

Cause

The reason for this behavior is the following: In the imread function the shape of the image is determined using len(im) which only returns the number of frames regardless of other dimensions:

    with pims.open(sfname) as imgs:
        shape = (len(imgs),) + imgs.frame_shape
        dtype = np.dtype(imgs.pixel_type)

Workaround

This behavior can be fixed by changing the iteration axis and bundling of axes for the pims reader as follows. I also added some quick and dirty string shuffling to make sure xy are the two last dimensions:

with pims.open(path) as imgs:    
    shape = (len(imgs),) + imgs.frame_shape
    print(shape)
    #change iteration axis
    imgs.iter_axes = 'v'
    ax_bundle = ''.join(imgs.axes).replace('v','')
    ax_bundle_sorted = ax_bundle.replace('x','').replace('y','') + 'xy'
    imgs.bundle_axes = ax_bundle_sorted
    #get the hsape gain
    shape = (len(imgs),) + imgs.frame_shape
    print(shape)

This returns:

(5, 512, 512)
(3, 2, 5, 512, 512)

As expected the shape as imread determines it is now correct. To actually use this work around I made my own version of the imread function as well as the _read_frame function and inserted those lines just after the pims.open(... statements. This works well, but probably is very dependent on the exact file type used and the reader used by pims. Perhaps it would be feasible to implement a warning if imread returns an array with less dimensions than the pims imageSequenceND has axes though?

m-albert commented 6 months ago

@StefanHMueller thanks for reporting this issue and for detailing your workaround 🙏

There are several problems with the current dask_image.imread implementation. Here's an issue tracking some of them and also the roadmaps towards improving the situation.

I'm glad you've found a workaround that works for you. Also consider using the readers mentioned here.

[]()

StefanHMueller commented 6 months ago

Thanks @m-albert . I hadn't see the AICSImageIO readers before. It produces dask arrays with the right shape and does so quickly. This at least solves my problem, even though it would be nice not to depend on yet another package one day. Reading nd2 files of any dimension to dask works for me simply like this:

from aicsimageio import AICSImage
path = 'path_to_file/.nd2'
img = AICSImage(path)
dask_array = img.dask_data