cgohlke / tifffile

Read and write TIFF files
https://pypi.org/project/tifffile
BSD 3-Clause "New" or "Revised" License

tif.pages missing some pages #199

Closed elyall closed 1 year ago

elyall commented 1 year ago

First, thanks for your amazing work.

I have a 24-page TIFF (download link) that I am trying to lazy-load pages from. Both imageio's tifffile_v3 plugin and dask_image (which both use tifffile) fail to load pages with indices >15, each raising an IndexError: list index out of range. It appears tif.pages for these files only contains the first 16 pages:

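import tifffile
filepath = "stack_t24_y2048_x2448.tiff"  # the file named in the warning below
with tifffile.TiffFile(filepath) as tif:
    print(len(tif.pages))  # number of pages/IFDs actually present in the file
    print(len(tif.series), tif.series[0].shape)  # series count and the shape declared by the metadata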
16
<tifffile.TiffFile 'stack_t24_y2048_x2448.tiff'> OME series expected 24 frames, got 16
1 (24, 2048, 2448)

Is this meant to happen? Maybe it's an issue with the file's metadata?
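One way to confirm this kind of mismatch is to compare the number of pages actually in the file with the number of frames the series shape implies. The following is a minimal sketch, not part of tifffile: the helper names `declared_frame_count` and `page_and_frame_counts` are hypothetical, and it assumes each frame is a single 2D grayscale page (the last two axes of the series shape).

```python
from math import prod


def declared_frame_count(series_shape, frame_ndim=2):
    """Frames implied by a series shape, assuming the last `frame_ndim`
    axes make up one frame (an assumption that fits 2D grayscale pages)."""
    # prod(()) is 1, so a single-frame shape such as (2048, 2448) yields 1.
    return prod(series_shape[:-frame_ndim])


def page_and_frame_counts(path):
    """Return (pages present in the file, frames declared by series 0)."""
    # Imported lazily so the pure helper above works without tifffile.
    import tifffile

    with tifffile.TiffFile(path) as tif:
        n_pages = len(tif.pages)
        shape = tif.series[0].shape
    return n_pages, declared_frame_count(shape)
```

For the file above, `page_and_frame_counts(filepath)` would be expected to return `(16, 24)`, which is the mismatch that makes page indices above 15 fail.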

More context should it be helpful:

This issue only occurs when you specify the index of the page to load; using any of the three packages to read the whole file returns the whole stack.

The file is from a Cytena CellcyteX. The stack contains longitudinal images of an individual field that I concatenate with slices from other fields to produce a full image. I parallelize pre-processing using ray such that an individual task loads a specific slice from multiple files, concatenates them, and processes the complete frame.

Until now I've been processing single page tifs and my pattern has been to wrap imageio.imread in a dask.array.from_delayed call resulting in each file getting mapped to a single task. For multipage tifs I understand this pattern will result in each file being opened and closed many times, but the benefit I see is that all pages will not have to be loaded at the same time on one machine, nor will slices then have to be pickled to be passed between tasks which are often on separate machines. But I'm also open to feedback/suggestions.
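For a multipage file, the same pattern can be sketched with one delayed task per page, so that no task has to load the whole stack. This is only an illustration under stated assumptions: `open_lazy_stack` is a hypothetical helper, it assumes every page has the same shape and dtype as the first page, and it builds the stack from `len(tif.pages)` rather than from the shape declared in the metadata.

```python
def open_lazy_stack(path):
    """Return a dask array with one lazily loaded chunk per TIFF page."""
    # Third-party imports are kept inside the function; all three packages
    # (dask, tifffile, and numpy via dask) must be installed to call it.
    import dask
    import dask.array as da
    import tifffile

    # Open the file once, only to read the page count, shape, and dtype.
    with tifffile.TiffFile(path) as tif:
        n_pages = len(tif.pages)
        first = tif.pages[0]
        shape, dtype = first.shape, first.dtype

    # Each task reopens the file and reads a single page via `key`.
    read_page = dask.delayed(tifffile.imread)
    frames = [
        da.from_delayed(read_page(path, key=i), shape=shape, dtype=dtype)
        for i in range(n_pages)
    ]
    return da.stack(frames)
```

Indexing the result, for example `open_lazy_stack(filepath)[3].compute()`, should then read only the requested page. The cost is that the file is opened once per page, as noted above.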


Edit: updated tifffile to latest version and re-tested

# Name                    Version                   Build               Channel
tifffile                  2023.4.12                 py_0                conda-forge
python                    3.10.10                   h3ba56d0_0_cpython  conda-forge

elyall commented 1 year ago

Never mind, it appears those frames do not actually exist: when they do get returned, they are all-zero arrays. So the metadata is in fact wrong, as it promises frames that don't exist.

cgohlke commented 1 year ago

The issue is that the file only contains 16 pages/IFDs while the metadata references 24. Tifffile should either return an array with zeroed frames 16-23 (zero-based) or revert to a generic series with 16 frames. I'll try to fix this in the next version.
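The two options can be illustrated with a small user-side sketch. This is not tifffile's implementation: `frame_plan` and `read_with_plan` are hypothetical helpers, and the sketch assumes all pages share the first page's shape and dtype. Each output frame maps either to a page index or to `None`, meaning "fill with zeros".

```python
def frame_plan(n_declared, n_pages, pad_missing):
    """Map output frames to page indices; None marks a frame to zero-fill."""
    available = list(range(min(n_declared, n_pages)))
    if not pad_missing:
        # Option 2: only return the frames that actually exist.
        return available
    # Option 1: keep the declared length and mark missing frames as None.
    return available + [None] * (n_declared - len(available))


def read_with_plan(path, n_declared, pad_missing=True):
    """Read the existing pages, optionally zero-padding to `n_declared`."""
    # Imported lazily so frame_plan works without numpy or tifffile.
    import numpy as np
    import tifffile

    with tifffile.TiffFile(path) as tif:
        plan = frame_plan(n_declared, len(tif.pages), pad_missing)
        first = tif.pages[0]
        out = np.zeros((len(plan),) + first.shape, dtype=first.dtype)
        for i, page_index in enumerate(plan):
            if page_index is not None:
                out[i] = tif.pages[page_index].asarray()
    return out
```

With 24 declared frames and 16 pages, `frame_plan(24, 16, True)` yields 16 page indices followed by 8 `None` entries, while `frame_plan(24, 16, False)` yields just the 16 existing pages.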

elyall commented 1 year ago

I would have saved myself plenty of time by just examining the whole stack. Conda had installed a 2020 version of tifffile, but when I updated to the latest version, the warning "OME series expected 24 frames, got 16" helped me realize the issue.

I could see an argument for returning empty frames, but in this case it caused me confusion that the two methods for accessing the data didn't match up. It's up to you whether you consider it a bug or not.

cgohlke commented 1 year ago

Should be fixed in v2023.7.4.