Reading header tags is slow, especially for files with many frames

DylanMuir / TIFFStack

Load TIFF files into matlab fast, with lazy loading

http://dylan-muir.com/articles/tiffstack/

Reading header tags is slow, especially for files with many frames #28

Open DylanMuir opened 6 years ago

DylanMuir commented 6 years ago

The tiffread31_header function is slow to iterate over the IFDs in large files. imfinfo is fast, but reads all tags.

DylanMuir commented 6 years ago

imfinfo is fast because it uses the undocumented Matlab mex file matlab.io.internal.imagesci.tifftagsread. The calling syntax appears to be:

function vsInfoStructure = ...tifftagsread(strFilename, nBytesOffset, nIFDsToSkip, nNumIFDsToRead);

tifftagsread is 10x faster than the matlab version of tiffread31_header, but reads all tags. This is undesirable for ScanImage TIFF files, since the headers contain large duplicated Software and Artist tags (and others?).

DylanMuir commented 6 years ago

Suggestion: write an accelerated mex version that reads only the necessary tags from the TIFF file. Hassles:

- We can't know in advance how many IFDs are in the file, so we need to handle re-allocation or chaining of data. Return cell arrays, allocating cells with chunks of data as necessary?
- Returning structures is kind of a pain. Return a cell array for each tag, and convert to a structure in Matlab?
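
To make the suggestion concrete, here is a rough sketch of the tag-skipping walk, written in Python rather than as a mex file so the logic is easy to follow. It is not TIFFStack code. The function name `read_selected_tags` and the default tag list are made up for illustration, and it assumes a classic (non-BigTIFF) file and only decodes SHORT/LONG tags with a single inline value. Large tags such as Software and Artist are never followed to their data. Growable per-tag lists stand in for the "cell array for each tag" idea, so the IFD count does not need to be known up front.

```python
import struct


def read_selected_tags(path, wanted=(256, 257, 273)):
    """Walk the IFD chain of a classic TIFF, decoding only the `wanted` tags.

    Defaults are ImageWidth (256), ImageLength (257) and StripOffsets (273).
    Returns {tag: [value for each IFD]}, with None where a tag is absent or
    is not a single inline SHORT/LONG value (an assumption of this sketch).
    """
    columns = {tag: [] for tag in wanted}  # one growable list per tag
    with open(path, "rb") as f:
        header = f.read(8)
        endian = {b"II": "<", b"MM": ">"}[header[:2]]
        magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
        while ifd_offset:  # a next-IFD offset of 0 terminates the chain
            f.seek(ifd_offset)
            (n_entries,) = struct.unpack(endian + "H", f.read(2))
            # All 12-byte entries plus the 4-byte next-IFD offset in one read.
            raw = f.read(12 * n_entries + 4)
            found = {}
            for i in range(n_entries):
                tag, typ, count = struct.unpack_from(endian + "HHI", raw, 12 * i)
                # Unwanted tags are skipped without seeking to their data.
                if tag in columns and count == 1 and typ in (3, 4):
                    fmt = "H" if typ == 3 else "I"  # 3 = SHORT, 4 = LONG
                    (found[tag],) = struct.unpack_from(endian + fmt, raw, 12 * i + 8)
            for tag in wanted:
                columns[tag].append(found.get(tag))
            (ifd_offset,) = struct.unpack_from(endian + "I", raw, 12 * n_entries)
    return columns
```

Calling `read_selected_tags("stack.tif")` (file name hypothetical) gives one list per tag, each as long as the number of IFDs found. Converting that to a structure array would then happen on the caller's side, as proposed above.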

ehennestad commented 2 years ago

Hi, I have some ideas/questions about this.

1) Sometimes we know in advance how many images/IFDs are in a TIFF file. What about adding an option to pass that information to TIFFStack on creation? That could partially solve the reallocation problem you describe above.

2) Sometimes all the images in a TIFF stack are uniform. Is it then necessary to read through all the headers? Say I know the number of images in the stack I want to open, and I know that all the images have the same format: is there a reason not to just read the header of the first directory and use that information for all the remaining directories?
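
One thing to keep in mind with idea 2: even when every frame has the same format, each IFD still records where its own pixel data starts, so the first header alone does not locate the later frames. A rough sketch of how that could be worked around is below, in Python for readability. It is not TIFFStack code. The names `extrapolate_frame_offsets` and `_read_ifd` are invented for illustration, and the approach assumes a classic (non-BigTIFF) file, one strip per frame, and frames laid out at a constant spacing. The TIFF format does not guarantee that spacing, so this would only be safe for files known to be written that way.

```python
import struct

STRIP_OFFSETS = 273  # TIFF tag holding the file offset of the pixel data


def _read_ifd(f, endian, offset):
    """Return ({tag: value} for inline SHORT/LONG scalars, next IFD offset)."""
    f.seek(offset)
    (n_entries,) = struct.unpack(endian + "H", f.read(2))
    raw = f.read(12 * n_entries + 4)
    tags = {}
    for i in range(n_entries):
        tag, typ, count = struct.unpack_from(endian + "HHI", raw, 12 * i)
        if count == 1 and typ in (3, 4):  # 3 = SHORT, 4 = LONG
            fmt = "H" if typ == 3 else "I"
            (tags[tag],) = struct.unpack_from(endian + fmt, raw, 12 * i + 8)
    (next_offset,) = struct.unpack_from(endian + "I", raw, 12 * n_entries)
    return tags, next_offset


def extrapolate_frame_offsets(path, n_frames):
    """Read at most two IFDs and extrapolate the data offset of every frame.

    ASSUMPTION: all frames share the first IFD's format and their pixel data
    is spaced at a constant stride. Nothing here verifies that.
    """
    with open(path, "rb") as f:
        header = f.read(8)
        endian = {b"II": "<", b"MM": ">"}[header[:2]]
        magic, first = struct.unpack(endian + "HI", header[2:8])
        if magic != 42:
            raise ValueError("not a classic TIFF (BigTIFF is not handled here)")
        tags0, second = _read_ifd(f, endian, first)
        if n_frames == 1:
            return tags0, [tags0[STRIP_OFFSETS]]
        if second == 0:
            raise ValueError("file has one IFD but n_frames > 1 was requested")
        tags1, _ = _read_ifd(f, endian, second)
    # Spacing between the first two frames, applied to all the others.
    stride = tags1[STRIP_OFFSETS] - tags0[STRIP_OFFSETS]
    offsets = [tags0[STRIP_OFFSETS] + k * stride for k in range(n_frames)]
    return tags0, offsets
```

The returned `tags0` would serve as the shared header for every frame, and `offsets[k]` as the guessed data position of frame `k`. A cheap sanity check, if wanted, would be to read one more IFD near the end of the file and compare its recorded offset against the extrapolated one.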