JuliaImages / Images.jl

An image library for Julia
http://juliaimages.org/

Mmap for discontiguous data? #587

Closed: tlnagy closed this issue 3 years ago

tlnagy commented 7 years ago

I wasn't sure where to post this, but I'm hoping to add support for reading OME-TIFF files to Images.jl. I would also like to add mmap support, as these files can get quite large, quite fast. I tried to understand how NRRD.jl uses mmap to see whether I could do the same, but the issue I'm running into is how to use mmap for discontiguous array blocks. In general, an OME-TIFF stores its data as separate 2d XY arrays, each labeled with its CZT (channel, z-slice, time) position. @timholy, do you have any suggestions on how to handle this with Images.jl's architecture and mmapping?

ref https://www.micro-manager.org/wiki/Micro-Manager_File_Formats#Image_file_stack_specification

tlnagy commented 7 years ago

Also, the data might be split across separate files (because of TIFF's 4 GB size limit). I know which file each XY plane is located in, so I guess it might be possible to do the loading lazily.
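To make the lazy-loading idea concrete, here is a minimal Python sketch (not Images.jl code) under several assumptions: a parser has already produced one hypothetical `(path, byte_offset)` entry per XY plane, pixels are uncompressed unsigned 16-bit values in native byte order, and every plane has the same height and width. Each file is memory-mapped with the standard-library `mmap` module only when one of its planes is first requested; the mapping is then cached and reused. The class name `LazyPlaneStore` is illustrative only.

```python
import mmap
from array import array


class LazyPlaneStore:
    """Hypothetical sketch: planes live in several files; a file is mmapped
    only when one of its planes is first requested, then the mapping is reused."""

    def __init__(self, locations, height, width, typecode="H"):
        self.locations = locations  # list of (path, byte_offset), one per XY plane
        self.height = height
        self.width = width
        self.typecode = typecode  # "H" = unsigned 16-bit, native byte order (assumption)
        self.itemsize = array(typecode).itemsize
        self._maps = {}  # path -> (file object, mmap), filled on demand

    def _mapping(self, path):
        # Open and map the file the first time any of its planes is needed.
        if path not in self._maps:
            f = open(path, "rb")
            self._maps[path] = (f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ))
        return self._maps[path][1]

    def plane(self, k):
        """Return plane k as a flat, row-major array of height * width pixels."""
        path, offset = self.locations[k]
        mm = self._mapping(path)
        nbytes = self.height * self.width * self.itemsize
        # Slicing the mapping copies just this plane's bytes out of the file.
        return array(self.typecode, mm[offset:offset + nbytes])

    def close(self):
        for f, mm in self._maps.values():
            mm.close()
            f.close()
        self._maps.clear()
```

Slicing the mapping copies one plane into a fresh `array`, so laziness here is per plane rather than per pixel; a zero-copy view through `memoryview` is possible but makes closing the mapping more delicate, since an `mmap` cannot be closed while views of it are still alive.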

timholy commented 7 years ago

Awesome.

I haven't read that document carefully, but my first impression is that you might need to create something a bit like https://github.com/tanmaykm/ChainedVectors.jl. (There may be other examples of similar packages around.) I don't know whether you can mmap the same file multiple times, whether you'd want one giant mmap per file, or something else. If you can do one mmap per file, then the problem basically becomes one of computing indexing offsets: internally, your AbstractArray type might need to maintain a vector of file offsets, one per 2d image, and then use sub2ind computations to figure out where to get the data from. I'd certainly parse the table on opening, so that access is fast when you're actually using the image.
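A minimal Python sketch of that design follows. It assumes uncompressed little-endian unsigned 16-bit pixels stored row by row within each plane, and a plane table of hypothetical `(c, z, t, file_index, byte_offset)` records that a TIFF parser would already have extracted. The table is turned into a flat vector of `(file_index, offset)` pairs when the object is created, each file is mapped once, and a column-major linear index over C, Z, T (the same ordering Julia's `sub2ind` uses, but 0-based here) selects the plane. The class name `PlaneTableArray` is illustrative only.

```python
import mmap
import struct


class PlaneTableArray:
    """Hypothetical sketch: one mmap per file plus one (file, offset) entry
    per 2d plane, looked up with a column-major linear index over C, Z, T."""

    def __init__(self, paths, records, height, width, nc, nz, nt):
        self.height, self.width = height, width
        self.dims = (nc, nz, nt)
        # Parse the plane table once, on opening: records are (c, z, t, file_index, offset).
        self.offsets = [None] * (nc * nz * nt)
        for c, z, t, file_index, offset in records:
            self.offsets[self._linear(c, z, t)] = (file_index, offset)
        if any(entry is None for entry in self.offsets):
            raise ValueError("plane table does not cover every (c, z, t) combination")
        # One read-only mapping per file.
        self._files = [open(p, "rb") for p in paths]
        self._maps = [mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) for f in self._files]

    def _linear(self, c, z, t):
        nc, nz, nt = self.dims
        if not (0 <= c < nc and 0 <= z < nz and 0 <= t < nt):
            raise IndexError((c, z, t))
        # Column-major: c varies fastest, like Julia's sub2ind (0-based here).
        return c + nc * (z + nz * t)

    def __getitem__(self, index):
        y, x, c, z, t = index
        if not (0 <= y < self.height and 0 <= x < self.width):
            raise IndexError((y, x))
        file_index, offset = self.offsets[self._linear(c, z, t)]
        pos = offset + 2 * (y * self.width + x)  # 2 bytes per uint16 pixel, row-major plane
        (value,) = struct.unpack_from("<H", self._maps[file_index], pos)
        return value

    def close(self):
        for mm, f in zip(self._maps, self._files):
            mm.close()
            f.close()
```

Because a lookup only touches the offset vector and a single `struct.unpack_from` call on one mapping, reading a pixel costs the same whichever file its plane lives in.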

tlnagy commented 7 years ago

That's an interesting idea. How would that work with AxisArrays? Would it be a subtype of AxisArray or a completely separate thing?

tlnagy commented 7 years ago

Also, I'm going to rope @quinnj into this discussion since he was the main architect of the mmap redesign in https://github.com/JuliaLang/julia/pull/11280

timholy commented 7 years ago

Completely separate. You could put the "chained array" inside an AxisArray wrapper if you wanted, but shoot for a completely orthogonal design.
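One way to read "completely orthogonal" is composition rather than subtyping: the axis-naming layer only translates names to positions and forwards the lookup to whatever array it wraps. The Python class below is a hypothetical stand-in for an AxisArray-style wrapper, not the AxisArrays.jl API; it works with any parent object that accepts a tuple index.

```python
class NamedAxes:
    """Hypothetical AxisArray-style wrapper: knows only axis names and
    delegates element access to the wrapped array-like object."""

    def __init__(self, parent, names):
        self.parent = parent
        self.names = tuple(names)

    def at(self, **indices):
        missing = set(self.names) - set(indices)
        extra = set(indices) - set(self.names)
        if missing or extra:
            raise KeyError(f"missing axes {sorted(missing)}, unknown axes {sorted(extra)}")
        # Reorder the named indices into the parent's positional order.
        return self.parent[tuple(indices[name] for name in self.names)]
```

For example, `NamedAxes(parent, ("y", "x")).at(x=1, y=0)` simply forwards to `parent[(0, 1)]`, so the wrapped storage never needs to know about axis names.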

I'd even advocate for doing this in two pieces (i.e., two packages):

1. A generic "chained array" package that presents separately stored (e.g., separately mmapped) blocks as a single AbstractArray.

2. The OME-TIFF parser, which would create and return the "chained array," possibly wrapped inside an AxisArray.

The advantage of the two-part split is reusability; it's not hard to imagine that other file formats (or other applications) might want to reuse the chained-array part.
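The reusable half of that split can be very small. Below is a hypothetical Python sketch in the spirit of ChainedVectors.jl (not a port of its API): it presents several independent blocks, which could be separately memory-mapped regions, as one read-only vector by storing the cumulative block lengths and binary-searching them with `bisect_right`. Nothing in it knows about TIFF, which is what would let other formats reuse it.

```python
from bisect import bisect_right
from itertools import accumulate


class ChainedVector:
    """Hypothetical sketch: several independent sequences presented as one
    contiguous, read-only vector."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        # starts[i] is the global index of the first element of blocks[i];
        # the final entry is the total length.
        self.starts = [0] + list(accumulate(len(b) for b in self.blocks))

    def __len__(self):
        return self.starts[-1]

    def __getitem__(self, i):
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        # Binary search for the last block starting at or before i;
        # bisect_right skips empty blocks, whose start equals the next one's.
        b = bisect_right(self.starts, i) - 1
        return self.blocks[b][i - self.starts[b]]
```

For example, `ChainedVector([[1, 2, 3], [], [4, 5]])` has length 5, and its element at index 3 is `4`.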

shashi commented 7 years ago

For reference, I'll link my Discourse answer about using Dagger.jl to do this (https://discourse.julialang.org/t/mmapping-a-discontiguous-file/2016/2?u=shashi). It should give you the "chained array" piece of this puzzle. :) You might have to create a DistributedImage wrapper type to use it as an image, though.

timholy commented 7 years ago

For anyone interested in OME-TIFF, see https://discourse.julialang.org/t/bioformats-in-julia/2440.

tlnagy commented 7 years ago

I'm still interested in writing a pure-Julia OME-TIFF reader; I think it would be fine to have both.

johnnychen94 commented 3 years ago

@tlnagy, could you check whether this issue has already been solved?