dask / dask-image

Distributed image processing
http://image.dask.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Add imsave function #110

Open mrocklin opened 5 years ago

mrocklin commented 5 years ago

I found myself reaching for an imsave function to complement imread. Presumably this would have similar semantics, and would effectively map over the skimage.io.imsave function, or something else in pims.

I don't have a concrete need though; this just came up while writing up an example.

mrocklin commented 5 years ago

@jakirkham mentioned the following in a separate conversation:

Does store or to_zarr not work? This is my sense of what people do today.

Could be; I don't actually do this work. So people today don't write many images out? I would expect that for analysis many people would use something like Zarr or HDF as intermediate formats, but that for long-term archives, sharing, or publishing people would still want to save to PNG or TIFF or something.

jakirkham commented 5 years ago

So people today don't write many images out? I would expect that for analysis many people would use something like Zarr or HDF as intermediate formats, but that for long-term archives, sharing, or publishing people would still want to save to PNG or TIFF or something.

Microscope recording software definitely writes out many images today. This is used as input for analysis and is also archived for long term storage. This may also be the thing that is shared with others.

What users produce is dependent on their analysis. One use case is to produce Regions of Interest, which could live happily in JSON. Another use case is to do some cleanup on this data and ingest it into some sort of centralized database. Other use cases produce Zarr/N5 files or HDF5 files, which may be shared and used for further analysis or could go into long term storage.

Publication/sharing may mean hosting the data with a web server, which means having a robust database to back it is pretty important. It could also mean generating some figures in a paper, which are likely generated outside of the analysis pipeline altogether.

RutgerK commented 5 years ago

Just FYI. The Satpy project uses Dask to process satellite imagery in a chunk-based fashion. It allows saving results to disk as a GeoTIFF, PNG etc.

https://github.com/pytroll/satpy

TAdeJong commented 5 years ago

I needed this in my science work and came up with this, based on gufuncs:

import dask.array as da
from skimage.io import imsave

def da_imsave(fnames, arr, compute=False):
    """Write arr to a stack of images assuming
    the last two dimensions of arr as image dimensions.

    Parameters
    ----------
    fnames: string
        A formatting string like 'myfile{:02d}.png'
        Should support arr.ndim-2 indices to be formatted
    arr: dask.array
        Array of at least 2 dimensions to be written to disk as images
    compute: Boolean (optional)
        whether to write to disk immediately or return a dask.array of the indices to be written

    """
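    # One index range per leading (non-image) axis, chunked like arr
    indices = [da.arange(n, chunks=c) for n, c in zip(arr.shape[:-2], arr.chunksize[:-2])]
    # For every image in the stack, its index along the leading axes
    index_array = da.stack(da.meshgrid(*indices, indexing='ij'), axis=-1).rechunk({-1: -1})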

    @da.as_gufunc(signature=f"(i,j),({arr.ndim-2})->({arr.ndim-2})", output_dtypes=int, vectorize=True)
    def saveimg(image, index):
        imsave(fnames.format(*index), image.squeeze())
        return index

    res = saveimg(arr,index_array)
    if compute:
        res.compute()
    else:
        return res

Would it be useful to build this into a pull request, either here or in dask/dask? What would still be needed for that?
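
To make the fnames format string in the snippet above concrete, here is a minimal sketch of which file names would be produced for a given array shape. It assumes, as the snippet does, that the last two axes are the image axes and that every leading axis supplies one index to the format string. The helper name expected_filenames and the example pattern are hypothetical and not part of dask or dask-image.

```python
from itertools import product


def expected_filenames(fnames, shape):
    """List the file names written for an array of the given shape.

    Hypothetical helper: the last two axes are treated as image axes,
    and each leading axis contributes one index to the format string.
    """
    leading = shape[:-2]
    # product() over the leading axes visits every image exactly once
    return [fnames.format(*index) for index in product(*(range(n) for n in leading))]


# A (time, z, y, x) stack with 2 time points and 3 z-slices gives 6 files
names = expected_filenames("frame_t{:02d}_z{:02d}.png", (2, 3, 512, 512))
print(names)
```

A plain 2D array has no leading axes, so the format string is used unchanged and a single file is written.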

GenevieveBuckley commented 5 years ago

Hi @TAdeJong!

What would be needed is:

(a) For us to decide how saving should work in dask-image. This could be a bit of a bottleneck.

(b) To make a saving function that is a little more general than your example above. You have a few assumptions that probably wouldn't work for everybody (e.g. that you have a 2D image, that the last two dimensions of the array describe spatial dimensions, etc.). Some of this will depend on the result of the discussion in (a).

Re: comments by @mrocklin and @jakirkham : As I see it:

  1. Yes, absolutely people want to write out images, and to a format they are familiar with (like TIFF, or similar). I would like us to get something in place for this. I think this group cares about archiving processed data, so prioritizing open and accessible file formats is important, and speed and how compressed the data is on disk are of secondary importance.
  2. Second, people probably also want a compressed way to write out data. Maybe something like Zarr makes sense here?
  3. Third, there might be people who want to write out to a hierarchical multi-resolution format. We probably don't have the bandwidth for this right now.

I think we should prioritise group 1 with a view to extending to groups 2 (and perhaps 3?) down the track.*

*Edit: upon reflection only a small part of this is a plausibly good idea.

TAdeJong commented 5 years ago

Hi @GenevieveBuckley,

(a) I was comparing to dask/dask/array/image.py and think it would at least be nice to get similar capabilities for writing out as for reading in. In that sense, I think it might be a good idea to put both reading and writing capabilities in the same place, but beyond that I have no opinion on whether this should be in core dask or in dask-image.

(b) I agree that color/multichannel support is desirable and is not hard to add to this code (via an explicit switch plus guessing based on the last dimension, i.e. whether it has length 3 or 4). For images, I think that memory-layout-wise it only makes sense if the last 2 (or 3 in the case of RGB(A)) dimensions are the individual images, so I would assume an explicit transpose/swapaxes by the user would be the way to go there, of course in combination with clear documentation and examples.
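
The "explicit switch plus guessing" idea in (b) could look roughly like the sketch below. It only inspects the array shape and decides how many trailing axes make up one image. The function name split_image_axes and the multichannel parameter are hypothetical, chosen for illustration only.

```python
def split_image_axes(shape, multichannel=None):
    """Split an array shape into (stack_shape, image_shape).

    Hypothetical helper. multichannel=True/False is the explicit switch;
    None means guess: a trailing axis of length 3 or 4 is taken to be
    RGB(A) channels.
    """
    if multichannel is None:
        multichannel = len(shape) >= 3 and shape[-1] in (3, 4)
    n_image_axes = 3 if multichannel else 2
    if len(shape) < n_image_axes:
        raise ValueError(f"shape {shape} has fewer than {n_image_axes} axes")
    return shape[:-n_image_axes], shape[-n_image_axes:]


# A stack of ten RGB images: the guess picks up the channel axis
print(split_image_axes((10, 512, 512, 3)))

# Five tiny 4x4 greyscale images would be misread as one RGBA image,
# which is why the explicit switch is needed alongside the guess
print(split_image_axes((5, 4, 4)))
print(split_image_axes((5, 4, 4), multichannel=False))
```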

Regarding the compressed way to write out data, I wonder if there are any features that would be needed in addition to what dask.array.to_zarr() offers?

GenevieveBuckley commented 5 years ago

I do think there's a place for functionality that saves image files (even if it's only basic functionality) in dask-image. So no replicating functionality that already exists in dask itself (like dask.array.to_zarr()), but we might have something specifically for saving to image-specific formats.

When I say "more than just 2D arrays", I don't only mean that sometimes we have colour channels. As a rough guide, I have to think about:

So we can expect typical data might have anywhere between 2 and 5 dimensions, and there's often a lot of variety in which order we see those dimensions.
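
One way to handle the variety in dimension order is to have the user name the axes and derive the transpose from that. The sketch below assumes one-letter axis labels such as T, C, Z, Y and X; that labelling is a common convention but is an assumption here, not something decided in this thread, and axes_permutation is a hypothetical helper.

```python
def axes_permutation(current, target):
    """Return the axis permutation that reorders `current` into `target`.

    Hypothetical helper. Both arguments are strings with one unique
    letter per axis, e.g. "TZCYX"; the labels themselves are an assumption.
    """
    if len(set(current)) != len(current) or sorted(current) != sorted(target):
        raise ValueError(f"cannot reorder axes {current!r} into {target!r}")
    # Position i of the result says which input axis ends up at output axis i
    return tuple(current.index(axis) for axis in target)


# Data stored as (y, x, channel, time), wanted as (time, channel, y, x)
perm = axes_permutation("YXCT", "TCYX")
print(perm)
```

The resulting tuple has the form that NumPy-style transpose functions, including dask.array.transpose, accept for their axes argument, so a saving function could move the image axes to the end before writing.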

sumanthratna commented 4 years ago

Has any progress been made on this? I'm working on image processing and one of the smallest image sizes in my current dataset is 81000 by 31000 pixels. There isn't a quick way to save an array of this size as a PNG.
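
For a sense of scale, the arithmetic below estimates the uncompressed size of an 81000 by 31000 pixel image. The channel counts and bit depths are assumptions made for illustration, since they are not stated above.

```python
width, height = 81000, 31000
pixels = width * height  # total number of pixels in one image


def raw_size_gib(n_pixels, channels, bytes_per_sample):
    """Uncompressed size in GiB for the given pixel layout."""
    return n_pixels * channels * bytes_per_sample / 2**30


# Assumed layouts; the actual dataset may differ
layouts = [
    ("8-bit greyscale", 1, 1),
    ("16-bit greyscale", 1, 2),
    ("8-bit RGB", 3, 1),
]
for label, channels, bytes_per_sample in layouts:
    print(f"{label}: {raw_size_gib(pixels, channels, bytes_per_sample):.1f} GiB")
```

Even the smallest of these layouts is more than two gigabytes before compression, which is why writing the image chunk by chunk, rather than as one in-memory array, matters here.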

GenevieveBuckley commented 4 years ago

Hi @sumanthratna. No, there hasn't been any activity on this in the last couple of months.

You could try either adapting TAdeJong's script above for your purposes, or look at the Saalfeld lab's N5 library for reading/writing large arrays. It can write to file in parallel, which might help with speed. Caveats: I haven't used this library myself but just chatted to Stephan about it a few months ago; it's still in the early stages so it might not have the features or documentation you need for your project; and image chunks cannot be larger than 2GB which may or may not work for you. Good luck!

jakirkham commented 3 years ago

Just a note that using to_zarr, as in these examples, should also write this out to disk in parallel.

GenevieveBuckley commented 2 years ago

Related discussion: https://github.com/dask/dask/issues/3487

lrlunin commented 2 years ago

Hello there,

First of all, I want to thank you all for the library, which made my experiment possible. I am quite familiar with Python and would be happy to implement this method. I suppose that it wasn't implemented before because there is some kind of difficulty. May I ask what the major issues/difficulties are?

jakirkham commented 2 years ago

I think Genevieve's comment above ( https://github.com/dask/dask-image/issues/110#issuecomment-519009634 ) pretty accurately captures the tricky points that would need to be addressed.

khyll commented 1 year ago

I'm really appreciating dask-image so far and an imsave/imwrite to e.g. tiff would make it even better.