dask / dask-image

Distributed image processing
http://image.dask.org/en/latest/
BSD 3-Clause "New" or "Revised" License

dask_image imread performance issue #181

Open jmontoyam opened 3 years ago

jmontoyam commented 3 years ago

Dear dask_image community,

I am a new dask_image user, so perhaps I am doing something wrong, but I noticed that reading a collection of images using dask_image is much slower than using single-threaded skimage. I have installed the latest dask_image version available on PyPI (dask_image 0.4.0). In the following example, I am reading 398 images, all with the same dimensions (64x10240, uint16). Given the dimensions and number of images, I would expect dask_image to be slightly slower than single-threaded skimage (due to the small dask overhead involved in opening this number of "tiny images"), but instead dask_image is much slower (around 24x). I then implemented the image-reading function in pure dask, and its performance is much better than dask_image's. Below are the benchmark results (all of the following code snippets load the same data successfully):

import glob

import numpy as np
import skimage.io
import dask_image.imread
from dask import delayed
import dask.array as da


Single-threaded-skimage baseline

%%time
all_images = sorted(glob.glob(f"{path_images}/*.tif"))
array_images = np.zeros((len(all_images), 64, 10240), dtype=np.uint16)
for idx, image in enumerate(all_images):
    array_images[idx] = skimage.io.imread(image)

Elapsed time: 510 milliseconds


Using dask_image

%%time
using_dask_image = dask_image.imread.imread(f"{path_images}/*.tif")
array_dask_image = using_dask_image.compute()

Elapsed time: 12.1 seconds


Using pure-dask

%%time
lazy_imread = delayed(skimage.io.imread)  # lazy reader
lazy_arrays = [lazy_imread(image) for image in all_images]
dask_arrays = [
    da.from_delayed(delayed_reader, shape=(64, 10240), dtype=np.uint16)
    for delayed_reader in lazy_arrays
]
using_dask = da.stack(dask_arrays, axis=0).compute()

Elapsed time: 1.09 seconds


Using dask-image with synchronous scheduler

%%time
array_dask_image = using_dask_image.compute(scheduler="synchronous")

Elapsed time: 3 seconds


Using dask-image with processes scheduler

%%time
array_dask_image = using_dask_image.compute(scheduler="processes")

Elapsed time: 6.63 seconds


Using dask-image with threads scheduler

%%time
array_dask_image = using_dask_image.compute(scheduler="threads")

Elapsed time: 12 seconds


Environment:


Thank you very much for all your help ;)

m-albert commented 3 years ago

Hi @jmontoyam,

you're completely right that dask_image shouldn't be significantly slower than the "pure dask" example you provide. Cool that you provide code and timings!

I had a look at your example and found that the problem is that dask_image reads each individual chunk by calling pims.open on the input file pattern, rather than opening just the relevant file in a targeted manner. So for every read, pims instantiates an ImageSequence over all the matching files, which takes very long. These are the relevant lines of the dask_image code:

https://github.com/dask/dask-image/blob/695bc432a34d38f061609277324f15c0181e429f/dask_image/imread/_utils.py#L12-L14
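
For reference, that helper looks roughly like this (paraphrased from the linked lines, not copied verbatim):

import numpy as np
import pims

# Each chunk read re-opens the input pattern; when fn is a glob, pims
# rebuilds an ImageSequence over all matched files just to fetch one frame.
def _read_frame(fn, i, *, arrayfunc=np.asanyarray):
    with pims.open(fn) as imgs:
        return arrayfunc(imgs[i])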

I guess that for the application you're working on, your "pure dask" code does the trick. As you also commented, it's okay that it's a bit slower than the sequential non-dask version, since its primary purpose is to provide lazy data chunks.

For dask_image it would be great to fix this though. This issue relates to https://github.com/dask/dask-image/issues/121. I'm writing a PR proposing a solution (will be linked here).

jmontoyam commented 3 years ago

Thank you very much, @m-albert! I am happy to contribute to this amazing project for the first time, even if it is just a tiny bug report :relaxed: Please excuse the following beginner question: why does dask-image use pims for loading the images? Why not use, for instance, a standard delayed skimage.io.imread?

m-albert commented 3 years ago

why does dask-image use pims for loading the images?...why not use, for instance, a standard delayed skimage.io.imread?

@jmontoyam That's a good question. Regarding the delayed part of your question, dask_image currently uses da.map_blocks which can lead to cleaner dask graphs than constructing arrays using delayed (would need to double check this though).
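
As a rough illustration of that construction style, here is a simplified sketch (not the actual dask_image graph; the file pattern, reader, and shapes are stand-ins taken from the benchmark above):

import glob

import dask.array as da
import numpy as np
import skimage.io

filenames = sorted(glob.glob("images/*.tif"))  # placeholder pattern
n_files, ny, nx = len(filenames), 64, 10240    # per-image shape from the benchmark

def read_block(block_info=None):
    # map_blocks injects block_info; the chunk location along axis 0
    # tells us which file this output block corresponds to
    i = block_info[None]["chunk-location"][0]
    return skimage.io.imread(filenames[i])[None]  # add leading frame axis

# With no input array, chunks and dtype fully describe the output
stack = da.map_blocks(
    read_block,
    chunks=((1,) * n_files, (ny,), (nx,)),
    dtype=np.uint16,
)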

Regarding the skimage.io.imread part of the question: in this thread https://github.com/dask/dask-image/issues/18 @jakirkham argues that pims can return information about image dimensions in a lazy manner and supports the largest number of file formats.
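
For what it's worth, the lazy part looks like this in practice (a small illustration; the glob pattern is a placeholder):

import pims

imgs = pims.open("images/*.tif")          # placeholder pattern
print(len(imgs))                          # frame count without reading every file
print(imgs.frame_shape, imgs.pixel_type)  # shape and dtype, typically read from the first file only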

jmontoyam commented 3 years ago

Thank you very much for the explanation, @m-albert! By the way, I have checked #182, and everything looks good to me. Thank you very much for taking the time to address this issue! ;)

GenevieveBuckley commented 3 years ago

Sorry, clicked the wrong button!

GenevieveBuckley commented 3 years ago

I've just merged #182 from @m-albert now.

Since it seems like that fixes @jmontoyam's problem, I'll close this issue now. Feel free to reopen this issue if I'm wrong about that.

GenevieveBuckley commented 3 years ago

Reopening this issue, based on more recent discussions about the performance of imread.

From @m-albert in https://github.com/dask/dask-image/issues/194#issuecomment-791987698

Critically, dask.array.image.imread() does not show the same problem - https://github.com/dask/dask/blob/master/dask/array/image.py

Would we lose much functionality if that became the canonical way to open images? Perhaps we could adapt it to use pims instead of skimage by default.

This one wasn't on the list of imread functions profiled here. I'd say profiling the performance here might be a good next step. If it performs decently, maybe we move in that direction.

Here's a performance comparison:

import os
import numpy as np
import pims
import skimage.io
import dask.array.image
import dask_image.imread

def pimread(fn):
    with pims.open(fn) as imgs:
        return np.asarray(imgs[0])

%%time
for i in range(10):
    dask_image.imread.imread(os.path.join(folder,'im_*.tif')).compute(scheduler='threads')

# 8.7s Nz, Ny, Nx = 1000, 100, 100

%%time
for i in range(10):
    dask.array.image.imread(os.path.join(folder,'im_*.tif'), imread=pimread).compute(scheduler='threads')

# 6.8s Nz, Ny, Nx = 1000, 100, 100

%%time
for i in range(10):
    dask.array.image.imread(os.path.join(folder,'im_*.tif'), imread=skimage.io.imread).compute(scheduler='threads')

# 5.1s Nz, Ny, Nx = 1000, 100, 100

So dask.array.image.imread leads to faster computations than dask_image.imread.imread, and skimage.io.imread is faster than pims.

For array creation, since dask.array.image.imread reads the first timepoint from disk to determine image shape while dask_image.imread.imread uses pims to determine the shape, the latter can be faster in the case of a few huge files. Otherwise dask.array.image.imread is much faster.
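
For context, dask.array.image.imread discovers its metadata roughly like this (paraphrased from dask/array/image.py, not verbatim; imread_func and filenames stand in for its internals):

# The first file is read in full to learn per-image shape and dtype:
# cheap when files are small, costly when each file is huge.
sample = imread_func(filenames[0])
shape = (len(filenames),) + sample.shape
dtype = sample.dtype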

%%time
dask_image.imread.imread(os.path.join(folder,'im_*.tif'))
# 470ms Nz, Ny, Nx = 2, 10000, 10000
# 250ms Nz, Ny, Nx = 10000, 10, 10

dask.array.image.imread(os.path.join(folder,'im_*.tif'))
# 1.05s Nz, Ny, Nx = 2, 10000, 10000
# 50ms Nz, Ny, Nx = 10000, 10, 10

From @jakirkham in https://github.com/dask/dask-image/issues/194#issuecomment-792002652

One could create a hybrid solution. Namely use PIMS just to find shape and dtype info and then use skimage.io.imread or similar to read in each chunk on workers
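
A minimal sketch of that hybrid idea (the glob pattern is a placeholder; pims supplies only the metadata, skimage does the pixel reads):

import glob

import dask
import dask.array as da
import numpy as np
import pims
import skimage.io

filenames = sorted(glob.glob("images/*.tif"))  # placeholder pattern

# pims: metadata-only pass to get shape/dtype for the graph
with pims.open(filenames[0]) as imgs:
    shape = imgs.frame_shape
    dtype = np.dtype(imgs.pixel_type)

# skimage: pixel reads happen lazily, one task per file, on the workers
arrays = [
    da.from_delayed(dask.delayed(skimage.io.imread)(fn), shape=shape, dtype=dtype)
    for fn in filenames
]
stack = da.stack(arrays, axis=0)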

m-albert commented 3 years ago

From @jakirkham in #194 (comment) One could create a hybrid solution. Namely use PIMS just to find shape and dtype info and then use skimage.io.imread or similar to read in each chunk on workers

That's a good idea. Given the good performance of skimage.io.imread, probably it makes sense to use it as a default for reading images. Do you think it makes sense to add scikit-image as a dependency for dask-image? Alternatively, we could try importing it and use pims as a fallback:

import numpy as np
import pims

try:
    from skimage.io import imread as imread_func
except (AttributeError, ImportError):
    # fall back to pims when scikit-image is unavailable
    def imread_func(fn):
        return np.array(pims.open(fn))

jakirkham commented 3 years ago

Yeah we are already thinking about having some scikit-image style APIs. So adding scikit-image as a dependency makes sense

jakirkham commented 3 years ago

cc @jni (in case you have thoughts here 🙂)

GenevieveBuckley commented 3 years ago

Doesn't scikit-image use imageio to read in images? I think that's the default, although they do sometimes use tifffile or other image reader plugins. scikit-image is a very large dependency (mostly because it includes scipy, which I guess we already depend on anyway).

One thing I wish we had was a lightweight, minimal dependency way to install just an image reader to get stuff into a dask array. I'm not sure if there's a really clean way to handle this.

jakirkham commented 3 years ago

Does it? I got the impression it was more complicated than that. Feel free to contradict me though if that's not accurate (guessing you or Juan would know better)

Yeah, we've discussed with imageio before whether they could provide the shape and dtype without loading the full image ( https://github.com/imageio/imageio/issues/362 ). This would actually let it take the place of PIMS for Dask Array construction and handle the data loading. Unfortunately that hasn't been solved yet.

jakirkham commented 3 years ago

From @jakirkham in #194 (comment) One could create a hybrid solution. Namely use PIMS just to find shape and dtype info and then use skimage.io.imread or similar to read in each chunk on workers

That's a good idea. Given the good performance of skimage.io.imread, probably it makes sense to use it as a default for reading images. Do you think it makes sense to add scikit-image as a dependency for dask-image? Alternatively, we could try importing it and use pims as a fallback:

try:
    from skimage.io import imread as imread_func
except (AttributeError, ImportError):
    def imread_func(fn):
        return np.array(pims.open(fn))

One thing that is worth noting here is PIMS does seem to be fairly clever about loading a range of pages from a TIFF, which can be handy if the TIFFs are massive. Loading a smaller range of pages can result in more manageable chunks for Dask. While I have seen this use case in the wild, not everyone does this. I'm guessing it is more a function of the acquisition software people are using. Not saying we should tailor things to that use case, but had forgotten this detail until we started discussing this recently. So wanted to share that context
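
To make the page-range pattern concrete (illustrative only; the filename and range are made up):

import numpy as np
import pims

# Only the requested pages of a multi-page TIFF are decoded, so one
# dask chunk can map to a manageable slice of a huge file.
with pims.open("huge_multipage.tif") as imgs:  # hypothetical file
    chunk = np.stack([np.asarray(frame) for frame in imgs[100:200]])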

GenevieveBuckley commented 3 years ago

Does it? I got the impression it was more complicated than that. Feel free to contradict me though if that's not accurate (guessing you or Juan would know better)

I think you're probably right.

GenevieveBuckley commented 3 years ago

One thing that is worth noting here is PIMS does seem to be fairly clever about loading a range of pages from a TIFF, which can be handy if the TIFFs are massive. Loading a smaller range of pages can result in more manageable chunks for Dask. While I have seen this use case in the wild, not everyone does this. I'm guessing it is more a function of the acquisition software people are using. Not saying we should tailor things to that use case, but had forgotten this detail until we started discussing this recently. So wanted to share that context

That's a really important point.

This seems like a good point to mention Nick's earlier question about loading movie files efficiently https://github.com/dask/dask-image/issues/134, which is also an important use case to keep in mind. For what it's worth, I'm collaborating with a group who are working with .mp4 video data so this is relevant for me now too.

jakirkham commented 3 years ago

Thanks for reminding me about Nick's use case.

If we do care about optimizing both use cases, we could try to detect when chunks are smaller than the original files and handle these with PIMS. Otherwise just load the whole thing into memory with scikit-image (or something else we decide to use here). This could get a little tricky, but think this is doable.

Not sure whether this will be easier to do during graph construction or at image load time. So probably whichever is easier would be the way to go initially and we could change things if there's a performance benefit later

jni commented 3 years ago

I do have opinions here!

1. skimage.io will eventually become a thin wrapper around imageio.
2. imageio will start to do smarter things around lazy loading; see https://github.com/imageio/imageio/issues/569, https://github.com/imageio/imageio/pull/574, and the links therein.
3. Overall, there should be a community-wide effort around this; see https://blog.danallan.com/posts/2020-03-07-reader-protocol/

None of this is particularly helpful re dask-image's present choice, except to say that maybe some/all of the effort in this discussion should go towards those issues rather than towards adding Yet Another way of wrapping wrappers around IO libraries.

Re tifffile, for reading tiffs it always boils down to that (whether you're using imageio or skimage.io), so if you want to do lazy loading of big tiffs I suggest implementing it on top of tifffile directly — it certainly has that capability, no need for PIMS here.
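
A sketch of what that could look like on top of tifffile directly (the filename is hypothetical; one delayed task per TIFF page):

import dask
import dask.array as da
import tifffile

FILENAME = "big_stack.tif"  # hypothetical multi-page TIFF

# Header-only pass: page count, shape, and dtype without decoding pixels
with tifffile.TiffFile(FILENAME) as tif:
    n_pages = len(tif.pages)
    shape = tif.pages[0].shape
    dtype = tif.pages[0].dtype

# One lazy task per page; tifffile's key argument selects a single page
pages = [
    da.from_delayed(
        dask.delayed(tifffile.imread)(FILENAME, key=i), shape=shape, dtype=dtype
    )
    for i in range(n_pages)
]
stack = da.stack(pages, axis=0)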

FirefoxMetzger commented 3 years ago

One thing I wish we had was a lightweight, minimal dependency way to install just an image reader to get stuff into a dask array. I'm not sure if there's a really clean way to handle this.

@GenevieveBuckley pip install imageio :P Currently it depends on numpy and pillow (which covers common formats like .jpg, .png, and .gif), but if pillow is too much, there is the option to install it using pip install --no-dependencies imageio and then only install the plugins you actually want/need.

One can also envision a dask plugin/wrapper to allow loading the image straight into a dask array and avoid a copy. I'm thinking about doing this for pytorch and tensorflow at some point down the line (directly loading into pinned memory), because my current lab does a lot of deep learning with images. __array_interface__ may already make this trivial for dask, because imageio returns a numpy array.

For what it's worth, I'm collaborating with a group who are working with .mp4 video data so this is relevant for me now too.

This is also something that can be done with imageio 🤩. We maintain an ffmpeg wrapper (pip install imageio[ffmpeg]), which can then read videos frame-by-frame or as an image stack via the familiar API. It will get even better once we have https://github.com/imageio/imageio/pull/574 merged. In the future, I hope to replace this with a wrapper around av, because there is little point in duplicating the effort of cleanly wrapping and shipping ffmpeg.

Thoughts and feature requests are (of course) appreciated.

Does it? I got the impression it was more complicated than that.

@jakirkham It will 👼 as soon as I get https://github.com/imageio/imageio/pull/574 merged and find the time to write the wrapper for skimage.

Yeah we've discussed with imageio before if they could provide the shape and dtype without loading the full image [...]

Metadata for images is a never-ending story xD I think the reason there is no clear standard for it yet in imageio is that every format has its own set of metadata, so it is non-trivial to find common ground that we can guarantee to provide for all formats. For me specifically, the user-side is a bit of a black box, because I've not really seen use-cases yet where people actively consume metadata; then again I've only recently joined this particular corner of the internet, so there is a lot I may not know (yet).

GenevieveBuckley commented 3 years ago

@jakirkham are there any disadvantages you see in @FirefoxMetzger's comment?

FirefoxMetzger commented 2 years ago

A quick update/question (to bring this thread back from the dead): I have started looking into adding out and like arguments to some of ImageIO's plugins (https://github.com/imageio/imageio/issues/670) to improve I/O speed. In particular, I'd like to bring down the number of copies involved in the overall process. Since I was thinking of making this work with dask arrays too, I have two questions:

  1. Regarding like: Can dask efficiently consume the Buffer Protocol? If not, what would be the most efficient/desirable way to construct a dask array from a memoryview? Wrap the buffer into a numpy array and call from_array on it?

  2. Perhaps more importantly, regarding out: Is it possible to get a writable memoryview into dask's data? This is me being naive and unknowledgeable about dask's internal layout, but what I am hoping for is to have dask create a (C-contiguous) buffer into which I can directly place decoded pixel values. This way, we could avoid allocating intermediary buffers and also reuse existing structures, which will be really useful for big datasets, e.g., when iterating over a video in batches.

Also, if I should not necromance this thread and instead start a new one to discuss this, please let me know and I'll do that :D

jakirkham commented 2 years ago

Sorry, but I don't really follow. This doesn't seem relevant to Dask AFAICT. It isn't NumPy. Dask Arrays are lazy and do not themselves support the Python Buffer Protocol. Individual Dask chunks would be created by asking ImageIO to open a file. Generally Dask Arrays expect NumPy or NumPy-like chunks. Also, Dask defers to other libraries to create the memory they need to consume.
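
Schematically, a chunk typically comes into existence like this (the filename and metadata are placeholders):

import dask
import dask.array as da
import imageio
import numpy as np

# Each task runs on a worker, calls the reader, and returns a plain
# NumPy array; Dask never hands the reader a pre-allocated buffer.
lazy = dask.delayed(imageio.imread)("frame_000.tif")              # placeholder file
chunk = da.from_delayed(lazy, shape=(512, 512), dtype=np.uint16)  # assumed metadata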


From the Dask perspective, the really important thing here is that Dask first needs to construct a task graph containing accurate metadata, namely the shape and dtype. Ideally we would want to load that metadata as fast as possible. This is what users here are raising: they want that metadata-loading step to be faster. This is where ImageIO could potentially help.

Ideally we want to avoid loading images (slow) to get that metadata. Reading headers of files would be ideal. At least from Dask's perspective, it doesn't really need to know more than shape & dtype. This got mentioned above in this comment ( https://github.com/dask/dask-image/issues/181#issuecomment-797177004 ). Have copied the relevant bit below:

Yeah we've discussed with imageio before if they could provide the shape and dtype without loading the full image ( imageio/imageio#362 ). This would actually let it take the place of PIMS for Dask Array construction and handle the data loading. Unfortunately that hasn't been solved yet.

FirefoxMetzger commented 2 years ago

Dask Arrays are lazy and do not themselves support the Python Buffer Protocol. Individual Dask chunks would be created by asking ImageIO to open a file. Generally Dask Arrays expect NumPy or NumPy-like chunks.

Oh, in that case, sorry for the noise. I was under the impression that images are loaded centrally and that chunks are then distributed as needed. Since this is not the case, my mental model is off and this is indeed not too relevant.

At least from Dask's perspective, it doesn't really need to know more than shape & dtype.

I think we can do that. It is possible for the major plugins, which cover all commonly used image formats, so that should get us 90% of the way there. I'll look into this once my current round of PRs is merged 👍

jakirkham commented 2 years ago

No worries. Dask would run the loading step on workers that would use the data directly. In general Dask tries to minimize communication both in its usage model and when distributing tasks to workers as communication can get pretty expensive. While I can imagine use cases that might benefit, unfortunately I don't think Dask is one of them.

That would be incredibly useful! Thank you 😄

GenevieveBuckley commented 2 years ago

More discussion at https://github.com/dask/dask/pull/8385