dask / dask-image

Distributed image processing
http://image.dask.org/en/latest/
BSD 3-Clause "New" or "Revised" License

Calling scikit-image or ndimage on a dask array just creates a never-ending task that computes without actually calling compute. #153

Closed Sh4zKh4n closed 4 years ago

Sh4zKh4n commented 4 years ago

Hi,

When working with large data, Dask shows a lot of promise, but I can't reproduce the workflows shown elsewhere: I haven't been able to get Dask to work on bigger-than-RAM data with other libraries. I am unable to get libraries like scikit-image to avoid computing immediately; it's as if calling them persists everything to memory. I either hit a memory error or get stuck in a loop where my hard drive reads and writes constantly but nothing appears in the output folder. Shouldn't these all stay as futures or delayed functions until I call them? This is a problem even if I use map_blocks or map_overlap: it just tries to pull the whole array into memory instead of flushing to a zarr file. Every time I call a library function, unless it is a plain ufunc manipulation, it seems to compute and pull everything into memory, which with 16 GB of RAM isn't going to happen.

I can run pixel-wise manipulations on 40 GB or 300 GB files quite easily: a super-fast contrast stretch and a dask histogram both work when I chunk at around 50 MB, built manually with pixel-wise ufunc operations, and they work great. It just means that none of the examples I've read actually work for me. I have even wiped my machine once and it still doesn't work. The code below has now been running for 20 minutes with no progress, just my hard drive writing something and nothing else. Is there some global setting that can stop libraries from persisting to memory unless compute is called or the result is saved to file? I'll add my environment details soon.

Example is below:

import numpy as np
import scipy
import scipy.ndimage
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
import dask
import dask.array as da
from dask.diagnostics import ProgressBar
import os
import zarr

pbar = ProgressBar()
pbar.register()

plt.rcParams["figure.figsize"] = (10, 10)
import gc
gc.collect()

da_Colz = da.from_zarr('E:\Col_Zarr_CS\mask\med.zarr')
da_Colz

image = da_Colz

from skimage.transform import rescale

image_rescaled = rescale(image, 0.25, preserve_range=True, anti_aliasing=True)


[screenshot: https://user-images.githubusercontent.com/49095152/90957977-fe382900-e488-11ea-8b5a-ab92630ce8a0.png]

Environment:

  • Dask version: see environment list below
  • Python version: 3.7.7
  • Operating System: Windows 10, 64-bit
  • Install method: conda environment

Name Version Build Channel

_pytorch_select 0.1 cpu_0 alabaster 0.7.12 py37_0 asciitree 0.3.3 py_2 attrs 19.3.0 py_0 babel 2.8.0 py_0 backcall 0.2.0 py_0 blas 1.0 mkl bleach 3.1.5 py_0 blosc 1.19.0 h7bd577a_0 bokeh 2.0.2 py37_0 brotli 1.0.7 h33f27b4_0 brotlipy 0.7.0 py37he774522_1000 bzip2 1.0.8 he774522_0 ca-certificates 2020.6.24 0 cachey 0.2.1 pyh9f0ad1d_0 conda-forge certifi 2020.6.20 py37_0 cffi 1.14.0 py37h7a1dbc1_0 cftime 1.2.1 py37h2a96729_0 chardet 3.0.4 py37_1003 charls 2.1.0 h33f27b4_2 click 7.1.2 py_0 cloudpickle 1.5.0 py_0 colorama 0.4.3 py_0 colorcet 2.0.2 py_0 conda 4.8.4 py37_0 conda-package-handling 1.6.1 py37h62dcd97_0 cryptography 2.9.2 py37h7a1dbc1_0 cudatoolkit 10.0.130 0 cudnn 7.6.5 cuda10.0_0 cupy 6.0.0 py37h230ac6f_0 curl 7.71.1 h2a8f88b_1 cycler 0.10.0 py37_0 cython 0.29.21 py37h1834ac0_0 conda-forge cytoolz 0.10.1 py37he774522_0 dask 2.17.2 py_0 dask-core 2.17.2 py_0 dask-image 0.3.0 pyh9f0ad1d_0 conda-forge datashader 0.11.1 py_0 datashape 0.5.4 py37_1 decorator 4.4.2 py_0 defusedxml 0.6.0 py_0 distributed 2.23.0 py37_0 docutils 0.16 py37_1 entrypoints 0.3 py37_0 fasteners 0.15 py_0 fastparquet 0.3.2 py37h2a96729_0 fastrlock 0.4 py37h6538335_0 freetype 2.10.2 hd328e21_0 freetype-py 2.2.0 pyh9f0ad1d_0 conda-forge fsspec 0.7.4 py_0 giflib 5.2.1 h2fa13f4_2 conda-forge hdf4 4.2.13 h712560f_2 hdf5 1.10.4 h7ebc959_0 heapdict 1.0.1 py_0 holoviews 1.13.3 py_0 holoviz 0.11.6 py_0 pyviz hvplot 0.6.0 py_1 icc_rt 2019.0.0 h0cc432a_1 icu 64.2 he025d50_1 conda-forge idna 2.10 py_0 imagecodecs 2020.5.30 py37h92c78e3_2 conda-forge imageio 2.9.0 py_0 imagesize 1.2.0 py_0 importlib-metadata 1.7.0 py37_0 importlib_metadata 1.7.0 0 intel-openmp 2019.4 245 ipydatawidgets 4.0.1 pyh9f0ad1d_1 conda-forge ipykernel 5.3.3 py37h5ca1d4c_0 ipympl 0.3.3 py_0 ipython 7.17.0 py37h39e3cac_0 ipython_genutils 0.2.0 py37_0 ipywidgets 7.5.1 py_0 itk 5.1.0 py37hc8dfbb8_2 conda-forge itk-core 5.1.0.post3 pypi_0 pypi itk-filtering 5.1.0.post3 pypi_0 pypi itk-numerics 5.1.0.post3 pypi_0 pypi itk-registration 5.1.0.post3 pypi_0 pypi itkwidgets 0.31.4 py37hc8dfbb8_0 conda-forge jedi 0.17.2 py37_0 jinja2 2.11.2 py_0 joblib 0.16.0 py_0 jpeg 9d he774522_0 conda-forge json5 0.9.5 py_0 jsonschema 3.2.0 py37_1 jupyter_client 6.1.6 py_0 jupyter_core 4.6.3 py37_0 jupyterlab 2.2.0 py_0 conda-forge jupyterlab_server 1.2.0 py_0 jxrlib 1.1 he774522_2 kiwisolver 1.2.0 py37h74a9793_0 krb5 1.18.2 hc04afaa_0 lcms2 2.11 hc51a39a_0 lerc 2.2 ha925a31_0 conda-forge libaec 1.0.4 h33f27b4_1 libblas 3.8.0 14_mkl conda-forge libcblas 3.8.0 14_mkl conda-forge libclang 9.0.1 default_hf44288c_0 libcurl 7.71.1 h2a8f88b_1 libiconv 1.15 h1df5818_7 libmklml 2019.0.5 0 libnetcdf 4.7.3 h1302dcc_0 libpng 1.6.37 h2a8f88b_0 libsodium 1.0.18 h62dcd97_0 libssh2 1.9.0 h7a1dbc1_1 libtiff 4.1.0 h56a325e_1 libwebp-base 1.1.0 he774522_3 libxml2 2.9.10 h464c3ec_1 libxslt 1.1.34 he774522_0 libzopfli 1.0.3 ha925a31_0 llvmlite 0.31.0 py37ha925a31_0 locket 0.2.0 py37_1 lz4-c 1.9.2 h62dcd97_1 m2w64-gcc-libgfortran 5.3.0 6 m2w64-gcc-libs 5.3.0 7 m2w64-gcc-libs-core 5.3.0 7 m2w64-gmp 6.1.0 2 m2w64-libwinpthread-git 5.0.0.4634.697f757 2 markdown 3.2.2 py37_0 markupsafe 1.1.1 py37hfa6e2cd_1 matplotlib 3.1.3 py37_0 matplotlib-base 3.1.3 py37h64f37c6_0 menuinst 1.4.16 py37he774522_1 mistune 0.8.4 py37hfa6e2cd_1001 mkl 2019.4 245 mkl-service 2.3.0 py37hb782905_0 mkl_fft 1.1.0 py37h45dec08_0 mkl_random 1.1.0 py37h675688f_0 monotonic 1.5 py_0 msgpack-python 1.0.0 py37h74a9793_1 msys2-conda-epoch 20160418 1 multipledispatch 0.6.0 py37_0 napari 0.3.6 py_0 conda-forge napari-plugin-engine 
0.1.6 py_0 conda-forge napari-svg 0.1.3 py_0 conda-forge nbconvert 5.6.1 py37_1 nbformat 5.0.7 py_0 netcdf4 1.5.3 py37h012c1a0_0 networkx 2.4 py_1 ninja 1.10.0 py37h7ef1ec2_0 notebook 6.0.3 py37hc8dfbb8_1 conda-forge numba 0.48.0 py37h47e9c7a_0 numcodecs 0.6.4 py37ha925a31_0 numpy 1.18.1 py37h93ca92e_0 numpy-base 1.18.1 py37hc3f5095_1 numpydoc 1.1.0 py_0 olefile 0.46 py37_0 openjpeg 2.3.1 h57dd2e7_3 conda-forge openssl 1.1.1g he774522_1 packaging 20.4 py_0 pandas 1.0.3 py37h47e9c7a_0 pandoc 2.10.1 0 pandocfilters 1.4.2 py37_1 panel 0.9.5 py_0 pyviz param 1.9.3 py_0 parso 0.7.0 pyh9f0ad1d_0 conda-forge partd 1.1.0 py_0 pickleshare 0.7.5 py37_1001 pillow 7.2.0 py37hcc1f983_0 pims 0.5 py_1 pip 20.1.1 py_1 conda-forge pluggy 0.13.1 py37_0 prometheus_client 0.8.0 py_0 prompt-toolkit 3.0.5 py_0 psutil 5.7.0 py37he774522_0 pycosat 0.6.3 py37he774522_0 pycparser 2.20 py_2 pyct 0.4.6 py37_0 pygments 2.6.1 py_0 pyopengl 3.1.1a1 py37_0 pyopenssl 19.1.0 py_1 pyparsing 2.4.7 py_0 pyqt 5.12.3 py37h1834ac0_3 conda-forge pyqt5-sip 4.19.18 pypi_0 pypi pyqtchart 5.12 pypi_0 pypi pyqtwebengine 5.12.1 pypi_0 pypi pyrsistent 0.16.0 py37he774522_0 pyside2 5.13.2 py37hfa7ce6d_2 conda-forge pysocks 1.7.1 py37_1 python 3.7.7 h81c818b_4 python-dateutil 2.8.1 py_0 python-lmdb 0.96 py37h6538335_0 conda-forge python-snappy 0.5.4 py37ha925a31_0 python_abi 3.7 1_cp37m conda-forge pytorch 1.2.0 py3.7_cuda100_cudnn7_1 pytorch pytz 2020.1 py_0 pyviz_comms 0.7.6 py_0 pywavelets 1.1.1 py37he774522_0 pywin32 227 py37he774522_1 pywinpty 0.5.7 py37_0 pyyaml 5.3.1 py37he774522_1 pyzmq 19.0.1 py37ha925a31_1 qt 5.12.5 h7ef1ec2_0 conda-forge qtconsole 4.7.5 py_0 qtpy 1.9.0 py_0 requests 2.24.0 py_0 rise 5.6.1 py37_1 ruamel_yaml 0.15.87 py37he774522_1 scikit-image 0.17.2 py37h3bbf574_1 conda-forge scikit-learn 0.23.1 py37h25d0782_0 scipy 1.5.0 py37h9439919_0 send2trash 1.5.0 py37_0 setuptools 49.6.0 py37_0 six 1.15.0 py_0 slicerator 1.0.0 py_0 snappy 1.1.8 h33f27b4_0 snowballstemmer 2.0.0 py_0 sortedcontainers 2.2.2 py_0 sphinx 3.2.1 py_0 sphinxcontrib-applehelp 1.0.2 py_0 sphinxcontrib-devhelp 1.0.2 py_0 sphinxcontrib-htmlhelp 1.0.3 py_0 sphinxcontrib-jsmath 1.0.1 py_0 sphinxcontrib-qthelp 1.0.3 py_0 sphinxcontrib-serializinghtml 1.1.4 py_0 sqlite 3.32.3 h2a8f88b_0 tbb 2020.0 h74a9793_0 tblib 1.6.0 py_0 terminado 0.8.3 py37_0 testpath 0.4.4 py_0 threadpoolctl 2.1.0 pyh5ca1d4c_0 thrift 0.13.0 py37ha925a31_0 tifffile 2020.7.22 py_0 conda-forge tk 8.6.10 he774522_0 toolz 0.10.0 py_0 torchvision 0.4.0 py37_cu100 pytorch tornado 6.0.4 py37he774522_1 tqdm 4.48.2 py_0 traitlets 4.3.3 py37_0 traittypes 0.2.1 py_1 conda-forge typing_extensions 3.7.4.2 py_0 urllib3 1.25.10 py_0 vc 14.1 h0510ff6_4 vispy 0.6.4 py37hbc2f12b_1 conda-forge vs2015_runtime 14.16.27012 hf0eaf9b_3 wcwidth 0.2.5 py_0 webencodings 0.5.1 py37_1 wheel 0.34.2 py37_0 widgetsnbextension 3.5.1 py37_0 win_inet_pton 1.1.0 py37_0 wincertstore 0.2 py37_0 winpty 0.4.3 4 wrapt 1.12.1 py37he774522_1 xarray 0.15.1 py_0 xz 5.2.5 h62dcd97_0 yaml 0.2.5 he774522_0 zarr 2.4.0 py_0 conda-forge zeromq 4.3.2 ha925a31_2 zfp 0.5.5 ha925a31_1 conda-forge zict 2.0.0 py_0 zipp 3.1.0 py_0 zlib 1.2.11 h62dcd97_4 zstandard 0.13.0 py37ha925a31_0 zstd 1.4.5 h04227a9_0

mrocklin commented 4 years ago

Correct. Libraries like scikit-image do not automatically parallelize all of their functions with Dask. Typically people combine Dask array with scikit-image using methods like map_blocks or map_overlap. You may also want to take a look at the dask-image project.
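For reference, a minimal sketch of that pattern; the zarr path, depth, and the choice of skimage.filters.gaussian are illustrative stand-ins for whichever scikit-image function you want to apply per block:

import numpy as np
import dask.array as da
from skimage import filters

arr = da.from_zarr('data.zarr')   # lazy array backed by zarr; path is illustrative

def blur_block(block):
    # runs on one NumPy block at a time
    return filters.gaussian(block, sigma=2, preserve_range=True)

# depth adds a margin around each block so chunk edges agree;
# nothing is computed until to_zarr (or .compute()) is called
blurred = arr.map_overlap(blur_block, depth=8, boundary='reflect', dtype=np.float64)
blurred.to_zarr('blurred.zarr')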

Sh4zKh4n commented 4 years ago

So @mrocklin, I have tried those exact methods and hit the same problem: it just gets stuck. I'm wondering if there's some global setting that's messed up. I've tried map_blocks and map_overlap and they just load everything up and get stuck or hit a memory issue. I'll try a little later to show the effect. Is there a global setting that might be causing this? I've followed your blog instructions and can't reproduce them; tried dask-image and can't; tried to copy the similar method from napari's instructions and can't reproduce that either. Every time there is a slightly different issue: a dtype declaration error, or it can't fit a 20 MB chunk into memory, or it just gets stuck. I'm at a loss for what to try. I found that chunking below 100 MB works perfectly for ufunc operations, but map_blocks or map_overlap just raises a memory error. It opens a zarr file and then doesn't write a thing to it.

mrocklin commented 4 years ago

I'm sorry to hear that you're having a tough time.

I recommend providing an example of map_blocks or map_overlap failing.

Sh4zKh4n commented 4 years ago

I'm running the computation right now while I make dinner for the family. I've been working on this for months, and although I can get element-wise operations going, I can't even rescale with interpolation to create masks that could then be rescaled back up to reduce the data size. I'm genuinely trying.

I do though appreciate your time and efforts.

Sh4zKh4n commented 4 years ago

See the example below. It hasn't failed yet, but I am at 20 minutes and still at 0%. I'm either doing something very silly, or something is missing from the different methods. All the examples I've seen have been deconvolution or on small datasets; no example I can find runs on a bigger-than-RAM dataset on a local machine, so I am wondering whether there's a disconnect, or (most likely) I'm doing something very stupid or there's something wrong with my environment/global settings. You can just about see below that the process is writing to my local hard disk, but not to the destination hard drive. I have even tried using zarr directly and copying the result, as the zarr tutorial suggests, hoping it would go chunk by chunk. I've tried a lot of things by now, including declaring the dtype explicitly, but this is a 20 GB zarr-saved array on a machine with 10-13 GB of RAM free out of 16 GB. It just feels weird that it doesn't work.

[screenshot: progress bar stuck at 0% with only local disk activity]

Sh4zKh4n commented 4 years ago

I ran this manual contrast stretch and it worked perfectly! But anything beyond this is a no-go when it comes to functions or libraries.

[screenshots: the ufunc contrast-stretch code and its progress bar, completing in about four minutes]

See how that took four minutes or so with ufuncs and no wrapped function. Perfect, so I see the potential; I just don't know how to implement map_blocks or map_overlap. I am nearly at the stage of physically splitting the files to less than 5 GB each, but I have six 20 GB files, and although that would work, it feels like everyone talks about Dask's promise yet there's no minimal example, not even in dask-image, of a bigger-than-RAM computation.

Sh4zKh4n commented 4 years ago

Now I'm at 52 minutes and 0% completed. I'll see how long this runs. After dinner I'll try map_overlap.

mrocklin commented 4 years ago

It looks like you're out of memory, which would explain why things are slow. Maybe the scipy.ndimage functions expand memory a lot and you need smaller chunks? In your situation I would investigate what's taking up all of your memory. You might also try asking for help from someone who understands scipy.ndimage well.

You might also want to read through the general docs on understanding performance: https://docs.dask.org/en/latest/understanding-performance.html

Sh4zKh4n commented 4 years ago

@mrocklin, I completely agree, but part of the problem is that this is a very simple operation (though the simplest things can be the toughest). Take that contrast stretch I did: I found the optimal chunk size was below about 50 MB; the docs have changed, but they used to say 100 MB. Where the general docs discuss chunk sizes, they mention aligning chunks along a dimension, but not in-memory copies and intermediate results. Someone mentioned rechunker as something that could help, but that didn't immediately make sense to me. If in-memory copies are made, the actual footprint could be a multiple of the chunk size. I'm guessing this is obvious to you, but I just want to make sure I'm thinking along the right lines. By the way, for the dataset shown above the total size is roughly 19 GB, but each chunk is 21.65 MB. Should I decrease the chunk size even further?

Is there a way to use the task graph to get an idea of intermediate results? For example, could Dask take a very small chunk and record the size of each array that gets created before saving/persisting to memory? This isn't just a problem with this one scipy.ndimage function; it's the same for dask_image, whose filter doesn't work for me either. The only example I can find of dask_image being used is on the tiny astronaut image, so I don't want to say it's just scipy.ndimage. I'll try a dask_image version in a minute and show it.
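One rough way to sanity-check intermediate sizes without computing anything; the path, depth, and sigma below are illustrative, and ndi.gaussian_filter is just a stand-in for whatever function is being mapped:

import numpy as np
import dask.array as da
from scipy import ndimage as ndi

arr = da.from_zarr('big_input.zarr')   # illustrative path

# build the lazy result without computing it
lazy = arr.map_overlap(ndi.gaussian_filter, depth=5, boundary='reflect', sigma=3, dtype=arr.dtype)

# inspect the output metadata before anything runs; ndimage keeps the input
# dtype here, but a function returning float64 would make each chunk 8x larger
print(lazy.dtype, lazy.chunksize)
print(np.prod(lazy.chunksize) * lazy.dtype.itemsize / 1e6, "MB per output chunk")

# draw the task graph for a small slice to see its structure (needs graphviz)
lazy[:1].visualize(filename='graph.png')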

Sh4zKh4n commented 4 years ago

@mrocklin, actually scratch that: I just tried the gaussian filter and that seems to be working...

OK, now I feel a bit stupid. I went back to dask-image and it worked, but this time I didn't wrap it in a function, so I've been messing around with data that ultimately could have been processed much more easily. I just tried median from dask_image without putting it into a function or map_blocks...

[screenshot: dask_image median_filter running]

So then back to the issue of wrapping other libraries around Dask. One thing I notice in the docs is that these are all wrapped versions of ndimage functions, which take an output argument as an option; in the dask_image docs that argument is removed. Is there a way to suppress that argument? It's the only common factor I've spotted across these functions so far.

By the way, thanks for making me try again. I think I had earlier tried it with a non-optimised chunk size and got a memory error, and then tried it by dropping it in as a replacement inside a function. Would that cause some kind of memory issue? It should still have no output argument? The only thing I can imagine is that I am building functions like the one below:

def func(img, args):
    func_img = do_something(img, args)   # do_something stands in for the library call
    return func_img

out = da.map_blocks(func, img, arg)
out.to_zarr('filepath.zarr')

The only thing I can see is that, while looping with Dask, the function is trying to build the whole array, and then, unable to build it, it gets stuck and can't save. Any thoughts? I think it's the same issue, but I'm not sure how to recreate the workaround other people have used.
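A minimal sketch of that pattern with the output dtype declared up front, assuming an ndimage filter as the per-block call; the paths and parameters are illustrative, and everything stays lazy until to_zarr runs:

import dask.array as da
from scipy import ndimage as ndi

img = da.from_zarr('input.zarr')   # illustrative path

def func(block):
    # stand-in for the real per-block library call
    return ndi.uniform_filter(block, size=5)

# declaring dtype up front means dask doesn't need to run func on a sample
# block to infer the output metadata
out = img.map_blocks(func, dtype=img.dtype)
out.to_zarr('output.zarr')   # the first point at which anything actually computes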

mrocklin commented 4 years ago

I'm not sure I fully understand your last question. I'm going to transfer this issue over to the dask-image project, mostly just to alert those folks about your questions. They may also be busy though, so no guarantees.

Sh4zKh4n commented 4 years ago

So I kind of asked about the wrapper already, before you moved this over, though the second part of the question, on how to build a function for image operations, is still relevant. In reality I chose the median filter because I couldn't get a uniform or mean filter to run on my dataset. I've found a nice bit of code that uses convolve1d repeated over each axis, and it's working fine without any intermediate rechunking to smaller sizes. No warning from zarr about object dtype or anything, so that's a positive! It's pretty simple as well; it worked with map_blocks, but the result was a bit blocky, as you'd expect. I'm running it with map_overlap now and it seems to be working!

import numpy as np
from scipy.ndimage import convolve1d

def media3d(img, size):
    # separable mean filter: a normalised 1D kernel convolved along each axis
    kernel = np.ones(size, img.dtype) / size
    result = convolve1d(convolve1d(convolve1d(img, kernel, axis=0), kernel, axis=1), kernel, axis=2)
    return result

The only thing I should point out is that the map_overlap docs state the call as da.map_overlap(func, x, ...), but I kept getting an error that the function doesn't have an ndim. Reading the error, I noted that map_overlap instead wants da.map_overlap(x, func, ...), so there seems to be a mistake in the docs. I'm not sure how to open a PR, but I thought I'd let you know.
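A sketch of how that function might be applied, assuming the array-method form arr.map_overlap(func, ...), which sidesteps the argument-order question; the path, depth, and filter size are illustrative:

import dask.array as da

arr = da.from_zarr('input.zarr')   # illustrative path

# share a margin of roughly size // 2 voxels with neighbouring chunks so block
# edges agree, then trim it off again; media3d is the function defined above
smoothed = arr.map_overlap(media3d, depth=12, boundary='reflect', size=25, dtype=arr.dtype)
smoothed.to_zarr('smoothed.zarr')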

So @mrocklin, thanks. The biggest lesson so far is that you may have a chunk size of x, but depending on the library function you pull in, that can turn into multiple in-memory copies, and it may upcast the dtype, creating large intermediate copies that jam up the works when running it through Dask. So hopefully the folks at dask_image can help with writing functions for scikit-image and ndimage, because that would be a big help!

mrocklin commented 4 years ago

If you're concerned about the upcasting of uint8 arrays to float64 then that's actually part of scikit-image and scipy. You may have to ask folks there. Dask projects don't control how those projects operate.
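If the upcasting itself is the memory problem, one possible workaround (not something scikit-image provides directly, shown here only as a sketch with illustrative parameters) is to cast back to the input dtype inside the mapped function, so only one block at a time is ever held as float64:

import numpy as np
import dask.array as da
from skimage import filters

def filtered_as_uint8(block):
    # scikit-image returns float64 here; clip and cast back before the block
    # leaves the function, so the stored result stays one byte per voxel
    out = filters.gaussian(block, sigma=2, preserve_range=True)
    return np.clip(out, 0, 255).astype(np.uint8)

arr = da.from_zarr('input.zarr')   # illustrative path
result = arr.map_overlap(filtered_as_uint8, depth=8, boundary='reflect', dtype=np.uint8)
result.to_zarr('filtered.zarr')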

I'm glad that you've found map_overlap. If there are issues with the documentation I encourage you to be more specific about where the error is, and maybe even submit a pull request if you're willing to help out.

jni commented 4 years ago

Copying over my response from the image.sc forum here, since I really don't understand why dask_image.ndfilters.median_filter runs fine while scipy.ndimage.median_filter, which it wraps, cannot process even a single chunk. Response with code snippets below. I am just a dask user, not an expert, so corrections from maintainers are very welcome, as well as speculation about what could be different.


@Sh4zKh4n I recommend that you provide a complete failing example as code (not screenshots) that people can run on their own machine. StackOverflow has a good article on this.

This makes it much easier for people to debug your use case rather than try to think about it just from reading it, which is hard, or have to abstract away the problem.

In your case, I agree with Matt Rocklin that it seems the ndimage median filter is the failing function. Unfortunately, ndimage just does not do great for large kernel sizes. For example:

import numpy as np
from scipy import ndimage as ndi

def median3d(image):
    return ndi.median_filter(image, size=25)

res = median3d(
    np.random.randint(0, 256, size=(285, 245, 310)).astype(np.uint8)
)

i.e. running your code for just one chunk is taking forever on my machine (3 min and counting), using up "only" 2 GB of RAM. There is an open issue in SciPy about this excessive RAM usage, but it is really quite a difficult one to fix, which is why it remains open 5 years later.

Now, the thing I totally don't understand is why dask_image.ndfilters.median_filter is working at all for you, since literally all it is doing is wrapping ndimage.median_filter in a map_overlap call. You can see the source code and find that it boils down to a map_blocks call just like yours. I can also reproduce that behaviour as follows:

import numpy as np
import dask.array as da
from dask_image.ndfilters import median_filter
from dask.diagnostics import ProgressBar

pbar = ProgressBar()
pbar.register()

arr = da.random.randint(
    0, 256, size=(10260, 1225, 1550),
    dtype=np.uint8,
    chunks=(285, 245, 310)
)

smoothed = median_filter(arr, size=25)
smoothed.to_zarr('smoothed.zarr')
[#                      ] | 3% Completed |  3min 36.8s

But, it does eventually start using too much RAM and spilling to disk, and brings my computer to a standstill...

On that I really must say I am stumped.

Things I would try next in your situation:

  • try using the dask.distributed scheduler and some local workers. By default, dask.array uses the threaded scheduler, which as I understand it will request as much RAM as the OS will give, and the OS will give more than it has and spill to disk, which is inefficient. Distributed gives you fine control over the resources you allocate to each worker. We saw that a single chunk takes up about 2GB of RAM to process, so you should be good with e.g. 3 workers with 5GB each (it's good to have a bit of overhead space!).
  • We have an open PR for 3D rank filters in scikit-image that should not suffer from the ndimage excessive RAM issue here. The PR needs review but as I understand it it's ready to go, so you should be able to build that branch locally and try it. It should be much faster than ndimage for large kernels such as the one you are using.

Again, I totally don't understand why dask_image median_filter works at all, given that I can't process even a single chunk with the SciPy version. That is something to discuss in the dask_image issue — I'll add this comment there.
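A minimal sketch of the distributed-scheduler suggestion above, with illustrative numbers for a single 16 GB machine:

from dask.distributed import Client

# three single-threaded workers, each capped at 5 GB, so one ~2 GB chunk per
# worker has headroom and the OS is never asked for more memory than exists
client = Client(n_workers=3, threads_per_worker=1, memory_limit='5GB')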

GenevieveBuckley commented 4 years ago

Hi @Sh4zKh4n,

I've read through your thread and will try to respond to all of the points below, so there may be some repetition with what Matt and Juan may have said earlier.

As Matt explains, unless a third-party library explicitly claims to support dask arrays, you can't expect them to work well together. This is why you're seeing compute called immediately: the functions you are trying are attempting to coerce the dask array into something they know how to handle, like a NumPy array.

For the two libraries you mention by name:

I went back to dask image and it worked but this time I didnt include it in a function..... So Ive been messing around with data that ultimately could have been run a bit easier...... just tried median from dask_image and didnt put it into a function nor map_blocks....

That's right, dask-image functions implement map_overlap for scipy functions, so there's no need to put it inside your own map_overlap.

I chose the median filter, because I couldnt get a uniform or mean filter to run on my data set.

Are you saying that the dask-image ndfilters.uniform_filter and ndfilters.gaussian_filter didn't work for you, but that the dask-image ndfilters.median_filter did work? If that's the case we'd be grateful if you could open a specific issue with a minimal, complete, and verifiable example of the behaviour (please see here for how to create one).

Sh4zKh4n commented 4 years ago

Sorry for accidentally closing the issue!!! I was trying to cancel the comment sorry!

@jni

Sorry, I should have added a better code snippet; I'll read that article and add one. dask_image.ndfilters.median_filter had the same problem when I used the chunk sizes you did (and that I originally used). Based on @mrocklin's advice in one of my later comments that the chunk sizes were too big, I checked the overall output size and chunk sizes without computing the result, just by looking at the dask widget (which is why I added the screenshots: not for the code but for the chunk sizes). So I reduced them even further, and that's how I got it to work. It came out very blocky because the blocks weren't sharing edges, so I tried that convolve1d function in the z, then y, then x direction. It was incredibly slow, but it worked as a mean filter. @GenevieveBuckley, I haven't had a chance to go through all your advice, so I will reply as I work through it. Ultimately, I chunked down to about 2 MB per chunk for a uint8 file, which when upcast to float becomes a chunk of roughly 17 MB, versus what didn't work, which was a 20 MB chunk upcasting to about 173 MB. That just gets stuck.
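For reference, assuming the upcast is uint8 to float64, the factor is 8x per element: a 21.65 MB uint8 chunk becomes about 21.65 × 8 ≈ 173 MB once upcast, and a ~2 MB chunk becomes roughly 17 MB, which matches the sizes quoted above.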

I am making progress as I go along, and I'm happy to share if it helps people here. I will try to build an MCVE, but is a random array actually appropriate? Since this is image processing, some structure might be more representative.

To be clear, my overall aim is to process six of these files, which in the end need to be registered and merged. That's not one for this thread, as it's not an issue but a how-to, and I am working my way through it. Because the file sizes are big, I am trying to create masks that I can use to further reduce the size of the data; a hard lesson learnt in my time working with big image data is that if it's big, try to make it smaller. I had hoped to rescale the images with the scikit-image function but kept hitting a wall: the plan was to rescale down, compute the masks at the small size, then rescale back up to create the masks. Unfortunately the rescale function didn't work. I might try striding, but ultimately I will also need to process larger files, so I have been trying to get these functions to work. The examples I have seen are about deconvolution with Dask, which is very cool, but the bread-and-butter work I'm learning isn't so easy... though I am making progress.

Small chunk sizes, around 2 MB, got the median filter working for dask_image, and the only difference I could tell was that the output argument had been removed, according to the docs.

I hope that makes sense?

Sh4zKh4n commented 4 years ago

@jni, the way I got it to work was the following:

import numpy as np
import dask.array as da
from dask_image.ndfilters import median_filter
from dask.diagnostics import ProgressBar

pbar = ProgressBar()
pbar.register()

arr = da.random.randint(
    0, 256, size=(10260, 1225, 1550),
    dtype=np.uint8,
    chunks=(285, 245, 310)
)

arr = arr.rechunk((95, 245, 155))

smoothed = median_filter(arr, size=25)
smoothed.to_zarr('smoothed.zarr')

That worked for me. I have a total of 16 GB on my laptop, and that was with 4 workers and the usual 2 GB each. I ran this with distributed last night (on my dataset) and it worked fine. Try it: there is a chunk-size issue with this filter, which is why I wondered whether it was really about the removal/blocking of the output argument from ndimage.

Sh4zKh4n commented 4 years ago

@GenevieveBuckley Sorry for the confusion: the gaussian did work for me, but only once I reduced the chunk size significantly, so these filters do work for small chunk sizes. Upcasting the chunks to float increases their size significantly, so at the original size it doesn't work; rechunking the data to a very small size does work. Now I think I've made a mistake above and should point out that I ran that gaussian blur with the chunk size of (285, 245, 310) and it works fine. I think the chunk-size problem was with scikit-image, since that kept upcasting the chunks. The dask-image gaussian blur works fine: I completed it without distributed, with a kernel of size (10, 10, 10), and it finished in 36 minutes. I have just now tried distributed with a gaussian blur and a (50, 50, 50) kernel, but it was a bit hard to tell what was happening; it holds for a while. I went back to running without distributed, and I just feel more comfortable when I can see one progress bar. This also seems to hold for a while but does actually look like it's working. I don't see much spilling to disk in Task Manager, and right now it's using about 6 GB in total without distributed.

import numpy as np
import dask.array as da
from dask_image.ndfilters import gaussian_filter
from dask.diagnostics import ProgressBar

pbar = ProgressBar()
pbar.register()

arr = da.random.randint(
    0, 256, size=(10260, 1225, 1550),
    dtype=np.uint8,
    chunks=(285, 245, 310)
)

# gaussian_filter takes sigma rather than size
smoothed = gaussian_filter(arr, sigma=25)
smoothed.to_zarr('smoothed.zarr')

This seems to be working fine.

Sh4zKh4n commented 4 years ago

@jni

import numpy as np
import dask.array as da
from dask_image.ndfilters import gaussian_filter
from dask.diagnostics import ProgressBar

pbar = ProgressBar()
pbar.register()

arr = da.random.randint(
    0, 256, size=(10260, 1225, 1550),
    dtype=np.uint8,
    chunks=(285, 245, 310)
)

# load/import classes
from dask.distributed import Client

# set up cluster and workers
client = Client(n_workers=3,
                threads_per_worker=1,
                memory_limit='4GB')

# have a look at your workers
client

# gaussian_filter takes sigma rather than size
smoothed = gaussian_filter(arr, sigma=(50, 50, 50))
smoothed.to_zarr('smoothed.zarr')

This is very strange. I interrupted the run I was doing with the gaussian blur and decided to run the distributed version with your suggestion, but with 3 workers and 4 GB each. The workers aren't taking up all that much space, but it's spilling to disk and running... slowly. Much slower than without distributed?

GenevieveBuckley commented 4 years ago

I am making progress as I go along, happy to share if it helps people here?

I'm glad you're making progress on your work, and no it's not necessary to share updates here :)

Sh4zKh4n commented 4 years ago

@GenevieveBuckley, the table is really helpful, and thanks for getting a PR going on the docs. I think it's time to close the issue.