dask / dask-tutorial

Dask tutorial
https://tutorial.dask.org
BSD 3-Clause "New" or "Revised" License
1.83k stars, 702 forks

Store Dask Array #190

Closed: tecamenz closed 4 years ago

tecamenz commented 4 years ago

Hi

I'm working my way through the Dask tutorial (currently the Dask arrays section). I tried to subsample the weather data and save it to HDF5 / zarr.

What happened: If I try to save as hdf5: --> TypeError: h5py objects cannot be pickled
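The thread does not diagnose this first error. One plausible reading, offered as an assumption, is that a scheduler which ships tasks to other processes has to pickle the task graph, and objects tied to an open file cannot be pickled. The sketch below uses only the standard library, not h5py or Dask, to show the general pattern: a live file handle fails to pickle, while a "recipe" for reopening the file (here a `functools.partial`) survives the round trip. The temporary file and the variable names are illustrative.

```python
import functools
import os
import pickle
import tempfile

# Create a small throwaway file to work with.
with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as tmp:
    tmp.write(b"weather")
    path = tmp.name

# A live, open handle cannot be pickled: pickle raises TypeError.
with open(path, "rb") as handle:
    try:
        pickle.dumps(handle)
        handle_picklable = True
    except TypeError:
        handle_picklable = False

# A recipe for opening the file holds only a path and a mode,
# so it can be serialized and rebuilt elsewhere.
opener = functools.partial(open, path, "rb")
restored = pickle.loads(pickle.dumps(opener))
with restored() as f:
    data = f.read()

os.remove(path)
print(handle_picklable, data)
```

Whether the same idea (opening the file inside the task rather than before building the graph) resolves the h5py error in this particular setup is not confirmed anywhere in this thread.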

If I try the solution in the notebook which uses da.to_zarr: --> ModuleNotFoundError: No module named 'zarr'

What you expected to happen: Expected to save as hdf5 and zarr

Minimal Complete Verifiable Example:

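import h5py
from glob import glob
import os
import dask.array as da

# Open the '/t2m' dataset in every weather file (read-only)
filenames = sorted(glob(os.path.join('data', 'weather-big', '*.hdf5')))
dsets = [h5py.File(filename, mode='r')['/t2m'] for filename in filenames]

# Wrap each dataset in a lazy dask array with 500x500 chunks
arrays = [da.from_array(dset, chunks=(500, 500)) for dset in dsets]

# Stack the per-file arrays along a new leading axis
x = da.stack(arrays, axis=0)

# Subsample: keep every second point along the last two axes
result = x[:, ::2, ::2]
# Saving as HDF5 raises "TypeError: h5py objects cannot be pickled"
# da.to_hdf5('myfile.hdf5', '/result', result)
# Saving as zarr raises "ModuleNotFoundError: No module named 'zarr'"
da.to_zarr(result, os.path.join('data', 'myfile.zarr'), overwrite=True)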

Anything else we need to know?:

Environment:

quasiben commented 4 years ago

Can you install zarr on your system? Try: conda install zarr -c conda-forge

tecamenz commented 4 years ago

> Can you install zarr on your system ? conda install zarr -c conda-forge.

Yes, that worked. I thought zarr would be installed with dask as a dependency...

mrocklin commented 4 years ago

Unfortunately not. Dask integrates with many different libraries. Installing them all would probably not be practical.

Thanks for following up. Closing.
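Because optional integrations such as zarr are not pulled in with Dask, it can help to check for them before running a notebook cell. The following is a minimal sketch using only the standard library. The helper name `missing_modules` and the list of packages are illustrative and not part of Dask.

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Illustrative list: the optional packages the snippet in this issue relies on.
optional = ["zarr", "h5py"]

missing = missing_modules(optional)
if missing:
    print("Install before running:", ", ".join(missing))
else:
    print("All optional packages found.")
```

Anything reported as missing can then be installed the same way as suggested above for zarr.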