cedadev / cmip6-object-store

CMIP6 Object Store Library
BSD 3-Clause "New" or "Revised" License

Decide on a chunk size to use in Dask/Zarr #17

Closed agstephens closed 4 years ago

agstephens commented 4 years ago

Setting the Dask chunk size via a memory limit does not seem to work in our tests.

Instead, we put an approximate memory limit into our own config and calculate the number of time steps that it maps to.

Config:

https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/etc/config.ini#L31

Code:

https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/task.py#L82
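For readers without the repository to hand, here is a minimal sketch of how a memory limit might be read from an INI file and converted to bytes. The section name `workflow`, the option name `chunk_size_limit`, and the helper functions are assumptions made for illustration and are not taken from the linked config or code; the `250MB` value mirrors the target mentioned later in this thread.

```python
import configparser

# Hypothetical config fragment: the section and option names are assumptions,
# not the actual contents of the project's config.ini.
CONFIG_TEXT = """
[workflow]
chunk_size_limit = 250MB
"""

# Decimal size suffixes understood by this sketch.
_UNITS = {"KB": 10**3, "MB": 10**6, "GB": 10**9}


def parse_size(text):
    """Convert a string such as '250MB' into a number of bytes."""
    text = text.strip().upper()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(float(text[: -len(suffix)]) * factor)
    return int(text)  # no suffix: treat the value as a plain byte count


def load_chunk_limit(config_text=CONFIG_TEXT):
    """Read the (hypothetical) chunk size limit and return it in bytes."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    return parse_size(parser["workflow"]["chunk_size_limit"])
```

Keeping the limit in a config file means it can be tuned per deployment without touching the code that does the chunking.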

agstephens commented 4 years ago

Chunking seems to work when we manage the chunks ourselves, using these details:

Target chunk size (250MB) specified in:

https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/etc/config.ini#L32

And this code manages the chunking: it calculates the size of the array in bytes, then splits along the time axis to get close to the target chunk size:

https://github.com/cedadev/cmip6-object-store/blob/master/cmip6_object_store/cmip6_zarr/zarr_writer.py#L95
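The approach described above can be sketched roughly as follows. This is not the code at the linked line; the function name, its parameters, and the choice to round down so that a chunk stays at or below the target are assumptions for illustration.

```python
import math


def time_steps_per_chunk(shape, itemsize, target_bytes, time_axis=0):
    """Return how many time steps fit in one chunk of about target_bytes.

    shape        -- full array shape, e.g. (time, lat, lon)
    itemsize     -- bytes per element, e.g. 4 for float32
    target_bytes -- desired chunk size in bytes
    time_axis    -- index of the time dimension in shape
    """
    n_times = shape[time_axis]
    total_bytes = math.prod(shape) * itemsize

    # Size in bytes of a single time step (all other dimensions kept whole).
    bytes_per_step = total_bytes // n_times

    # Round down so the chunk does not exceed the target, but always keep
    # at least one time step and never more than the array contains.
    steps = target_bytes // bytes_per_step
    return max(1, min(n_times, steps))


# A float32 array of 60000 time steps on a 144 x 192 grid, 250 MB target.
n = time_steps_per_chunk((60000, 144, 192), itemsize=4, target_bytes=250_000_000)
print(n)
```

The resulting length could then be used as the chunk size along the time dimension, for example with xarray's `Dataset.chunk({"time": n})`, leaving the spatial dimensions unsplit.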