bcdev / nc2zarr

A Python tool that converts NetCDF files to Zarr format
MIT License

Large input datasets cause out-of-memory errors #10

Closed forman closed 3 years ago

forman commented 3 years ago

We need to ensure that Dask arrays are used efficiently when loading very large datasets.

Introduce a new input parameter `input/prefetch_chunks` that forces fetching the internal NetCDF chunking from the first of a set of input files. These chunk sizes will then be used to open all input files.
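A minimal sketch of the merging step this could involve (the function name `merge_chunk_sizes` and its input shape are assumptions, not nc2zarr API): after opening the first file, each variable's internal NetCDF chunk sizes (as exposed, e.g., via xarray's `var.encoding["chunksizes"]`) would be combined into a single Dask `chunks` mapping usable for opening all inputs.

```python
def merge_chunk_sizes(var_chunking):
    """Derive one dim -> chunk-size mapping from per-variable NetCDF chunking.

    var_chunking: iterable of (dims, sizes) pairs, where dims is a tuple of
    dimension names and sizes is a tuple of chunk sizes (or None if the
    variable is contiguous, i.e. unchunked).
    """
    chunks = {}
    for dims, sizes in var_chunking:
        if sizes is None:
            # Contiguous variable: contributes no chunking information.
            continue
        for dim, size in zip(dims, sizes):
            # Keep the smallest chunk seen per dimension so that all
            # variables can share one consistent Dask chunking scheme.
            chunks[dim] = min(chunks.get(dim, size), size)
    return chunks
```

The resulting mapping could then be passed as the `chunks=` argument when opening the remaining input files with xarray, so all inputs share the chunking prefetched from the first file.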