EcohydrologyTeam / ClearWater-modules

A collection of water quality and vegetation process simulation modules designed to couple with water transport models.

Reduce memory usage: lazy writing to `dask.array` #69

Open aufdenkampe opened 4 months ago

aufdenkampe commented 4 months ago

Splitting out from #62: this issue tracks another approach to managing memory, addressing the out-of-memory failure described in https://github.com/EcohydrologyTeam/ClearWater-modules/issues/57#issuecomment-1833765631 and complementing the other memory-management work split out from that issue.

Dask Array

We also want to consider using dask.array within Xarray, which under the hood is just a chunked collection of numpy.ndarray objects. This gives us the ability to handle arrays that are larger than memory, by accessing only certain chunks at a time (e.g., one timestep at a time). These docs (at the bottom of the section) explain how lazy writing works:

Once you’ve manipulated a Dask array, you can still write a dataset too big to fit into memory back to disk by using to_netcdf() in the usual way... By setting the compute argument to False, to_netcdf() will return a dask.delayed object that can be computed later.

import xarray as xr
from dask.diagnostics import ProgressBar

# `ds` is assumed to be a dask-backed Dataset, e.g.:
ds = xr.open_dataset("example-data.nc", chunks={"time": 10})

# compute=False defers the write and returns a dask.delayed object
delayed_obj = ds.to_netcdf("manipulated-example-data.nc", compute=False)

with ProgressBar():
    results = delayed_obj.compute()

NOTE: We can use the Dataset.to_zarr() method the same way.
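A minimal sketch of the same deferred-write pattern with to_zarr() (the file names and chunk sizes are illustrative):

import xarray as xr
from dask.diagnostics import ProgressBar

# A dask-backed dataset, chunked along the time dimension
ds = xr.open_dataset("example-data.nc", chunks={"time": 10})

# Only metadata is written eagerly; the array data write is deferred
delayed_obj = ds.to_zarr("manipulated-example-data.zarr", mode="w", compute=False)

with ProgressBar():
    delayed_obj.compute()  # streams the data to the store chunk by chunk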

The solution near the bottom of this thread describes a smart approach to doing exactly what we need. Let's implement something similar (see the suggested code in response 8): https://discourse.pangeo.io/t/processing-large-too-large-for-memory-xarray-datasets-and-writing-to-netcdf/1724/8
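For ClearWater, that pattern could look roughly like the sketch below (variable names, dimension sizes, and the store path are all illustrative, and the random values stand in for the model's per-timestep output): initialize the Zarr store lazily from a chunked template, then write each timestep into its region of the store as soon as it is computed, so only one timestep's worth of data is ever held in memory.

import dask.array as da
import numpy as np
import xarray as xr

n_time, n_cell = 100, 10_000  # illustrative sizes

# Template dataset backed by lazy dask zeros, chunked one timestep at a time
template = xr.Dataset(
    {
        "concentration": (
            ("time", "cell"),
            da.zeros((n_time, n_cell), chunks=(1, n_cell), dtype="float32"),
        )
    },
    coords={"time": np.arange(n_time)},
)

# Write only the store layout, metadata, and coordinates; no data is computed
template.to_zarr("results.zarr", mode="w", compute=False)

# Inside the timestep loop, write each completed slice into its region
for i in range(n_time):
    values = np.random.rand(1, n_cell).astype("float32")  # stand-in for model output
    step = xr.Dataset({"concentration": (("time", "cell"), values)})
    step.to_zarr("results.zarr", region={"time": slice(i, i + 1)})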

aufdenkampe commented 1 month ago

It looks like Dask just introduced another way to reduce memory overhead: Shuffling large data at constant memory in Dask

This approach may not be necessary once we implement lazy writing (as described in the introductory comment on this issue). Also, it's possible that this approach only works on dataframes that use Arrow dtypes, which aren't yet supported for multidimensional arrays, so it might not be easy to implement at this time. Sharing it regardless, to plant seeds for this and/or similar future work.
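If that post refers to Dask's peer-to-peer (p2p) shuffle backend (an assumption on my part), enabling it is mostly a configuration switch, though it requires the distributed scheduler and applies to dataframes rather than arrays. A minimal sketch, with a toy dataframe standing in for real tabular data:

import dask
import dask.dataframe as dd
import pandas as pd
from dask.distributed import Client

client = Client()  # p2p shuffling requires the distributed scheduler

# Toy dataframe standing in for real data
ddf = dd.from_pandas(
    pd.DataFrame({"key": range(10_000), "value": range(10_000)}),
    npartitions=10,
)

# Select the p2p backend, which keeps memory roughly constant during shuffles
with dask.config.set({"dataframe.shuffle.method": "p2p"}):
    shuffled = ddf.shuffle(on="key").compute()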