The purpose of xradio._utils.zarr.common._load_no_dask_zarr is to load a processing set without using dask, so that it can be called inside a function that is wrapped in dask.delayed. However, when a selected slice spans multiple chunks on disk, the function consumes considerably more memory than xarray.open_zarr.
Example:
If I have a zarr array on disk with dimensions (816, 1275, 3840, 2) and chunking (816, 1275, 200, 2), the following call loads the entire array into memory and only then does the subselection:
Adds a new function _open_dataset(store, xds_isel=None, data_variables=None, load=False) that is used by both read_processing_set and load_processing_set.
src/xradio/vis/load_processing_set.py:
Uses _open_dataset
load_processing_set now allows specifying which data_variables to load and can be set not to load the sub-datasets (weather_xds, pointing_xds, etc.).
A plot of the memory consumption:
This can be fixed by using xarray.open_zarr to open the store lazily and calling .load() only on the subselection:
Using dask.config.set(scheduler="synchronous") forces .load() to run on a single thread, and no dask graph is created (https://docs.dask.org/en/stable/scheduler-overview.html#debugging-the-schedulers).
This solution was suggested in the first comment of https://github.com/pydata/xarray/issues/3386 .
src/xradio/_utils/zarr/common.py:
src/xradio/vis/load_processing_set.py:
src/xradio/vis/read_processing_set.py: