Closed JoshCu closed 1 month ago
parallel=True does nothing without a dask distributed cluster running. Working code is the following:
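import xarray as xr
from dask.distributed import Client, LocalCluster, get_client


def load_zarr_datasets() -> xr.Dataset:
    """Load zarr datasets from S3 within the specified time range."""
    # If a dask client is not already running, start a LocalCluster and connect to it.
    # A Client object is always truthy, so check with get_client(), which raises
    # ValueError when no client exists.
    try:
        get_client()
    except ValueError:
        cluster = LocalCluster()
        Client(cluster)
    forcing_vars = ["lwdown", "precip", "psfc", "q2d", "swdown", "t2d", "u2d", "v2d"]
    s3_urls = [
        f"s3://noaa-nwm-retrospective-3-0-pds/CONUS/zarr/forcing/{var}.zarr"
        for var in forcing_vars
    ]
    # open_s3_store is a helper defined elsewhere in the project.
    s3_stores = [open_s3_store(url) for url in s3_urls]
    dataset = xr.open_mfdataset(s3_stores, parallel=True, engine="zarr")
    return dataset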
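If the cluster check is needed in more than one place, it could be pulled into a small helper. This is only a sketch: the name get_or_start_client is made up for illustration, and processes=False (in-thread workers) and n_workers=1 are assumptions chosen to keep startup cheap, not settings taken from the code above.

```python
from dask.distributed import Client, LocalCluster, get_client


def get_or_start_client() -> Client:
    """Return the current dask client, starting a local cluster if none exists.

    Hypothetical helper for illustration; not part of the original code.
    """
    try:
        # get_client() raises ValueError when no client has been created yet.
        return get_client()
    except ValueError:
        # Assumption: a single in-thread worker is enough for a quick local run.
        cluster = LocalCluster(processes=False, n_workers=1)
        return Client(cluster)
```

Calling it a second time should hand back the same client rather than starting another cluster, so it is safe to call at the top of load_zarr_datasets().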
xarray.open_mfdataset is slow and synchronous in this function here. Investigate and fix the issue. There would be a good place to start.