CIROH-UA / NGIAB_data_preprocess

Tools to subset hydrofabrics, generate forcings, create default realizations for NGIAB

Investigate Zarr metadata fetching being slow #3

Closed JoshCu closed 1 month ago

JoshCu commented 4 months ago

xarray.open_mfdataset is slow and synchronous; see this function here.
Investigate and fix the issue. There would be a good place to start.

JoshCu commented 1 month ago

parallel=True does nothing without a dask distributed cluster running. Working code is the following:

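import xarray as xr
from dask.distributed import Client, LocalCluster


def load_zarr_datasets() -> xr.Dataset:
    """Open the NWM retrospective forcing zarr stores on S3 as one lazy dataset."""
    # parallel=True only helps when a dask distributed client exists,
    # so reuse the current client or start one on a new LocalCluster.
    try:
        Client.current()
    except ValueError:
        # raised when no client has been created in this process
        Client(LocalCluster())
    forcing_vars = ["lwdown", "precip", "psfc", "q2d", "swdown", "t2d", "u2d", "v2d"]
    s3_urls = [
        f"s3://noaa-nwm-retrospective-3-0-pds/CONUS/zarr/forcing/{var}.zarr"
        for var in forcing_vars
    ]
    # open_s3_store is the project's own helper, defined elsewhere in the repo
    s3_stores = [open_s3_store(url) for url in s3_urls]
    dataset = xr.open_mfdataset(s3_stores, parallel=True, engine="zarr")
    return dataset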
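
For comparison, a rough sketch of the same idea without dask: the per-store opens are independent and I/O-bound, so they can overlap in a plain thread pool. This is not what the repo does. `open_stores_concurrently` and its `opener` argument are hypothetical names used only for illustration; in practice `opener` would be whatever opens a single store, and the results would still need to be combined afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def open_stores_concurrently(
    urls: Iterable[str],
    opener: Callable[[str], T],
    max_workers: int = 8,
) -> List[T]:
    """Call opener(url) for every URL in a thread pool, preserving input order.

    Hypothetical helper: `opener` stands in for whatever opens one store.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the order of `urls`, not completion order
        return list(pool.map(opener, urls))


forcing_vars = ["lwdown", "precip", "psfc", "q2d", "swdown", "t2d", "u2d", "v2d"]
s3_urls = [
    f"s3://noaa-nwm-retrospective-3-0-pds/CONUS/zarr/forcing/{var}.zarr"
    for var in forcing_vars
]
```

With eight stores and eight workers, the wall time should be roughly that of the slowest single open rather than the sum of all of them, assuming the opener spends most of its time waiting on the network.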