NOAA-ORR-ERD / model_catalogs

Python library for developing and working with catalogs of oceanographic model results
https://model-catalogs.readthedocs.io/en/latest/index.html
MIT License

Issue reading GOFS results #56

Open SorooshMani-NOAA opened 1 month ago

SorooshMani-NOAA commented 1 month ago

The GOFS dataset has multiple time coordinates, so reading it through the defined catalog fails. The dataset metadata:

<xarray.Dataset> Size: 2TB                                                                                                                                     
Dimensions:       (lat: 4251, lon: 4500, depth: 40, time: 129, time1: 521, time2: 409, time3: 129, time4: 129)
Coordinates: (12/13)
  * lat           (lat) float64 34kB -80.0 -79.96 -79.92 ... 89.92 89.96 90.0
  * lon           (lon) float64 36kB 0.0 0.07996 0.16 0.24 ... 359.8 359.8 359.9
  * depth         (depth) float64 320B 0.0 2.0 4.0 6.0 ... 3e+03 4e+03 5e+03
  * time          (time) datetime64[ns] 1kB 2024-09-17T12:00:00 ... 2024-10-0...
    time_run      (time) datetime64[ns] 1kB ...
  * time1         (time1) datetime64[ns] 4kB 2024-09-17T12:00:00 ... 2024-10-...
    ...            ...
  * time2         (time2) datetime64[ns] 3kB 2024-09-17T12:00:00 ... 2024-10-...
    time2_run     (time2) datetime64[ns] 3kB ...
  * time3         (time3) datetime64[ns] 1kB 2024-09-17T12:00:00 ... 2024-10-...
    time3_run     (time3) datetime64[ns] 1kB ...
  * time4         (time4) datetime64[ns] 1kB 2024-09-17T12:00:00 ... 2024-10-...
    time4_run     (time4) datetime64[ns] 1kB ...
Data variables: (12/21)
    time_offset   (time) datetime64[ns] 1kB ...
    time1_offset  (time1) datetime64[ns] 4kB ...
    time2_offset  (time2) datetime64[ns] 3kB ...
    time3_offset  (time3) datetime64[ns] 1kB ...
    time4_offset  (time4) datetime64[ns] 1kB ...
    sst           (time1, lat, lon) float32 40GB ...
    ...            ...
    water_u       (time4, depth, lat, lon) float32 395GB ...
    surf_el       (time2, lat, lon) float32 31GB ...
    steric_ssh    (time2, lat, lon) float32 31GB ...
    water_temp    (time3, depth, lat, lon) float32 395GB ...
    salinity      (time, depth, lat, lon) float32 395GB ...
    water_v       (time4, depth, lat, lon) float32 395GB ...
Attributes: (12/23)
    institution:               Fleet Numerical Meteorology and Oceanography C...
    source:                    HYCOM archive file, GLBz0.04
    comment:                   p-grid
    field_type:                instantaneous
    Conventions:               CF-1.4, NAVO_netcdf_v1.1
    grid_name:                 glby0.08
    ...                        ...
    downgrade_date:            not applicable
    classification_authority:  not applicable
    cdm_data_type:             GRID
    featureType:               GRID
    location:                  Proto fmrc:FMRC_ESPC-D-V02_all
    history:                   FMRC Best Dataset

The error is:

> mc.find_availability(main_cat.GOFS['hycom-forecast-agg'])
KeyError: "Receive multiple variables for key 'T': {'time3', 'time2', 'time4', 'time', 'time1'}. Expected only one. Please pass a list ['T'] instead to get all variables matching 'T'."

Is there a way to request specific variables so that only one time coordinate is involved, e.g. only surf_el?
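
For reference, plain xarray can narrow a dataset to a single time axis by selecting variables with a list. The sketch below uses a tiny synthetic dataset (the sizes and values are made up, only the names `salinity`, `surf_el`, `time` and `time2` come from the repr above) and does not assume that model_catalogs exposes a hook for doing this inside `find_availability`.

```python
import numpy as np
import xarray as xr

# Tiny synthetic stand-in for the GOFS aggregation: two variables,
# each on its own time axis. Sizes and values are illustrative only.
time = np.array(["2024-09-17T12", "2024-09-17T15"], dtype="datetime64[ns]")
time2 = np.array(
    ["2024-09-17T12", "2024-09-17T13", "2024-09-17T14"], dtype="datetime64[ns]"
)
lat = np.array([-1.0, 0.0, 1.0])
lon = np.array([0.0, 1.0])

ds = xr.Dataset(
    data_vars={
        "salinity": (("time", "lat", "lon"), np.zeros((2, 3, 2), dtype="float32")),
        "surf_el": (("time2", "lat", "lon"), np.zeros((3, 3, 2), dtype="float32")),
    },
    coords={"time": time, "time2": time2, "lat": lat, "lon": lon},
)

# Indexing with a list returns a Dataset and keeps only the coordinates
# whose dimensions are used by the selected variables.
subset = ds[["surf_el"]]

# Collect whatever time-like dimensions are left after the selection.
time_dims = [dim for dim in subset.sizes if dim.startswith("time")]
print(time_dims)
```

After the selection, only the time axis used by `surf_el` remains in `subset`, so a lookup that expects a single time coordinate has just one candidate.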

SorooshMani-NOAA commented 1 month ago

Shouldn't the catalog define a mapping from T to multiple time variables so that it goes through this part of the code?

https://github.com/NOAA-ORR-ERD/model_catalogs/blob/7e89c9d7087eda9713963ee9e5cd3fd0083d76b8/model_catalogs/process.py#L378-L391

SorooshMani-NOAA commented 1 month ago

I'm not sure that's the best solution, though, because the time coordinates in GOFS have different lengths:

> p ds.time.shape
(129,)
> p ds.time2.shape
(409,)
> p ds.time1.shape
(521,)
> p ds.time3.shape
(129,)
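
To see which variables sit on which time axis, the dims from the dataset repr above can be grouped without opening the remote dataset. This is only a sketch: the dictionaries are copied by hand from the repr (which elides some variables), and `group_by_time_axis` is a hypothetical helper, not part of model_catalogs.

```python
from collections import defaultdict

# Dimension sizes and per-variable dims copied from the dataset repr in this issue.
# The repr is truncated ("..."), so this covers only the variables it shows.
DIM_SIZES = {"time": 129, "time1": 521, "time2": 409, "time3": 129, "time4": 129}
VAR_DIMS = {
    "sst": ("time1", "lat", "lon"),
    "water_u": ("time4", "depth", "lat", "lon"),
    "surf_el": ("time2", "lat", "lon"),
    "steric_ssh": ("time2", "lat", "lon"),
    "water_temp": ("time3", "depth", "lat", "lon"),
    "salinity": ("time", "depth", "lat", "lon"),
    "water_v": ("time4", "depth", "lat", "lon"),
}
# With a live xarray dataset, the same mapping could be built as:
#   {name: da.dims for name, da in ds.data_vars.items()}


def group_by_time_axis(var_dims, time_dims):
    """Map each time dimension to the variables defined on it (hypothetical helper)."""
    groups = defaultdict(list)
    for name, dims in var_dims.items():
        for dim in dims:
            if dim in time_dims:
                groups[dim].append(name)
    return dict(groups)


groups = group_by_time_axis(VAR_DIMS, DIM_SIZES)
for dim in sorted(groups):
    print(f"{dim} (n={DIM_SIZES[dim]}): {', '.join(groups[dim])}")
```

Each variable uses exactly one of the five time axes, so any single-variable request involves only one time coordinate, while the full dataset exposes all five.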