C2SM / iconarray

R&D Python package for processing ICON data with xarray
https://c2sm.github.io/iconarray/
BSD 3-Clause "New" or "Revised" License
10 stars 2 forks source link

support for write grib (COSMO/ICON) files #27

Closed cosunae closed 1 year ago

cosunae commented 1 year ago

cfgrib.open_datasets produces xarray datasets that follow the CF data model and conventions. They look like:

<xarray.Dataset>
Dimensions:     (hybrid: 137, latitude: 301, longitude: 571)
Coordinates:
    time        datetime64[ns] 2022-10-05T06:00:00
    step        timedelta64[ns] 00:00:00
  * hybrid      (hybrid) float64 1.0 2.0 3.0 4.0 5.0 ... 134.0 135.0 136.0 137.0
  * latitude    (latitude) float64 65.0 64.9 64.8 64.7 ... 35.3 35.2 35.1 35.0
  * longitude   (longitude) float64 -10.0 -9.9 -9.8 -9.7 ... 46.7 46.8 46.9 47.0
    valid_time  datetime64[ns] ...
Data variables:
    etadot      (hybrid, latitude, longitude) float32 ...
Attributes:
    GRIB_edition:            2
    GRIB_centre:             ecmf
    GRIB_centreDescription:  European Centre for Medium-Range Weather Forecasts
    GRIB_subCentre:          0
    Conventions:             CF-1.7
    institution:             European Centre for Medium-Range Weather Forecasts,

where the Dataarrays look like:

<xarray.DataArray 'etadot' (hybrid: 137, latitude: 301, longitude: 571)>
[23546327 values with dtype=float32]
Coordinates:
    time        datetime64[ns] 2022-10-05T06:00:00
    step        timedelta64[ns] 00:00:00
  * hybrid      (hybrid) float64 1.0 2.0 3.0 4.0 5.0 ... 134.0 135.0 136.0 137.0
  * latitude    (latitude) float64 65.0 64.9 64.8 64.7 ... 35.3 35.2 35.1 35.0
  * longitude   (longitude) float64 -10.0 -9.9 -9.8 -9.7 ... 46.7 46.8 46.9 47.0
    valid_time  datetime64[ns] ...
Attributes:
    GRIB_paramId:                             77
    GRIB_dataType:                            fc
    GRIB_numberOfPoints:                      171871
    GRIB_typeOfLevel:                         hybrid
    GRIB_stepUnits:                           1
    GRIB_stepType:                            instant
    GRIB_gridType:                            regular_ll
    GRIB_NV:                                  276
    GRIB_Nx:                                  571
    GRIB_Ny:                                  301
    GRIB_cfName:                              unknown
    GRIB_cfVarName:                           etadot
    GRIB_gridDefinitionDescription:           Latitude/longitude
    GRIB_iDirectionIncrementInDegrees:        0.1
    GRIB_iScansNegatively:                    0
    GRIB_jDirectionIncrementInDegrees:        0.1
    GRIB_jPointsAreConsecutive:               0
    GRIB_jScansPositively:                    0
    GRIB_latitudeOfFirstGridPointInDegrees:   65.0
    GRIB_latitudeOfLastGridPointInDegrees:    35.0
    GRIB_longitudeOfFirstGridPointInDegrees:  350.0
    GRIB_longitudeOfLastGridPointInDegrees:   47.0
    GRIB_missingValue:                        9999
    GRIB_name:                                Eta-coordinate vertical velocity
    GRIB_shortName:                           etadot
    GRIB_units:                               s**-1
    long_name:                                original GRIB paramId: 77
    units:                                    1

All attributes that are named like GRIB_ are keys decoded from grib. Any DataArray or Dataset that follows this data model can in principle be written into a grib file by calling canonical_dataset_to_grib https://github.com/ecmwf/cfgrib/blob/8578f1012974e88d962f3fa13c0fcca973e49114/cfgrib/xarray_to_grib.py#L255

However there are few important remarks that define the behaviour of the functionality that writes grib records. In the parameter namespace, writing a variable require knowing the triplet discipline, parameterCategory and parameterNumber. For some variables additional keys need to be specified. Historically Fieldextra kept a dictionary with all variables, mapping the name to the combination of parameter keys that represent in grib that variable: https://github.com/COSMO-ORG/fieldextra/blob/74641a1edf563c5478accbdf003d3aad94010ba1/resources/dictionary_cosmo.txt#L216 Following that approach would require maintaining a similar dictionary (in yaml) for iconarray. The approach of eccodes is though different. They implement the mappings of the various required parameter keys into a unique identifier (paramId) in the local concepts. https://github.com/ecmwf/eccodes/blob/develop/definitions/grib2/localConcepts/edzw/paramId.def Each center can have its own concept.

The function cfgrib.canonical_dataset_to_grib will then require an attribute GRIB_paramId and use the local concepts to define the various parameter grib keys in the record. In order for that to work, GRIB_centre must be set appropriately.

Similarly, for specifying the vertical coordinates, various keys are often required. While Fieldextra encodes those in the same dictionary, eccodes uses another concept (typeOfLevel): https://github.com/ecmwf/eccodes/blob/develop/definitions/grib2/typeOfLevelConcept.def

And cfgrib will then use the key GRIB_typeOfLevel to determine the combination of keys.

The proposal for iconarray is to look at those concepts in eccodes, and make sure that they mimic the same behaviour fieldextra does for COSMO/ICON data. If there would be differences in behaviour we should fix them in the concept for our centre lssw

petrabaumann commented 1 year ago

I wonder how the values of scaleFactorOfFirstFixedSurface/scaledValueOfFirstFixedSurface and scaleFactorOfSecondFixedSurface/scaledValueOfSecondFixedSurface are translated within the typeOfLevel concept. This is for instance interesting, when a vertical integration between two surfaces with typeOfFirstFixedSurface = typeOfSecondFixedSurface=150 is performed. When I performed tests for IDPI, I noticed that the information on one of the two surfaces was lost. Maybe example of integrating between two surfaces of level type 150 is in practice irrelevant, but we have an example product, which includes the integral mean of potential vorticity in a layer defined by two isobaric surfaces. When looking at the eccodes FAQs, I found out that the namespace "vertical" contains the keys "typeOfLevel", "level", "bottomLevel", "topLevel", and "pv" (see https://confluence.ecmwf.int/display/UDOC/What+are+namespaces+-+ecCodes+GRIB+FAQ). As far as I remember, GRIB_topLevel and GRIB_bottomLevel were not automatically set when reading vertical grid information with cfgrib. So this has to be double-checked.

petrabaumann commented 1 year ago

Note that currently definitions/grib2/localConcepts/lssw points to definitions/grib2/localConcepts/edzw, i.e., we follow the COSMO GRIB2 policy here.

petrabaumann commented 1 year ago

"All attributes that are named like GRIB are keys decoded from grib": One hast to check carefully, which of these GRIB keys are coded keys acccording to the GRIB2 standard, and which ones are derived or computed by eccodes. Nearly all GRIB_ keys in the list above are derived keys. I already mentioned the problems of the reduced information in GRIB_typeOfLevel (if topLevel and bottomLevel are actually not set, as I remarked above). I am also missing the information stored in the resolutionAndComponentFlags. When dealing with the horizontal wind components U and V of COSMO, these flags tell you if these components are defined with respect to the native rotated lat/lon grid, or to the geographic lat/lon grid. In fact, I do not see any information on that in the definition of the namespace vertical:rotated_II at https://apps.ecmwf.int/codes/grib/format/edition-independent/1/17/ But well, one may argue that this information is parameter-dependent. So I wonder where it is reflected. I wonder that "long_name" is not set to the value of "GRIB_name", as indicated in dataset.py:encode_cf_first, respectively, what are the conditions that this is the case.

petrabaumann commented 1 year ago

"Historically Fieldextra kept a dictionary with all variables, mapping the name to the combination of parameter keys that represent in grib that variable": The mapping between short name and the tuple of GRIB2 keys that is defining a parameter is / or should be the same as in eccodes-cosmo-resources/definitions/grib2/localConcepts//shortName.def. The unit and the name of the parameter are the same as in eccodes-cosmo-resources/definitions/grib2/localConcepts//name.def and eccodes-cosmo-resources/definitions/grib2/localConcepts//units.def, respectively. The fieldextra dictionary, however, keeps some additional information that is not availabe from GRIB, but is, for instance, needed for internal storage management, e.g., whether a parameter is constant in time, if it is a multi-level field, a.o.. The mapping between the set of paramIds and shortNames in the localConcepts is bijective. It is not yet clear to me, whether the paramId is unique over all centres.

cosunae commented 1 year ago

Thanks @petrabaumann, yes, we need to have a look more in detail to how the scaleFactorOfFirstFixedSurface/scaledValueOfFirstFixedSurface should be mapped into the typeOfLevel. It might be that the eccodes concept is not complete in edzw

cosunae commented 1 year ago

Proposal for conditional loading of grib keys here: https://github.com/MeteoSwiss-APN/icon_data_processing_incubator/blob/6ec3da0075d57e6102683ead5cf238697eed18e9/idpi/src/operators/flexpart.py#L32 A clean solution in cfgrib is currently not possible, therefore it is hacking cfgrib.

cosunae commented 1 year ago

Resolution and component Flags is here: https://apps.ecmwf.int/codes/grib/format/grib2/ctables/3/3