ecmwf / cfgrib

A Python interface to map GRIB files to the NetCDF Common Data Model following the CF Convention using ecCodes
Apache License 2.0
403 stars 77 forks source link

Bug in grib writing #385

Open dasarkisov opened 4 months ago

dasarkisov commented 4 months ago

Got a grib file from grib filter with only one field - 2-m temperature. Here I open, the grib, write it right away to a test file, and then reopen to see the results. Have a look at how it does the writing:

ds = xr.open_dataset('gfs.t00z.pgrb2.0p25.f000', engine='cfgrib')
print(ds)

to_grib(ds, 'test.grb')

ds1 = xr.open_dataset('test.grb', engine='cfgrib')
print(ds1) 

Output:

# original
<xarray.Dataset>
Dimensions:            (latitude: 5, longitude: 5)
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  float64 ...
  * latitude           (latitude) float64 65.0 65.25 65.5 65.75 66.0
  * longitude          (longitude) float64 50.0 50.25 50.5 50.75 51.0
    valid_time         datetime64[ns] ...
Data variables:
    t2m                (latitude, longitude) float32 ...
Attributes: ...
# written
<xarray.Dataset>
Dimensions:            (latitude: 5, longitude: 5)
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  float64 ...
  * latitude           (latitude) float64 65.0 65.25 65.5 65.75 66.0
  * longitude          (longitude) float64 50.0 50.25 50.5 50.75 51.0
    valid_time         datetime64[ns] ...
Data variables:
    lftx               (latitude, longitude) float32 ...
Attributes: ...

The question: why on earth does it change the name of 2-m temperature data variable 't2m' to lifted index 'lftx'??

Another question: along with the 't2m' data variable, I need to build a side (custom) data variable (as dataarray) within the same grib file, that would mimic the 't2m' (have the same dimensions/coordinates), but have custom values. Is it possible to build such custom data variable and write it together with the original 't2m' data variable?

iainrussell commented 2 months ago

Hi @dasarkisov,

Thanks for your interesting question! I downloaded some t2m data from the website and tried your sample program (thanks for providing a nice easy way to replicate the issue), but for me I got t2m in my resulting dataset, and do not see an obvious way that your problem could happen. Could you do the following please? Add this line at the top of your script:

xr.set_options(display_max_rows=40)

and then add this line after reading the GRIB into xarray:

print(ds.t2m)

(and then the equivalent for the second dataset you get). This will let us compare the GRIB keys stored in the dataset, and maybe this will give us a clue as to what is going on.

For your second question, I think you should be able to copy the original variable and change its values just using standard xarray functions. As long as it retains the 'GRIB_' attributes, it should end up in the resulting GRIB file when you write it.

dasarkisov commented 2 months ago

@iainrussell Thank you very much for your reply! This time, as you suggested, I got the freshest grib-file with 2-m temperature from grib filter and did the following:

xr.set_options(display_max_rows=40)

ds = xr.open_dataset('gfs.t00z.pgrb2.0p25.f000', engine='cfgrib')
print(ds.t2m)

xarray_to_grib.to_grib(ds, 'test.grb')

ds1 = xr.open_dataset('test.grb', engine='cfgrib')
print(ds1.t2m)

Here is the output of the original grib-file:

# original dataset
<xarray.DataArray 't2m' (latitude: 361, longitude: 721)>
[260281 values with dtype=float32]
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  float64 ...
  * latitude           (latitude) float64 0.0 0.25 0.5 0.75 ... 89.5 89.75 90.0
  * longitude          (longitude) float64 0.0 0.25 0.5 ... 179.5 179.8 180.0
    valid_time         datetime64[ns] ...
Attributes:
    GRIB_paramId:                             167
    GRIB_dataType:                            fc
    GRIB_numberOfPoints:                      260281
    GRIB_typeOfLevel:                         heightAboveGround
    GRIB_stepUnits:                           1
    GRIB_stepType:                            instant
    GRIB_gridType:                            regular_ll
    GRIB_NV:                                  0
    GRIB_Nx:                                  721
    GRIB_Ny:                                  361
    GRIB_cfName:                              air_temperature
    GRIB_cfVarName:                           t2m
    GRIB_gridDefinitionDescription:           Latitude/longitude. Also called...
    GRIB_iDirectionIncrementInDegrees:        0.25
    GRIB_iScansNegatively:                    0
    GRIB_jDirectionIncrementInDegrees:        0.25
    GRIB_jPointsAreConsecutive:               0
    GRIB_jScansPositively:                    1
    GRIB_latitudeOfFirstGridPointInDegrees:   0.0
    GRIB_latitudeOfLastGridPointInDegrees:    90.0
    GRIB_longitudeOfFirstGridPointInDegrees:  0.0
    GRIB_longitudeOfLastGridPointInDegrees:   180.0
    GRIB_missingValue:                        3.4028234663852886e+38
    GRIB_name:                                2 metre temperature
    GRIB_shortName:                           2t
    GRIB_units:                               K
    long_name:                                2 metre temperature
    units:                                    K
    standard_name:                            air_temperature

Unfortunately, the print(ds1.t2m) command got me an error ("AttributeError: 'Dataset' object has no attribute 't2m'") since the data variable "t2m" still transforms into "lftx" in the process of writing in my case (and that's wild!).

Here is the output of print(ds1.lftx) command:

# written dataset
<xarray.DataArray 'lftx' (latitude: 361, longitude: 721)>
[260281 values with dtype=float32]
Coordinates:
    time               datetime64[ns] ...
    step               timedelta64[ns] ...
    heightAboveGround  float64 ...
  * latitude           (latitude) float64 0.0 0.25 0.5 0.75 ... 89.5 89.75 90.0
  * longitude          (longitude) float64 0.0 0.25 0.5 ... 179.5 179.8 180.0
    valid_time         datetime64[ns] ...
Attributes:
    GRIB_paramId:                             260127
    GRIB_dataType:                            fc
    GRIB_numberOfPoints:                      260281
    GRIB_typeOfLevel:                         heightAboveGround
    GRIB_stepUnits:                           1
    GRIB_stepType:                            instant
    GRIB_gridType:                            regular_ll
    GRIB_NV:                                  0
    GRIB_Nx:                                  721
    GRIB_Ny:                                  361
    GRIB_cfName:                              unknown
    GRIB_cfVarName:                           unknown
    GRIB_gridDefinitionDescription:           Latitude/longitude 
    GRIB_iDirectionIncrementInDegrees:        0.25
    GRIB_iScansNegatively:                    0
    GRIB_jDirectionIncrementInDegrees:        0.25
    GRIB_jPointsAreConsecutive:               0
    GRIB_jScansPositively:                    1
    GRIB_latitudeOfFirstGridPointInDegrees:   0.0
    GRIB_latitudeOfLastGridPointInDegrees:    90.0
    GRIB_longitudeOfFirstGridPointInDegrees:  0.0
    GRIB_longitudeOfLastGridPointInDegrees:   180.0
    GRIB_missingValue:                        3.4028234663852886e+38
    GRIB_name:                                Surface lifted index
    GRIB_shortName:                           lftx
    GRIB_units:                               K
    long_name:                                Surface lifted index
    units:                                    K
    standard_name:                            unknown

Regarding the second topic: I do can make a copy of any dataset and write it along with others to the gribfile, but as far as I know at this point, I am unable to create fully custom dataset with arbitrary naming of attributes. This is bacause the cfgrib reaches ecCodes and writes files strictly according to its regulations. For example, whenever I tried to change GRIB_name attribute to any arbitrary one, it got me an error; the same with any other attribute. Many attributes are read-only, others have strict limit for values the can take. The only thing possible is changing the data variable values themselves...

I tried to write an "indices" data variable containing values like [ [0, 0, 1], [1, 1, 0], [0, 1, 1] ], but failed to name such a dataset.