ecmwf-lab / ai-models-graphcast

Apache License 2.0

problem writing GRIB output when initialized with ERA5 via CDS #21

Open russ-schumacher opened 1 week ago

russ-schumacher commented 1 week ago

When running graphcast with CDS input, using a command like this:

ai-models --input cds --date 20221220 --time 0000 --path "output-graphcast.grib" graphcast

the model runs successfully, but then fails to write the GRIB file at forecast hour 6. The issue appears to be with writing the precipitation field, which throws this error:

2024-09-27 16:33:26,419 INFO Converting output xarray to GRIB and saving
ECCODES ERROR : concept: no match for paramId=228
ECCODES ERROR : concept: input handle edition=2, centre=ecmf
ECCODES ERROR : concept: input handle dataset=era
ECCODES ERROR : Please check the Parameter Database 'https://codes.ecmwf.int/grib/param-db/?id=228'
2024-09-27 16:33:34,333 ERROR Error setting edition=2
2024-09-27 16:33:34,333 ERROR Concept no match

I used the debug option and wrote output.nc, which looks fine, so the issue appears to be limited to writing the GRIB output.

When initializing using opendata, this issue does not occur and everything looks fine.

A possible hint at the cause comes from wgrib2. For the opendata-initialized run that works, the GRIB record looks like this:

7:5174768:d=2024092600:var discipline=0 center=98 local_table=1 parmcat=1 parm=193:surface:0-0 day acc fcst:

In the CDS-initialized run that fails, the record in the hour-zero file looks like this (nothing is written after that):

7:14536536:d=2022122000:TPRATE:surface:0-0 day acc fcst:

Not sure why a different precipitation parameter is being written depending on the initialization source, though. Thanks for any insights!
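
To compare the two wgrib2 records above field by field, a small parser can help. This is only a sketch: `InventoryRecord`, `parse_inventory_line` and `local_codes` are hypothetical helper names, and it assumes the colon-separated short inventory format shown in the two records, with no colons inside any field.

```python
from typing import Dict, NamedTuple


class InventoryRecord(NamedTuple):
    number: str      # message number as printed by wgrib2
    offset: int      # byte offset of the message in the file
    date: str        # reference time, e.g. "2022122000"
    variable: str    # a name such as "TPRATE", or raw "var discipline=..." text
    level: str
    time_range: str


def parse_inventory_line(line: str) -> InventoryRecord:
    """Split one colon-separated inventory line into named fields."""
    parts = line.strip().split(":")
    if len(parts) < 6:
        raise ValueError(f"not an inventory line: {line!r}")
    number, offset, date, variable, level, time_range = parts[:6]
    if date.startswith("d="):
        date = date[2:]
    return InventoryRecord(number, int(offset), date, variable, level, time_range)


def local_codes(record: InventoryRecord) -> Dict[str, int]:
    """Return the key=value codes when the variable is printed as raw codes."""
    if not record.variable.startswith("var "):
        return {}
    pairs = (item.split("=", 1) for item in record.variable.split()[1:])
    return {key: int(value) for key, value in pairs}


# The two records quoted in this issue.
working = ("7:5174768:d=2024092600:var discipline=0 center=98 "
           "local_table=1 parmcat=1 parm=193:surface:0-0 day acc fcst:")
failing = "7:14536536:d=2022122000:TPRATE:surface:0-0 day acc fcst:"

for label, line in (("opendata", working), ("cds", failing)):
    record = parse_inventory_line(line)
    print(label, record.variable, local_codes(record))
```

Run on the two records, this shows that the opendata file stores precipitation under raw codes (centre 98, local table 1, parameter 193), while the CDS file stores a parameter that wgrib2 names TPRATE. Level and time range are identical in both.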

jovanovski commented 1 week ago

Having the same issue here!

@russ-schumacher which command did you use to generate the NetCDF output?

Zappandy commented 1 week ago

@russ-schumacher and @jovanovski I'm also having the same issue. On our end, we're downloading the ERA5 data using climetlab, with initial conditions generated every 5 days in 2023.

I've added the code we use to download the data to reproduce our issue.

#!/usr/bin/python3
from datetime import datetime, timedelta

import climetlab as cml

# Create a list of dates in 2023 with steps of 5 days
start_date = datetime(2023, 1, 1)
end_date = datetime(2023, 12, 31)
delta = timedelta(days=5)

dates = []
current_date = start_date

while current_date <= end_date:
    dates.append(current_date.strftime('%Y-%m-%d'))
    current_date += delta

# Surface (single-level) fields
sfc_data = cml.load_source(
    "cds",
    "reanalysis-era5-single-levels",
    variable=["lsm", "2t", "msl", "10u", "10v", "tp", "z"],
    product_type="reanalysis",
    area=[90, 0, -90, 360],
    grid=[0.25, 0.25],
    date=dates,
    time="12:00",
    format="grib",
)

# Pressure-level fields
atm_data = cml.load_source(
    "cds",
    "reanalysis-era5-pressure-levels",
    variable=["t", "z", "u", "v", "w", "q"],
    level=[50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000],
    product_type="reanalysis",
    area=[90, 0, -90, 360],
    grid=[0.25, 0.25],
    date=dates,
    time="12:00",
    format="grib",
)

Note that before this problem, we were having issues with NaN values in the 2t variable, raised from the gribapi. I commented some lines out to bypass this, as I understand some NaN values in 2t are fine as long as they correspond to the ocean. However, after "fixing" this, we started seeing the same total precipitation issue you are dealing with.
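
As a way to see which fields contain NaN values before they reach the GRIB encoder, a check along these lines could be run on the model output. It is a minimal sketch using only the standard library. `count_nans` and `nan_report` are hypothetical helpers, and the field names and values below are made up for illustration. With NumPy arrays, passing `array.ravel()` as the values should work the same way.

```python
import math
from typing import Dict, Iterable


def count_nans(values: Iterable[float]) -> int:
    """Count the NaN entries in a flat sequence of floats."""
    return sum(1 for value in values if math.isnan(value))


def nan_report(fields: Dict[str, Iterable[float]]) -> Dict[str, int]:
    """Return {field name: NaN count} for every field with at least one NaN."""
    report = {}
    for name, values in fields.items():
        n_nan = count_nans(values)
        if n_nan:
            report[name] = n_nan
    return report


# Illustrative data only: one NaN in "2t", none in "msl".
fields = {
    "2t": [271.3, float("nan"), 280.1],
    "msl": [101325.0, 100900.0],
}
print(nan_report(fields))  # prints {'2t': 1}
```

A report like this identifies the affected variable and forecast step without having to edit the gribapi code.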

To give a more detailed diagnosis, I've added the traceback we got with 2t before commenting out some lines in the gribapi code. Note that debugging the gribapi is a bit of a pain because there's a circular import between eccodes and the gribapi...

2024-10-02 04:41:16,975 INFO Doing full rollout prediction in JAX: 1 minute 4 seconds.
2024-10-02 04:41:16,975 INFO Converting output xarray to GRIB and saving
ECCODES ERROR   :  Minimum value out of range: nan
ECCODES ERROR   :  GRIB2 simple packing: unable to set values (Encoding invalid)
ECCODES ERROR   :  Unable to set double array 'codedValues' (Encoding invalid)
2024-10-02 04:41:17,883 ERROR Error setting values
2024-10-02 04:41:17,883 ERROR Encoding invalid
Traceback (most recent call last):
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/earthkit/data/readers/grib/codes.py", line 221, in set_values
    eccodes.codes_set_values(self._handle, values.flatten())
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/gribapi.py", line 2126, in grib_set_values
    grib_set_double_array(gribid, "values", values)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/gribapi.py", line 1200, in grib_set_double_array
    GRIB_CHECK(lib.grib_set_double_array(h, key.encode(ENC), a, length))
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/gribapi.py", line 232, in GRIB_CHECK
    errors.raise_grib_error(errid)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/errors.py", line 381, in raise_grib_error
    raise ERROR_MAP[errid](errid)
gribapi.errors.EncodingError: Encoding invalid
2024-10-02 04:41:17,886 INFO Saving output data: 0.9 second.
2024-10-02 04:41:17,886 INFO Total time: 1 minute 26 seconds.
Traceback (most recent call last):
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models/outputs/__init__.py", line 62, in write
    handle, path = self.output.write(data, *args, **kwargs)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/earthkit/data/readers/grib/output.py", line 390, in write
    handle = self._coder.encode(
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/earthkit/data/readers/grib/output.py", line 132, in encode
    handle.set_values(values)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/earthkit/data/readers/grib/codes.py", line 221, in set_values
    eccodes.codes_set_values(self._handle, values.flatten())
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/gribapi.py", line 2126, in grib_set_values
    grib_set_double_array(gribid, "values", values)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/gribapi.py", line 1200, in grib_set_double_array
    GRIB_CHECK(lib.grib_set_double_array(h, key.encode(ENC), a, length))
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/gribapi.py", line 232, in GRIB_CHECK
    errors.raise_grib_error(errid)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/gribapi/errors.py", line 381, in raise_grib_error
    raise ERROR_MAP[errid](errid)
gribapi.errors.EncodingError: Encoding invalid

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/bin/ai-models", line 8, in <module>
    sys.exit(main())
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models/__main__.py", line 362, in main
    _main(sys.argv[1:])
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models/__main__.py", line 310, in _main
    run(vars(args), unknownargs)
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models/__main__.py", line 335, in run
    model.run()
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models_graphcast/model.py", line 232, in run
    save_output_xarray(
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models_graphcast/output.py", line 68, in save_output_xarray
    write(
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models/model.py", line 120, in write
    self.output.write(*args, **kwargs, **self.grib_extra_metadata),
  File "/gpfs/home/bsc/bsc927078/graphcast_snake/lib/python3.10/site-packages/ai_models/outputs/__init__.py", line 67, in write
    raise ValueError(f"NaN values found in field. args={args} kwargs={kwargs}")
ValueError: NaN values found in field. args=() kwargs={'template': GribField(2t,None,20230101,1200,0,0), 'step': 6}

russ-schumacher commented 1 week ago

> Having the same issue here!
>
> @russ-schumacher which command did you use to generate the NetCDF output?

If you add the "--debug" flag when running, it will write a NetCDF file of the model output, as well as the input training matrix and a couple of others.