ecmwf-lab / ai-models

Apache License 2.0

How to use the --file option with graphcast #15

Open idharssi2020 opened 11 months ago

idharssi2020 commented 11 months ago

Dear Developers,

Thanks so much for making your ai-models wrapper available. I have graphcast running with the wrapper. Next, I would like to use a local analysis created using 4DEnVar to initialise graphcast. The analysis is global and has a resolution of about 12 km. Data is available at all the required pressure levels.

Should the source file be grib or netCDF?

Do I need to regrid the analyses to 0.25 degrees resolution? I can use CDO to do this but just wondered if it is necessary.

Also, graphcast needs analyses at two time periods, 6 hours apart. What time-stamps should I put on the input file?

Apologies if this is all in the documentation. I might figure all this out by trial and error but any extra guidance is much appreciated.

Thanks

idharssi2020 commented 11 months ago

KeyError: 'msl0'

I've created initial conditions for graphcast (GC) using files downloaded from https://nomads.ncep.noaa.gov/pub/data/nccf/com/gfs/prod/ and eccodes to extract the required fields for 0Z and 6Z.

I run GC using ai-models --input file --file dump.grib --date $1 --time 0600 --expver gfs1 graphcast --debug 2> log.txt >log.txt
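
For reference, the two input states implied by this command can be written down with a few lines of Python. This is only a sketch of the date arithmetic, assuming (as in the run above) that --date and --time name the later of the two analyses and that the earlier one is 6 hours before it. The helper `input_times` is hypothetical and not part of ai-models, and the date 20231013 is taken from the grib_ls listing in this comment.

```python
from datetime import datetime, timedelta


def input_times(date: str, time: str, lag_hours: int = 6):
    """Return the (earlier, later) analysis times for a YYYYMMDD date and HHMM time.

    Assumption: the model start time is the later analysis and the
    second input state is lag_hours before it.
    """
    later = datetime.strptime(date + time, "%Y%m%d%H%M")
    return later - timedelta(hours=lag_hours), later


earlier, later = input_times("20231013", "0600")
print(earlier.isoformat(), later.isoformat())
```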

I get the following errors

tail -20 log.txt
    sys.exit(main())
  File "/home/548/ixd548/.conda/envs/ai_models0211/lib/python3.10/site-packages/ai_models/__main__.py", line 285, in main
    _main()
  File "/home/548/ixd548/.conda/envs/ai_models0211/lib/python3.10/site-packages/ai_models/__main__.py", line 258, in _main
    model.run()
  File "/g/data/dp9/ixd548/ai-models/ai-models-graphcast/ai_models_graphcast/model.py", line 248, in run
    save_output_xarray(
  File "/g/data/dp9/ixd548/ai-models/ai-models-graphcast/ai_models_graphcast/output.py", line 34, in save_output_xarray
    all_fields = all_fields.order_by(
  File "/home/548/ixd548/.local/lib/python3.10/site-packages/climetlab/core/index.py", line 210, in order_by
    indices = sorted(indices, key=functools.cmp_to_key(cmp))
  File "/home/548/ixd548/.local/lib/python3.10/site-packages/climetlab/core/index.py", line 207, in cmp
    return order.compare_elements(self[i], self[j])
  File "/home/548/ixd548/.local/lib/python3.10/site-packages/climetlab/core/index.py", line 87, in compare_elements
    n = v(a_metadata(k), b_metadata(k))
  File "/home/548/ixd548/.local/lib/python3.10/site-packages/climetlab/core/index.py", line 120, in __call__
    return ascending(self.get(a), self.get(b))
  File "/home/548/ixd548/.local/lib/python3.10/site-packages/climetlab/core/index.py", line 123, in get
    return self.order[x]
KeyError: 'msl0'

I think the issue is the way surface levels are represented in the input file dump.grib. If I use grib_ls

grib_ls dump.grib

dump.grib
edition      centre       date         dataType     gridType     stepRange    typeOfLevel  level        shortName    packingType  
...

2            kwbc         20231013     fc           regular_ll   0            isobaricInhPa  1000         z            grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            surface      0            lsm          grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            surface      0            z            grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            surface      0            tp           grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            meanSea      0            msl          grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            heightAboveGround  10           10u          grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            heightAboveGround  10           10v          grid_complex_spatial_differencing 
2            kwbc         20231013     fc           regular_ll   0            heightAboveGround  2            2t           grid_complex_spatial_differencing 

I tried to use grib_set -s typeOfLevel=surface,level=0 , but this changes the variable names from msl,2t,10u,10v to sp,t,u,v

If I use grib_set -s shortName= , the level and type are reset back to the old values

Would it be possible to change the code so that the level height isn't used as part of the key for surface variables?
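
The failing key can be reproduced without any GRIB file. The sketch below assumes the ordering is a lookup table keyed by "{param}{levelist}", that surface parameters are listed in it without a level suffix, and that the GFS file reports msl with level=0 as in the grib_ls output above. The `ordering` entries and both helper functions are hypothetical; they only illustrate the kind of change being asked for.

```python
# Hypothetical ordering table: pressure-level keys carry the level,
# surface keys do not (assumption; the real table lives in the plugin).
ordering = {"msl": 0, "10u": 1, "10v": 2, "2t": 3, "z1000": 4}


def naive_key(param, level):
    """Always append the level, mirroring a "{param}{levelist}" remapping."""
    return f"{param}{level}"


def level_aware_key(param, level, type_of_level):
    """Append the level only for pressure-level fields."""
    if type_of_level == "isobaricInhPa":
        return f"{param}{level}"
    return param


# The naive key for mean sea level pressure is "msl0", which is not in the table.
try:
    ordering[naive_key("msl", 0)]
except KeyError as err:
    print("lookup failed:", err)

# Dropping the level for non-pressure-level fields makes the lookup succeed.
print(ordering[level_aware_key("msl", 0, "meanSea")])
print(ordering[level_aware_key("z", 1000, "isobaricInhPa")])
```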

It looks as though GC finished, as a file output.nc is generated:

du -sh *
1.2G 00
1.2G 06
151M dump.grib
28K dump.txt
159M forcings_xr.nc
674M input_xr.nc
116K log.txt
13G output.nc
0 params
0 stats
14G training_xarray.nc

idharssi2020 commented 11 months ago

I commented out the all_fields.order_by and changed the if level != 0 to if level > 20. My modified code looks like

def save_output_xarray(
    *,
    output,
    target_variables,
    write,
    all_fields,
    ordering,
    lead_time,
    hour_steps,
    lagged,
):
    LOG.info("Converting output xarray to GRIB and saving")

    output["total_precipitation_6hr"] = output.data_vars[
        "total_precipitation_6hr"
    ].cumsum(dim="time")

   # all_fields = all_fields.order_by(
   #     valid_datetime="descending",
   #     param=ordering,
   #     #remapping={"param_level": "{param}{levelist}"},
   # )

    for time in range(lead_time // hour_steps):
        for fs in all_fields[: len(all_fields) // len(lagged)]:
            param, level = fs["shortName"], fs["level"]

            if level > 20:
                param = GRIB_TO_XARRAY_PL.get(param, param)
                if param not in target_variables:
                    continue
                values = output.isel(time=time).sel(level=level).data_vars[param].values
            else:
                param = GRIB_TO_CF.get(param, param)
                param = GRIB_TO_XARRAY_SFC.get(param, param)
                if param not in target_variables:
                    continue
                values = output.isel(time=time).data_vars[param].values

            # We want the field ordered north => south

            values = np.flipud(values.reshape(fs.shape))

            write(
                values,
                template=fs,
                step=(time + 1) * hour_steps,
            )
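
One caveat with the level > 20 test: it works for the fields listed by grib_ls, but it would send any pressure level at or below 20 hPa down the surface branch. The sketch below compares the threshold with a test on the GRIB typeOfLevel key. It is an illustration under assumptions, not a patch to the plugin: the first four rows come from the listing above, and the 10 hPa row is hypothetical and not in the original file.

```python
# (shortName, typeOfLevel, level) rows; the last one is hypothetical.
fields = [
    ("z", "isobaricInhPa", 1000),
    ("msl", "meanSea", 0),
    ("10u", "heightAboveGround", 10),
    ("2t", "heightAboveGround", 2),
    ("t", "isobaricInhPa", 10),  # hypothetical stratospheric level
]


def by_threshold(level):
    """The workaround's test: anything above 20 is treated as a pressure level."""
    return level > 20


def by_type(type_of_level):
    """Alternative test based on the typeOfLevel key."""
    return type_of_level == "isobaricInhPa"


# Fields on which the two tests disagree.
mismatches = [f for f in fields if by_threshold(f[2]) != by_type(f[1])]
print(mismatches)
```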

whu-dyf commented 11 months ago

Hello, could you please provide me with more information on how to initialize GraphCast using GFS products? I am interested in utilizing GFS products for real-time data analysis, but I am unsure where to begin. Your assistance in refining the above statements would be greatly appreciated.

I-Dhar commented 11 months ago

I use eccodes command line tools (https://confluence.ecmwf.int/display/ECC/GRIB+tools+examples) to extract and process the NCEP GFS analyses which are already in grib2 format and on pressure levels. I needed to convert some units and so scale some of the GFS fields. I also needed to rename some of the variables. My script is only 27 lines long. I'm still checking the script and will share it when it is ready.
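
The comment does not say which fields were scaled or renamed. As an illustration only, one conversion commonly needed when moving from GFS to ERA5-style inputs is geopotential height (gpm) to geopotential (m^2 s^-2), which is a multiplication by standard gravity. The sketch below works on plain Python lists; the function name and the `RENAME` mapping are assumptions made for this example and are not taken from the author's script.

```python
G0 = 9.80665  # standard gravity in m s^-2

# Assumed rename for this example: geopotential height -> geopotential.
RENAME = {"gh": "z"}


def height_to_geopotential(height_gpm):
    """Convert geopotential height (gpm) to geopotential (m^2 s^-2)."""
    return [h * G0 for h in height_gpm]


values = height_to_geopotential([0.0, 100.0, 5500.0])
print(RENAME["gh"], values)
```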

whu-dyf commented 11 months ago

Thanks! Looking forward to your updates!

Kinggithubbj commented 8 months ago

I downloaded 4 GRIB files (06_plev, 06_surface, 12_plev, 12_surface) and used them to run graphcast as below:

ai-models --input file --file graphcast_intput_20230909_06_plev.grib graphcast_intput_20230909_06_surface.grib graphcast_intput_20230909_12_plev.grib graphcast_intput_20230909_12_surface.grib graphcast

The error is:

ai-models: error: argument MODEL: invalid choice: 'graphcast_intput_20230909_06_surface.grib' (choose from 'graphcast')
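
The error message suggests that --file takes a single path, so the second filename is read as the MODEL argument. Because GRIB messages are self-delimiting, several GRIB files can be joined byte for byte into one file, which is what unix cat does. A minimal Python sketch of the same operation follows; `concatenate` is a hypothetical helper, and the demo uses placeholder bytes rather than real GRIB messages.

```python
import shutil
import tempfile
from pathlib import Path


def concatenate(parts, target):
    """Append each input file to target, byte for byte (like `cat a b > target`)."""
    with open(target, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return Path(target)


# Demo with placeholder bytes standing in for GRIB messages.
workdir = Path(tempfile.mkdtemp())
names = ["06_plev.grib", "06_surface.grib", "12_plev.grib", "12_surface.grib"]
parts = []
for i, name in enumerate(names):
    path = workdir / name
    path.write_bytes(f"message-{i};".encode())
    parts.append(path)

combined = concatenate(parts, workdir / "combined.grib")
print(combined.stat().st_size)
```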

Should I combine the 4 files into 1 file? Thanks! I combined the 4 files into 1 file (graphcast_intput_20231001.grib) and ran:

ai-models --input file --file graphcast_intput_20231001.grib graphcast

the error is

2024-01-09 19:40:41,159 INFO Loading params/GraphCast_operational - ERA5-HRES 1979-2021 - resolution 0.25 - pressure levels 13 - mesh 2to6 - precipitation output only.npz: 0.3 second.
2024-01-09 19:40:41,159 INFO Building model: 0.3 second.
2024-01-09 19:40:41,565 INFO Creating training data: 0.4 second.
2024-01-09 19:40:41,565 INFO Creating input data (total): 0.4 second.
2024-01-09 19:40:41,566 INFO Total time: 1 second.
Traceback (most recent call last):
  File "/data7/01_ai_models/miniconda/envs/ai-models/bin/ai-models", line 8, in <module>
    sys.exit(main())
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/ai_models/__main__.py", line 291, in main
    _main()
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/ai_models/__main__.py", line 264, in _main
    model.run()
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/ai_models_graphcast/model.py", line 205, in run
    start_date=self.start_date,
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/functools.py", line 981, in __get__
    val = self.func(instance)
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/ai_models_graphcast/model.py", line 192, in start_date
    return self.all_fields.order_by(valid_datetime="descending")[0].datetime
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/climetlab/core/index.py", line 210, in order_by
    indices = sorted(indices, key=functools.cmp_to_key(cmp))
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/climetlab/core/index.py", line 207, in cmp
    return order.compare_elements(self[i], self[j])
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/climetlab/core/index.py", line 87, in compare_elements
    n = v(a_metadata(k), b_metadata(k))
  File "/data7/01_ai_models/miniconda/envs/ai-models/lib/python3.10/site-packages/climetlab/readers/grib/codes.py", line 505, in metadata
    return datetime.datetime(
ValueError: year 0 is out of range

The input files were downloaded from MARS, and I converted the cache files into GRIB files.
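
The final ValueError is what Python's datetime raises when asked to build a date in year 0. The sketch below assumes the date is decoded from an integer of the form YYYYMMDD, so a message whose date is 0 or missing would fail in this way. `decode_date` is a hypothetical stand-in written for this example, not the climetlab function in the traceback, and the date 20231001 is taken from the file name above.

```python
import datetime


def decode_date(date: int, time: int = 0) -> datetime.datetime:
    """Decode a YYYYMMDD integer and an HHMM integer into a datetime."""
    return datetime.datetime(
        date // 10000, date % 10000 // 100, date % 100, time // 100, time % 100
    )


# A well-formed date decodes normally.
print(decode_date(20231001, 600))

# A date of 0 asks for year 0, which datetime rejects.
try:
    decode_date(0)
except ValueError as err:
    print("ValueError:", err)
```

If this reading is right, listing the date and time keys of every message in the combined file would show which message carries the bad value.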

rfovell commented 8 months ago

I have run graphcast successfully, using ERA5 reanalysis fields obtained from CDS. To test out the --file option, I have tried giving the same [albeit concatenated and renamed] CDS files back to graphcast, but am running into a problem.

The successful run obtained these files from CDS:

cds-retriever-fc80dd0245970d72ee767b0af3106647ef559ca2d5e4bb2e0fd32d4e9b39f1c2.cache
cds-retriever-874becd411c61b19487676bd6137133b4abd69f531f552aca9d42af8c96bf817.cache

I moved them to my ai-models directory, combined them into a single file (via unix cat), and renamed it "combined_file.grib". I then tried to run graphcast as

ai-models --file combined_file.grib --date 20211230 --time 12 --path 'out-{step}.grib' --lead-time 24 --debug graphcast

... which resulted in this error

2024-01-09 13:08:57,623 INFO Creating input data (total): 0.4 second.
2024-01-09 13:08:57,623 INFO Total time: 2 seconds.
Traceback (most recent call last):
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/bin/ai-models", line 8, in <module>
    sys.exit(main())
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/ai_models/__main__.py", line 297, in main
    _main()
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/ai_models/__main__.py", line 270, in _main
    model.run()
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/ai_models_graphcast/model.py", line 201, in run
    training_xarray, time_deltas = create_training_xarray(
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/ai_models_graphcast/input.py", line 84, in create_training_xarray
    forcing_numpy = forcing_variables_numpy(
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/ai_models_graphcast/input.py", line 49, in forcing_variables_numpy
    ds = cml.load_source(
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/climetlab/sources/__init__.py", line 178, in load_source
    src = get_source(name, *args, **kwargs)
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/climetlab/sources/__init__.py", line 159, in __call__
    source = klass(*args, **kwargs)
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/climetlab/core/__init__.py", line 25, in __call__
    obj.__init__(*args, **kwargs)
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/climetlab/sources/constants.py", line 269, in __init__
    self.numbers = find_numbers(source_or_dataset)
  File "/network/rit/lab/fovelllab_rit/anaconda3/envs/ai/lib/python3.10/site-packages/climetlab/sources/constants.py", line 234, in find_numbers
    return source_or_dataset.unique_values(
KeyError: 'number'

Any ideas? Thanks!

3atshan commented 8 months ago

@I-Dhar

Curious if you figured it out?