contrailcirrus / pycontrails

Python library for modeling contrails and other aviation climate impacts
https://py.contrails.org/
Apache License 2.0

ERA5 downloading and preprocessing with new CDSAPI-BETA #232

Closed Reyrem closed 3 months ago

Reyrem commented 3 months ago


CDSAPI has changed platform; the old one will be decommissioned at the beginning of September. The old platform is now subject to very long waiting times.

The time coordinate in the files returned by the new api is now called "valid_time". The level dimension is now called "pressure_level". This causes disruption when using pycontrails to download, preprocess and cache weather data.

It seems that the latitude and longitude dimensions might be affected as well.

Steps to Reproduce

  1. Follow the ECMWF procedure to migrate to the CDSAPI beta
  2. Run the pycontrails ERA5 downloading methods as usual

Additional Notes

A quick and dirty fix could be to go to /envs/pycontrail/lib/python3.10/site-packages/pycontrails/datalib/ecmwf/era5.py and modify the _download_file function by adding the swap_dims and rename_vars lines shown here:

self.cds.retrieve(self.dataset, request, cds_temp_filename)

# open file, edit, and save for each hourly time step
ds = stack.enter_context(
    xr.open_dataset(cds_temp_filename, engine=metsource.NETCDF_ENGINE)
)
ds = ds.swap_dims({'valid_time': 'time', 'pressure_level': 'level'})
ds = ds.rename_vars({'valid_time': 'time', 'pressure_level': 'level'})

# run preprocessing before cache
ds = self._preprocess_era5_dataset(ds)

For some reason I also needed to change a line in /envs/pycontrail/lib/python3.10/site-packages/pycontrails/datalib/ecmwf/common.py, in the function cache_dataset.

Line 98 goes from

cache_path = self.create_cachepath(pd.Timestamp(t).to_pydatetime())

to

cache_path = self.create_cachepath(pd.Timestamp(ds_t.time.values[0]).to_pydatetime())
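The renaming described above can also be sketched outside pycontrails. This is a minimal illustration on a made-up dataset, not the library's own fix: the helper name normalize_cds_names and the CDS_RENAMES mapping are hypothetical, and it uses xarray's Dataset.rename (which renames a dimension and its coordinate together) instead of the swap_dims plus rename_vars pair. It only renames names that are actually present, so a file without "pressure_level" would pass through without an error.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical mapping: new CDS names -> names reported in this thread
# as the ones pycontrails expects.
CDS_RENAMES = {"valid_time": "time", "pressure_level": "level"}


def normalize_cds_names(ds: xr.Dataset) -> xr.Dataset:
    """Rename new-style CDS dimensions/coordinates, skipping absent ones."""
    present = {
        old: new
        for old, new in CDS_RENAMES.items()
        if old in ds.variables or old in ds.dims
    }
    return ds.rename(present)


# Toy dataset imitating the new naming (all values are made up).
ds = xr.Dataset(
    {
        "t": (
            ("valid_time", "pressure_level", "latitude", "longitude"),
            np.zeros((2, 2, 3, 4)),
        )
    },
    coords={
        "valid_time": pd.to_datetime(["2022-03-01 00:00", "2022-03-01 01:00"]),
        "pressure_level": [300.0, 350.0],
        "latitude": [10.0, 0.0, -10.0],
        "longitude": [0.0, 1.0, 2.0, 3.0],
    },
)

fixed = normalize_cds_names(ds)
print(fixed["t"].dims)
```

Whether latitude and longitude also need entries in the mapping is not confirmed in this thread, so they are left untouched here.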

Brian75321 commented 3 months ago

Hello,

Thank you for these suggestions. After I implement them, I no longer get the "KeyError: 'Time'" message. However, I now get the following error message:

KeyError: 'level'

Please help.

Thank you, Brian

Reyrem commented 3 months ago

If you are trying to download files from the surface weather dataset, you may run into a problem because there might not be a "level" dimension.

Try adding:

if "level" not in ds.dims and len(self.pressure_levels) == 1:
    ds = ds.expand_dims(level=self.pressure_levels)

or something similar just after opening the xarray dataset ds in era5.py.
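To make the effect of that check concrete, here is a small sketch on a toy dataset. The function name ensure_level_dim and the level value -1 are placeholders chosen for illustration, not taken from pycontrails. Note that the condition only fires when exactly one pressure level is requested, so a request with two levels such as [350, 300] is returned unchanged.

```python
import numpy as np
import xarray as xr


def ensure_level_dim(ds: xr.Dataset, pressure_levels: list) -> xr.Dataset:
    """Add a length-1 "level" dimension if the dataset has none.

    Mirrors the check suggested above; with more than one requested
    level the dataset is returned unchanged.
    """
    if "level" not in ds.dims and len(pressure_levels) == 1:
        ds = ds.expand_dims(level=pressure_levels)
    return ds


# Toy single-level dataset without a "level" dimension (values are made up).
surface = xr.Dataset(
    {"tsr": (("time", "latitude", "longitude"), np.zeros((1, 2, 2)))},
    coords={
        "time": np.array(["2022-03-01T00:00"], dtype="datetime64[ns]"),
        "latitude": [0.0, 1.0],
        "longitude": [0.0, 1.0],
    },
)

with_level = ensure_level_dim(surface, [-1])          # placeholder level value
unchanged = ensure_level_dim(surface, [350, 300])     # two levels: no-op

print("level" in with_level.dims, "level" in unchanged.dims)
```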

Hope it helps. Otherwise, I saw that people have opened another ticket related to this topic: #233

Brian75321 commented 3 months ago

Thank you for the quick feedback. I have tried adding the lines you suggested, and I still get the "KeyError: level" message.

I am essentially a newbie trying to get pycontrails to work - trying to run through some of the notebook examples. I have some experience using VB.NET (so some coding experience) but no Python experience.

As a test, I am trying to get this code below to run. It is a "shortened" version of the suggested code on the pycontrails notebooks page: https://py.contrails.org/notebooks.html

Thanks again for any help you can provide, Brian

FROM PYCONTRAILS NOTEBOOKS PAGE:

from pycontrails.datalib.ecmwf import ERA5

time = ("2022-03-01 00:00:00", "2022-03-01 01:00:00")
pressure_levels = [350, 300]
met_variables = ["t", "q"]
rad_variables = ["tsr", "ttr"]

ERA5(time=time, variables=met_variables, pressure_levels=pressure_levels).open_metdataset()
ERA5(time=time, variables=rad_variables).open_metdataset()

LINES ADDED TO ERA5.PY

        ds = ds.swap_dims ({'valid_time' : 'time', 'pressure_level' : 'level'})
        ds = ds.rename_vars({'valid_time' : 'time', 'pressure_level' : 'level'})

        if "level" not in ds.dims and len(self.pressure_levels) == 1:
            ds = ds.expand_dims(level=self.pressure_levels)

ERROR MESSAGES:

Traceback (most recent call last):
  File "C:\Users\brian\anaconda3\Lib\site-packages\xarray\core\dataset.py", line 1393, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'level'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "c:\Users\brian\_Test\Test-Load Full Met", line 8, in <module>
    ERA5(time=time, variables=met_variables, pressure_levels=pressure_levels).open_metdataset()
  File "C:\Users\brian\anaconda3\Lib\site-packages\pycontrails\datalib\ecmwf\era5.py", line 377, in open_metdataset
    self.download(**xr_kwargs)
  File "C:\Users\brian\anaconda3\Lib\site-packages\pycontrails\datalib\_met_utils\metsource.py", line 597, in download
    if times_to_download := self.list_timesteps_not_cached(**xr_kwargs):
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\brian\anaconda3\Lib\site-packages\pycontrails\datalib\_met_utils\metsource.py", line 623, in list_timesteps_not_cached
    return [t for t in self.timesteps if not self.is_datafile_cached(t, **xr_kwargs)]
                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\brian\anaconda3\Lib\site-packages\pycontrails\datalib\_met_utils\metsource.py", line 669, in is_datafile_cached
    return self._check_is_ds_complete(ds, cache_path)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\brian\anaconda3\Lib\site-packages\pycontrails\datalib\_met_utils\metsource.py", line 703, in _check_is_ds_complete
    cond = np.isin(pl, ds["level"].values)
                       ~~^^^^^^^^^
  File "C:\Users\brian\anaconda3\Lib\site-packages\xarray\core\dataset.py", line 1484, in __getitem__
    return self._construct_dataarray(key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\brian\anaconda3\Lib\site-packages\xarray\core\dataset.py", line 1395, in _construct_dataarray
    _, name, variable = _get_virtual_variable(self._variables, name, self.dims)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\brian\anaconda3\Lib\site-packages\xarray\core\dataset.py", line 196, in _get_virtual_variable
    raise KeyError(key)
KeyError: 'level'
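Reading the traceback above, the KeyError is raised in _check_is_ds_complete, reached through list_timesteps_not_cached and is_datafile_cached, that is, while checking files already in the cache and before any new download. One possible explanation (an assumption, not confirmed in this thread) is that a file cached earlier still carries the new CDS names. A minimal sketch for spotting such a dataset follows; the helper missing_coords and the REQUIRED tuple are hypothetical, and only "level" is confirmed by the traceback as a name that gets looked up.

```python
import numpy as np
import xarray as xr

# Assumed set of names to check; only "level" is confirmed by the traceback.
REQUIRED = ("time", "level", "latitude", "longitude")


def missing_coords(ds: xr.Dataset, required=REQUIRED) -> list:
    """Return the required names that are absent from the dataset."""
    return [name for name in required if name not in ds.variables]


# Toy dataset imitating a file saved with the new CDS names (made-up values).
stale = xr.Dataset(
    {
        "t": (
            ("valid_time", "pressure_level", "latitude", "longitude"),
            np.zeros((1, 2, 2, 2)),
        )
    },
    coords={
        "valid_time": np.array(["2022-03-01T00:00"], dtype="datetime64[ns]"),
        "pressure_level": [300.0, 350.0],
        "latitude": [0.0, 1.0],
        "longitude": [0.0, 1.0],
    },
)

print(missing_coords(stale))  # ['time', 'level']
```

The same helper could be applied to each cached file after opening it with xr.open_dataset; files that report missing names could then be removed from the cache so that they are downloaded again.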