CLIMADA-project / climada_petals

See https://github.com/CLIMADA-project/climada_python first
GNU General Public License v3.0

Flood of cast warnings after improved hdf5 I/O #84

Open emanuel-schmid opened 1 year ago

emanuel-schmid commented 1 year ago

climada_python PR #735 started a flood of RuntimeWarnings in tc_tracks_forecast.TCForecast.from_hdf5.

To reproduce, run:

python -m unittest climada_petals.hazard.test.test_tc_tracks_forecast.TestECMWF.test_hdf5_io

    xarray\coding\times.py:254: RuntimeWarning: invalid value encountered in cast
      flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(

The warnings are raised in climada.hazard.tc_tracks.TCTracks.from_hdf5:

    def from_hdf5(cls, file_name):
        _raise_if_legacy_or_unknown_hdf5_format(file_name)
        ds_combined = xr.open_dataset(file_name)

        for varname in ds_combined.data_vars:
            if ds_combined[varname].dtype == "object":
                ds_combined[varname] = ds_combined[varname].astype(str)
        data = []
        for i in range(ds_combined.sizes["storm"]):
### most warnings are raised here #################################################
            track = (
                ds_combined
                .isel(storm=i)
                .dropna(dim="step", how="any", subset=["time", "lat", "lon"])
            )
###################################################################################
            track = track.drop_vars(["storm", "step"]).rename(step="time")
            track = track.assign_coords(time=track["time"]).compute()
            attr_vars = [v for v in track.data_vars if track[v].ndim == 0]
            track = (
                track
                .assign_attrs({v: track[v].item() for v in attr_vars})
                .drop_vars(attr_vars)
            )
            track.attrs['orig_event_flag'] = bool(track.attrs['orig_event_flag'])
            data.append(track)
        return cls(data)

I don't really have a clue whether we can safely ignore this or whether it points to a serious problem. 🤷

tovogt commented 1 year ago

Yes, it's a known issue in xarray (see https://github.com/pydata/xarray/pull/7098, and https://github.com/pydata/xarray/pull/7827). The warnings can be ignored. I use the following to deal with it:

import warnings
warnings.filterwarnings(
    "ignore",
    message="invalid value encountered in cast",
    module="xarray",
    category=RuntimeWarning,
)

emanuel-schmid commented 1 year ago

Thanks a lot! 🙌 I'll do the same here.

emanuel-schmid commented 1 year ago

I've made an attempt and suppressed the warnings in the parent classes' hdf5 methods, see https://github.com/CLIMADA-project/climada_python/pull/742. Not sure whether that's the perfect place. Perhaps we should suppress them once and for all? Or perhaps not? And if so, how?
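
For illustration, method-level suppression could look roughly like the sketch below. It is not the code from the linked PR: `ignore_xarray_cast_warnings` and `emit_fake_cast_warning` are hypothetical names, and the second one only simulates the warning so that the snippet runs without xarray installed. The filter arguments are the ones quoted earlier in this thread. Note that `warnings.catch_warnings` changes global state and is not thread-safe.

```python
import warnings
from contextlib import contextmanager


@contextmanager
def ignore_xarray_cast_warnings():
    """Hypothetical helper: silence the known xarray cast warning inside a with-block."""
    with warnings.catch_warnings():
        # The filter is only active until the with-block exits;
        # catch_warnings restores the previous filter list afterwards.
        warnings.filterwarnings(
            "ignore",
            message="invalid value encountered in cast",
            module="xarray",
            category=RuntimeWarning,
        )
        yield


def emit_fake_cast_warning(module_name):
    """Stand-in for the real I/O call: emits the warning as if it came from `module_name`."""
    warnings.warn_explicit(
        "invalid value encountered in cast",
        RuntimeWarning,
        "times.py",
        254,
        module=module_name,
    )


# Intended use inside an I/O method (assumes xarray is imported as xr):
#
#     with ignore_xarray_cast_warnings():
#         ds_combined = xr.open_dataset(file_name)
```

The upside of this variant is that the warning stays visible everywhere outside the wrapped calls; the downside is that every method that reads NetCDF data has to be wrapped individually.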

tovogt commented 1 year ago

I get the same warning when loading any other NetCDF file, so it's not specific to the new implementation of TCTracks.from_hdf5. For example, when loading the IBTrACS NetCDF data:

>>> from climada.hazard import TCTracks
>>> TCTracks.from_ibtracs_netcdf(year_range=(2015, 2015))
$CONDA_PREFIX/lib/python3.9/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(
$CONDA_PREFIX/lib/python3.9/site-packages/xarray/coding/times.py:254: RuntimeWarning: invalid value encountered in cast
  flat_num_dates_ns_int = (flat_num_dates * _NS_PER_TIME_DELTA[delta]).astype(

So I would consider suppressing the warnings in one central place during CLIMADA setup, and maybe mentioning https://github.com/pydata/xarray/pull/7098 in the code so that future developers know where this comes from and can easily check when the filter can be removed.
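
A minimal sketch of what such a central place might look like, assuming it is called once during package setup (for example from an `__init__.py`). `KNOWN_UPSTREAM_WARNINGS` and `install_warning_filters` are hypothetical names, not existing CLIMADA API; each entry carries a link to the upstream discussion so the reason for the filter stays documented next to it.

```python
import warnings

# Hypothetical registry of upstream warnings that are known to be harmless.
# Each entry records where the warning comes from, so it can be dropped
# once the upstream fix is released.
KNOWN_UPSTREAM_WARNINGS = [
    {
        "message": "invalid value encountered in cast",
        "module": "xarray",
        "category": RuntimeWarning,
        "reference": "https://github.com/pydata/xarray/pull/7098",
    },
]


def install_warning_filters(known=KNOWN_UPSTREAM_WARNINGS):
    """Install one 'ignore' filter per registry entry (process-wide)."""
    for entry in known:
        warnings.filterwarnings(
            "ignore",
            message=entry["message"],
            module=entry["module"],
            category=entry["category"],
        )


# Assumed call site: once, when the package is imported.
# install_warning_filters()
```

Because the filter is restricted to `module="xarray"`, the same message raised from any other package would still be shown.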