ecmwf / earthkit-transforms

Apache License 2.0
1 stars 1 forks source link

Cannot run daily_mean() on forecast data #30

Open sandorkertesz opened 1 week ago

sandorkertesz commented 1 week ago

What happened?

First, I generate Xarray from GRIB forecast data with earthkit-data/develop as follows:

import earthkit.data as ekd
ds_fl = ekd.from_source("sample", "pl.grib") 
ds_xr = ds_fl.to_xarray()
print(ds_xr)
<xarray.Dataset>
Dimensions:                  (forecast_reference_time: 4, step: 2, levelist: 2,
                              latitude: 19, longitude: 36)
Coordinates:
  * forecast_reference_time  (forecast_reference_time) datetime64[ns] 2024-06...
  * step                     (step) timedelta64[ns] 00:00:00 06:00:00
  * levelist                 (levelist) int64 500 700
  * latitude                 (latitude) float64 90.0 80.0 70.0 ... -80.0 -90.0
  * longitude                (longitude) float64 0.0 10.0 20.0 ... 340.0 350.0
Data variables:
    r                        (forecast_reference_time, step, levelist, latitude, longitude) float64 ...
    t                        (forecast_reference_time, step, levelist, latitude, longitude) float64 ...
Attributes:
...

Next, when I use daily_mean() on it it fails:

from earthkit.transforms import aggregate
r = aggregate.temporal.daily_mean(ds_xr)
File ~/git/earthkit-transforms/earthkit/transforms/tools.py:69, in time_dim_decorator.<locals>.wrapper(dataarray, time_dim, time_shift, remove_partial_periods, *args, **kwargs)
     65     time_coord = time_coord.assign_attrs({"time_shift": f"{time_shift}"})
     67     dataarray = dataarray.assign_coords({time_dim: time_coord})
---> 69 result = func(dataarray, *args, time_dim=time_dim, **kwargs)
     71 # If we want only full days then chop off the first and last day if we have time_shift
     72 if remove_partial_periods and time_shift and isinstance(result, (xr.Dataset, xr.DataArray)):

File ~/git/earthkit-transforms/earthkit/transforms/aggregate/temporal.py:412, in daily_reduce(dataarray, how, time_dim, **kwargs)
    410     group_key = "days"
    411 else:
--> 412     raise TypeError(f"Invalid type for time dimension ({time_dim}): {dataarray[time_dim].dtype}")
    414 grouped_data = groupby_time(dataarray, time_dim=time_dim, frequency=group_key)
    415 # If how is string and inbuilt method of grouped_data, we apply

TypeError: Invalid type for time dimension (t): float64

What are the steps to reproduce the bug?

See above

Version

main

Platform (OS and architecture)

Any

Relevant log output

No response

Accompanying data

No response

Organisation

ECMWF

EddyCMWF commented 5 days ago

I changed to enhancement as it is currently possible to do it, you just need to be explicit in defining the time dimension when it is not simple to detect it. In this case, the grib file is not helping with non standard time dimensions and "t" for variable name.

from earthkit.transforms import aggregate
r = aggregate.temporal.daily_mean(ds_xr, time_dim="forecast_reference_time")
r
sandorkertesz commented 5 days ago

Thank you. Is "forecast_reference_time" really non standard?