Open tommylees112 opened 4 years ago
Just to avoid confusion and looking at the preprocessed data:
To select by forecast horizon
# map forecast horizons to months ahead
map_ = {
pd.Timedelta('28 days 00:00:00'): 1,
pd.Timedelta('29 days 00:00:00'): 1,
pd.Timedelta('30 days 00:00:00'): 1,
pd.Timedelta('31 days 00:00:00'): 1,
pd.Timedelta('59 days 00:00:00'): 2,
pd.Timedelta('60 days 00:00:00'): 2,
pd.Timedelta('61 days 00:00:00'): 2,
pd.Timedelta('62 days 00:00:00'): 2,
pd.Timedelta('89 days 00:00:00'): 3,
pd.Timedelta('90 days 00:00:00'): 3,
pd.Timedelta('91 days 00:00:00'): 3,
pd.Timedelta('92 days 00:00:00'): 3,
}
fhs = [pd.Timedelta(fh) for fh in stacked.forecast_horizon.values]
months = [map_[fh] for fh in fhs]
stacked = stacked.assign_coords(months_ahead=('time', months))
Which we can then use to subset the data as follows:
stacked.loc[dict(time=stacked.months_ahead == 1)].rename({var: var + '_1'})
<xarray.Dataset>
Dimensions: (lat: 45, lon: 35, time: 312)
Coordinates:
* lon (lon) float32 33.75 34.0 34.25 ... 41.75 42.0 42.25
* lat (lat) float32 6.0 5.75 5.5 5.25 ... -4.5 -4.75 -5.0
* time (time) datetime64[ns] 1993-03-03 ... 2019-01-31
valid_time (time) datetime64[ns] dask.array<shape=(312,), chunksize=(312,)>
initialisation_date (time) datetime64[ns] 1993-01-31 ... 2018-12-31
forecast_horizon (time) timedelta64[ns] 31 days 28 days ... 31 days
months_ahead (time) int64 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
Data variables:
tprate_1 (lat, lon, time) float64 dask.array<shape=(45, 35, 312), chunksize=(45, 35, 312)>
the s5 data has lots of nuances
so in order to be a complete xarray object there are 4 different
forecast_horizons
for each month so that the only array with non-nan values is when thevalid_time
is the first of the next month.the
forecast_horizon
is set to a size of 9 (28days, 30days, 31days + 30 + 60 ...) which is the way to code n month forecasts (1month = 28/30/31 depending on the length of the month).So these forecasts are all initialised on
2014-01-01
and January is 31 days long. So the only valid data comes from the 3rd indexedforecast_horizon
(here coded asstep
) which corresponds to 31 days. This takes thevalid_time
to2014-02-01
which is the month ahead. Whereas February/March are 59 days long total. So the 4th indexedforecast_horizon[3]
corresponds to 59 days which is the valid step here (valid_time
==2014-03-01
)We need a way to get the valid timesteps so that we have a complete xarray object with data at each timestep. I can think of a few ways but we could document them here.