Closed zoltanmaric closed 1 year ago
The culprit is this attempt to infer the frequency of the time index: https://github.com/PyPSA/atlite/blob/a0bd4b06c51269b87c90252bdf6951556f645dac/atlite/datasets/era5.py#L157-L174
An ERA5 CDS request spanning June 30 and July 1st looks like this:
{
'product': 'reanalysis-era5-single-levels',
'year': '2022',
'month': [6, 7],
'day': [30, 1],
'time': ['00:00', '01:00', '02:00', '03:00', '04:00', '05:00', '06:00', '07:00', '08:00', '09:00', '10:00', '11:00', '12:00', '13:00', '14:00', '15:00', '16:00', '17:00', '18:00', '19:00', '20:00', '21:00', '22:00', '23:00']
}
which returns results covering June 1st, June 30th, July 1st, and July 30th:
From [`get_data_influx`](https://github.com/PyPSA/atlite/pull/199/files#diff-0558b068b3e760408c41c929b3de7a0d12939032de82f7de3a6aa7120733cea7R171)
```python
print(ds.time)
```
Output:
```
Because the time index is not continuous, pd.infer_freq
returns None
, resulting in a time shift of minus 12 hours.
I think the fix should be as simple as just always setting the time shift to 30 minutes (proposed diff).
The get_data
function of era5.py
already has the reanalysis-era5-single-levels
hard-coded as the product:
https://github.com/PyPSA/atlite/blob/a0bd4b06c51269b87c90252bdf6951556f645dac/atlite/datasets/era5.py#L352-L359
To my understanding, this product's resolution is always hourly (see docs), so there's no reason to attempt to infer a different frequency.
What do you think @euronion @FabianHofmann ?
Here's a comparison of values received from ERA5 at 1h, 2h, 3h, 4h, and 6h sampling:
```python import xarray import atlite.datasets.era5 as era5 from dask.utils import SerializableLock import functools # Create lists of hours like [00:00, 01:00, 02:00, ...], sampled # every hour, every 2 hours, etc. time_sampling = {} for rate in [1, 2, 3, 4, 6]: time_sampling[rate] = [f"{hour:02}:00" for hour in range(0, 24, rate)] retrieval_params = { 'product': 'reanalysis-era5-single-levels', 'area': [57.0, -0.5, 56.0, 0.5], 'chunks': {'time': 100}, 'grid': [0.25, 0.25], 'tmpdir': '/tmp', 'lock': SerializableLock(), 'year': '2013', 'month': [1], 'day': [1] } param_sets = {hour: {**retrieval_params, **{'time': time}} for hour, time in time_sampling.items()} def retrieve_data_for_single_raster(params: dict) -> "xarray.DataSet": variable = [ "surface_net_solar_radiation", "surface_solar_radiation_downwards", "toa_incident_solar_radiation", "total_sky_direct_solar_radiation_at_surface", ] ds = era5.retrieve_data(variable=variable, **params) return ds.sel(latitude=56, longitude=0).load().to_dataframe()[["ssr", "ssrd", "tisr", "fdir"]] # Retrieve ERA5 data for each different time sampling raw_ds = {hour: retrieve_data_for_single_raster(params) for hour, params in param_sets.items()} def join_dfs(left_sampling_and_df, right_sampling_and_df): left_sampling, left_df = left_sampling_and_df right_sampling, right_df = right_sampling_and_df suffix = f"_{right_sampling}h" return left_sampling, left_df.join(right_df, on='time', how='left', rsuffix=suffix) # Merge all data into a single dataframe to show differences in values # for each hour next to each other _, merged = functools.reduce(join_dfs, raw_ds.items()) samplings_compared = merged.sort_index(axis='columns')\ .query('time.dt.hour > 7 and time.dt.hour < 18')\ .astype(float)\ .round(decimals=2) # Remove date part samplings_compared.index = samplings_compared.index.time samplings_compared.to_csv('/tmp/samplings_compared.csv') ```
While this does show that the time shift is not proportional to the sampling, there's an additional weird twist to it. The values for 2h, 3h, 4h, and 6h sampling seem to be equal - but the values for 1h sampling are slightly different (by less than 1% at noon).
It would be interesting to find out why that is, but as far as this issue is concerned - I think it's still more appropriate to always shift the time by 30 minutes, than by half of the sampling interval.
@zoltanmaric I am impressed how fast you are catching up with the atlite code! Definitely makes sense!
Interesting, but I would not bother with the value differences for different time resolutions, since it is related to ERA5 internals only.
Description
When building an ERA5 cutout for a period that doesn't cover entire months (e.g. for
time=slice("2022-06-30 00:00", "2022-07-01 23:00")
), the solar position is shifted by half a day instead of half an hour - effectively inverting the solar position (maximum at local midnight, minimum at local noon).Expected Behavior
Solar position should be shifted by 30 minutes.
Actual Behavior
Solar position is shifted by 12 hours.
Example
Your Environment
atlite
at revision 230aa8a5b1b21bff8f03d23631f01e6ebf5d83b3Related to #158 and #199