PyPSA / atlite

Atlite: A Lightweight Python Package for Calculating Renewable Power Potentials and Time Series
https://atlite.readthedocs.io

Time misalignment between ERA5 and SARAH? #355

Open matzech opened 1 month ago

matzech commented 1 month ago

Issue Description

Hi,

I think there may be a time misalignment in the current implementation when working with instantaneous (satellite) data. As correctly written and considered (e.g. here: https://github.com/PyPSA/atlite/blob/master/atlite/datasets/era5.py#L173-L175), ERA5 time stamps refer to values accumulated over the preceding hour, so 11:00 refers to 10:00-11:00.

Now, in the SARAH implementation you take the mean of the arrays at 11:00 and 11:30 and assign the time index of the first array (11:00): https://github.com/PyPSA/atlite/blob/master/atlite/datasets/sarah.py#L153-L156. This leads to the 1-hour time misalignment. See e.g. the spatially averaged values (GHI) for a June day: [image]

So if I did not overlook anything and this bug is real, the only change required would be: ds = ds.assign_coords(time=ds.indexes["time"] + pd.Timedelta(60, "m")) after merging the data with the solar position (https://github.com/PyPSA/atlite/blob/master/atlite/datasets/sarah.py#L237).
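The labelling described above can be illustrated with a minimal sketch. It uses a plain pandas Series as a stand-in for the xarray dataset `ds`; the dates, the toy values (minute of the day) and the variable names are assumptions for illustration, not taken from atlite.

```python
import pandas as pd

# Hypothetical half-hourly "scans": the value is simply the minute of the day.
times = pd.date_range("2013-06-01 00:00", periods=48, freq="30min")
minute_of_day = (times.hour * 60 + times.minute).to_numpy()
scans = pd.Series(minute_of_day, index=times, dtype=float)

# Averaging as described in the issue: pair each full-hour scan with the
# following half-hour scan and keep the time stamp of the first one.
full = scans.iloc[::2]
half = scans.iloc[1::2]
half = pd.Series(half.to_numpy(), index=half.index - pd.Timedelta(minutes=30))
hourly = (full + half) / 2

# Proposed correction: move every label forward by 60 minutes.
shifted = pd.Series(hourly.to_numpy(), index=hourly.index + pd.Timedelta(minutes=60))

print(hourly.loc[pd.Timestamp("2013-06-01 11:00")])   # mean of the 11:00 and 11:30 scans
print(shifted.loc[pd.Timestamp("2013-06-01 12:00")])  # same value, now labelled 12:00
```

In this toy setup the mean of the 11:00 and 11:30 scans moves from the label 11:00 to 12:00, the label that would cover 11:00-12:00 under the ERA5 convention described above.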

I could fix this in the sarah3 compatibility pull request (https://github.com/PyPSA/atlite/pull/352) if required.

FabianHofmann commented 1 month ago

Mmmh, I have the concern that you are right. The point is that the hourly mean function in the sarah module is "wrong". As far as I see, it should be + instead of - in https://github.com/PyPSA/atlite/blob/1b3a3c0908538a178997a4991f9c4c062f8612fe/atlite/datasets/sarah.py#L155, right?
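To see what the sign changes, here is a small sketch on synthetic data. The helper `hourly_mean_sketch` is hypothetical and works on a pandas Series rather than the xarray dataset used in the sarah module; the `dropna()` call is there to mimic xarray's default behaviour of keeping only time stamps present in both operands.

```python
import pandas as pd

def hourly_mean_sketch(scans, sign):
    """Average full-hour and half-hour scans (hypothetical helper).

    sign=-1 relabels the half-hour scans 30 minutes earlier (the behaviour
    described in the issue), sign=+1 relabels them 30 minutes later.
    """
    full = scans.iloc[::2]
    half = scans.iloc[1::2]
    offset = sign * pd.Timedelta(minutes=30)
    half = pd.Series(half.to_numpy(), index=half.index + offset)
    # Keep only the time stamps present in both series.
    return ((full + half) / 2).dropna()

# Toy data: the value of each scan is its minute of the day.
times = pd.date_range("2013-06-01 00:00", periods=48, freq="30min")
minute_of_day = (times.hour * 60 + times.minute).to_numpy()
scans = pd.Series(minute_of_day, index=times, dtype=float)

with_minus = hourly_mean_sketch(scans, -1)
with_plus = hourly_mean_sketch(scans, +1)

key = pd.Timestamp("2013-06-01 11:00")
print(with_minus.loc[key])  # mean of the 11:00 and 11:30 scans
print(with_plus.loc[key])   # mean of the 10:30 and 11:00 scans
```

With `-`, the label 11:00 looks forward in time; with `+`, it looks backward, which is closer to the "completed hour" convention.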

matzech commented 1 month ago

Yes, indeed. Although I think that changing this line to '+' would mess up the calculation of the solar position, right? https://github.com/PyPSA/atlite/blob/1b3a3c0908538a178997a4991f9c4c062f8612fe/atlite/datasets/sarah.py#L233

So either

FabianHofmann commented 1 month ago

I am not so sure about that. So the convention should be that an indexing hour (assuming hourly resolution) represents the completed hour. So, a value at 11:00 am represents the mean from 10:00 am to 11:00 am. This is how it is handled by era5 and how it was intended by the sarah module (however there is this bug).

Could you explain to what extent the solar position is misaligned? Perhaps the cleanest way is to also take the average between 10:30 and 11:00 for the solar position in this example.

Parisra commented 1 month ago

Hi both, I just wanted to comment on this since @martavp and I looked into this issue for an analysis I'm currently doing that involves modeling east-facing and west-facing solar panels. To start, here are two links from PVGIS that I found helpful for explaining the issue and the problems it could cause: PVGIS documentation note 9.3 and the PVGIS 5.2 release notes. After reading them, I tried shifting the original cutout (time shift of '-1 days +23:30:00', i.e. minus 30 minutes) to create a new cutout. I tested this new cutout, and the azimuth and altitude fit better with the ERA5 cutout.

[image]

So for me, solar position was misaligned by 30 minutes and what SARAH showed at 8:30 is what ERA5 showed at 8:00.
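For readers puzzled by the notation: '-1 days +23:30:00' is how pandas prints a negative 30-minute Timedelta. A minimal sketch, with hypothetical time stamps standing in for the SARAH cutout's time index:

```python
import pandas as pd

# A shift of minus 30 minutes; pandas normalises negative durations
# to "-1 days" plus a positive remainder when printing.
shift = pd.Timedelta(minutes=-30)
print(str(shift))

# Hypothetical SARAH time stamps (not read from a real cutout).
sarah_times = pd.date_range("2013-06-01 08:30", periods=3, freq="60min")
aligned = sarah_times + shift
print(aligned[0])  # the 08:30 stamp becomes 08:00
```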

matzech commented 1 month ago

Thanks for the comments.

Oh, yes, @FabianHofmann. You are right about the solar position.

Speaking of the scan weighting, I think the weighting can also be improved. Taking only two values assumes that these two scans approximate the hour reasonably well. That is, the average of 10:30 and 11:00 is taken as a good estimate for the hour ending at 11:00, but you could argue it only describes the half-hour from 10:30 to 11:00.
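One way to make this argument concrete is the trapezoidal rule. This is an interpretation added here, not something stated in the thread: it assumes the irradiance $G$ varies roughly linearly between consecutive scans, and the notation is illustrative only. The two-scan mean then approximates the average over the last half-hour, while an average over the full hour needs a third scan:

```latex
\begin{align}
\bar{G}_{10:30\text{--}11:00} &\approx \tfrac{1}{2}\,G(10{:}30) + \tfrac{1}{2}\,G(11{:}00) \\
\bar{G}_{10:00\text{--}11:00} &\approx \tfrac{1}{4}\,G(10{:}00) + \tfrac{1}{2}\,G(10{:}30) + \tfrac{1}{4}\,G(11{:}00)
\end{align}
```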

I think a more accurate way would be to reconsider the weighting of the instantaneous values and adapt them, as described for instance in the dissertation by Annette Hammer (sorry, in German: https://oops.uni-oldenburg.de/317/1/347.pdf, p. 82). Note that the dissertation also accounts for satellite scan times, which could be ignored for simplicity.

This (and the time alignment error) could then be solved by

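 import pandas as pd

 # ds holds the half-hourly instantaneous values, starting at a full hour
 ds1 = ds.isel(time=slice(None, None, 2))  # full-hour scans, e.g. 10:00
 ds2 = ds.isel(time=slice(1, None, 2))     # half-hour scans, e.g. 09:30
 ds3 = ds.isel(time=slice(2, None, 2))     # full-hour scans, e.g. 09:00

 # relabel so that the three scans of one hour share the closing time stamp
 ds2 = ds2.assign_coords(time=ds2.indexes["time"] + pd.Timedelta(30, "m"))
 ds3 = ds3.assign_coords(time=ds3.indexes["time"] + pd.Timedelta(60, "m"))
 # xarray aligns the operands on their common time labels before adding
 ds = 0.25 * ds1 + 0.5 * ds2 + 0.25 * ds3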

So this means that for hour 10, we consider 09:00 (1/4), 09:30 (1/2) and 10:00 (1/4).
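The weighting can be checked on synthetic data. The sketch below uses pandas Series instead of an xarray dataset; the quadratic test signal and the helper `relabel` are assumptions for illustration, and `dropna()` mimics xarray keeping only the time stamps common to all three operands.

```python
import pandas as pd

def relabel(series, minutes_later):
    """Return a copy whose time stamps are moved forward (hypothetical helper)."""
    offset = pd.Timedelta(minutes=minutes_later)
    return pd.Series(series.to_numpy(), index=series.index + offset)

# Toy half-hourly signal: (hour of day)^2, so it is not linear in time.
times = pd.date_range("2013-06-01 00:00", periods=48, freq="30min")
hours = (times.hour * 60 + times.minute).to_numpy() / 60.0
scans = pd.Series(hours ** 2, index=times)

ds1 = scans.iloc[::2]                # full-hour scans, unchanged labels
ds2 = relabel(scans.iloc[1::2], 30)  # half-hour scans, labelled 30 min later
ds3 = relabel(scans.iloc[2::2], 60)  # full-hour scans, labelled 60 min later
weighted = (0.25 * ds1 + 0.5 * ds2 + 0.25 * ds3).dropna()

# Label 10:00 combines the scans at 09:00 (1/4), 09:30 (1/2) and 10:00 (1/4).
print(weighted.loc[pd.Timestamp("2013-06-01 10:00")])
```

For this test signal the weighted value at 10:00 is 0.25 * 81 + 0.5 * 90.25 + 0.25 * 100 = 90.375, close to the exact mean of t^2 over 09:00-10:00 (about 90.33). Note that the first labels are lost because not all three scans are available for them.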

I can run an evaluation against different in-situ measurements if desired. I am not aware of a publication on this; it is more like "unpublished knowledge".

For the solar position, this approach could be done analogously.

euronion commented 1 month ago

Thanks all for looking into this!

Some notes from the sidelines: