EnergieID / entsoe-py

Python client for the ENTSO-E API (European Network of Transmission System Operators for Electricity)
MIT License

time zone inconsistency with UK spot prices #26

Closed gmohandas closed 5 years ago

gmohandas commented 5 years ago

I suspect there is a time zone inconsistency in the results of the UK spot price query (it may also extend to other UK variables). When the time zone is set to 'Europe/Berlin' or 'CET', the datetime index of the returned pandas Series does not match what the ENTSO-E website shows when displayed in the equivalent time zone.

The following code snippet

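import pandas as pd
from entsoe import EntsoePandasClient

# assumed setup; replace the placeholder with your own API key
client = EntsoePandasClient(api_key='YOUR_API_KEY')
country_code = 'GB'
start = pd.Timestamp('20190614', tz='Europe/Berlin')
end = pd.Timestamp('20190615', tz='Europe/Berlin')
dap = client.query_day_ahead_prices(country_code, start=start, end=end)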

returns the following: [image: ukspotentsoepy]

This does not match what the website shows: [image: ukentsoeweb]

JrtPec commented 5 years ago

(First: Weirdly enough, when I try the request above, the last time is 2019-06-14 23:00:00+01:00.)

Now, what you're seeing isn't wrong, but I admit it is confusing.

You can make requests in any time zone you want, but the returned data will always be in the time zone of the country you've requested. In this case that is UK summer time, which is UTC+1.

On the website, you're viewing in "CET (UTC+1) / CEST (UTC+2)". Since Berlin is on summer time (CEST) in June, all data is shown in UTC+2.

To get your series to look the same, simply call dap.tz_convert('Europe/Berlin'). Note that tz_convert returns a new object, so assign the result.
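For illustration, here is a minimal sketch of that conversion using only pandas. The series below is a made-up stand-in for the query result (the values and the `dap_berlin` name are assumptions, not library output); the point is that `tz_convert` changes how the index is displayed, not which instants it refers to.

```python
import pandas as pd

# Hypothetical stand-in for the returned Series: 24 hourly values indexed in
# UK local time (Europe/London is UTC+1 in June). The values are made up.
index = pd.date_range(
    start=pd.Timestamp("2019-06-13 23:00", tz="Europe/London"),
    periods=24,
    freq=pd.Timedelta(hours=1),
)
dap = pd.Series(range(24), index=index, dtype="float64")

# tz_convert returns a new Series; the original is left unchanged.
dap_berlin = dap.tz_convert("Europe/Berlin")

# Same instants, different wall-clock labels.
print(dap.index[0], "->", dap_berlin.index[0])
print(dap.index[-1], "->", dap_berlin.index[-1])
```

The values are untouched; only the labels on the index move from +01:00 to +02:00.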

gmohandas commented 5 years ago

It appears that if the latest spot price is queried, the last timestamp is for the hour 22:00:00 instead of 23:00:00. However, if you query a past day's spot price, the last timestamp is for the hour 23:00:00. This might explain why you got 2019-06-14 23:00:00+01:00 and I did not.

It is also strange that the datetime index of the returned pandas Series has a UTC offset of +01:00 when it should arguably be +02:00, since the start and end timestamps are time zone aware to begin with.
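As a quick check of the offsets involved, the sketch below re-expresses the request window from the snippet in UK local time, using only pandas. The names `start_uk` and `end_uk` are introduced here for illustration. It only shows how the two time zones relate in June 2019; it does not say anything about what the API itself returns.

```python
import pandas as pd

# The request window from the snippet: one calendar day in Berlin time.
start = pd.Timestamp("20190614", tz="Europe/Berlin")
end = pd.Timestamp("20190615", tz="Europe/Berlin")

# The same two instants expressed in UK local time.
start_uk = start.tz_convert("Europe/London")
end_uk = end.tz_convert("Europe/London")

print(start.utcoffset(), start_uk.utcoffset())
print(start_uk, end_uk)
```

A Berlin calendar day therefore runs from 23:00 to 23:00 in UK local time, which is worth keeping in mind when comparing the last timestamp of two results.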

JrtPec commented 5 years ago

Sure, that also makes sense. I thought about it when I designed the code. You've got three options:

  1. All returned data is in the timezone of the country you're searching for.
  2. All returned data is in UTC.
  3. All returned data is in the timezone of the input timestamps.

Option 3 raises some other questions, like:

It's a thing of predictability basically: what would you expect in the most common use case? What causes the fewest side effects?
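To make the three options concrete, here is a rough sketch in plain pandas. The helper `present_in` and its parameters are hypothetical and not part of entsoe-py; the prices are made up. All three options describe the same instants and differ only in how the index is labelled.

```python
import pandas as pd


def present_in(series, option, country_tz, input_tz):
    """Re-express a tz-aware Series index under one of the three options.

    Hypothetical helper for illustration; not part of entsoe-py.
    """
    targets = {
        1: country_tz,  # option 1: time zone of the queried country
        2: "UTC",       # option 2: always UTC
        3: input_tz,    # option 3: time zone of the input timestamps
    }
    if option not in targets:
        raise ValueError(f"unknown option: {option!r}")
    return series.tz_convert(targets[option])


# Made-up hourly values; only the index matters here.
index = pd.date_range(
    start=pd.Timestamp("2019-06-14 00:00", tz="UTC"),
    periods=3,
    freq=pd.Timedelta(hours=1),
)
prices = pd.Series([1.0, 2.0, 3.0], index=index)

for option in (1, 2, 3):
    out = present_in(prices, option, "Europe/London", "Europe/Berlin")
    print(option, out.index[0])
```

Whichever option is the default, the caller can always get from one to another with a single `tz_convert` call, so the choice is mostly about which default surprises the fewest users.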

gmohandas commented 5 years ago

I'd have thought that option 1 is the best approach. Unfortunately, exchanges like Nordpool or Epex do not adhere to this convention, as they generally report all prices in CET/CEST. For example, nordpoolspot lists spot prices for Eastern European areas in CET/CEST, and epexspot lists UK and Ireland spot prices in CET/CEST.

I agree that this is a tricky one to decide. Each of the above options will have its supporters and critics. However, if the majority of the library's users are accustomed to seeing prices as presented on nordpoolspot or epexspot, then it may serve them better to return all query results in CET/CEST by default.