calliope-project / euro-calliope

A workflow to build models of the European electricity system for Calliope.
https://euro-calliope.readthedocs.io
MIT License

OPSD load data sources #41

Open brynpickering opened 3 years ago

brynpickering commented 3 years ago

OPSD is the source for load data in Euro-Calliope. It combines data published by ENTSO-E and by some individual TSOs, and all of these sources are present in the 2019 dataset we are using.

The current method for selecting which load data source to use in Euro-Calliope is aimed at selecting the ENTSO-E transparency platform data in preference to other sources (inc. 'power statistics'). This method has two issues:

  1. The code orders the data by source, in reverse alphabetical order. It then removes duplicates, with the expectation that the transparency data comes before the power statistics data. From an inspection of the CSV, this actually places 'load forecast' data first, meaning that the transparency platform data isn't chosen in many instances.
  2. OPSD suggests that power statistics data is preferable. In their documentation they state:

The two sources differ. Values on PS (~500 TWh annually in Germany) are usually slightly higher than on the TP (~490 TWh). The reason probably lies with different reporting deadlines: Values on the TP have to be reported "no later than one hour after the end of the operating period". For the PS, the data is published with a delay of up to 3 months, which might allow for more accurate metering. For a comparison of the two sources see Hirth, et al. (2018).
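
To make the ordering problem in point 1 concrete, here is a minimal sketch of "sort by source in reverse alphabetical order, then keep the first entry per timestamp". It is not the actual Euro-Calliope code. The four `actual_*` labels are the ones listed in this issue; the `forecast_entsoe_transparency` label and the load values are assumptions made for illustration only.

```python
# Source labels: the four "actual_*" names come from this issue; the forecast
# label is a hypothetical stand-in for the 'load forecast' data in the CSV.
sources = [
    "actual_entsoe_power_statistics",
    "actual_entsoe_transparency",
    "actual_tso",
    "actual_net_consumption_tso",
    "forecast_entsoe_transparency",
]

# One row per source for the same country and hour; load values are made up.
rows = [("DEU", "2016-01-01T00:00", s, 50_000.0 + i) for i, s in enumerate(sources)]

# Sort by source label in reverse alphabetical order ...
rows_sorted = sorted(rows, key=lambda row: row[2], reverse=True)

# ... then keep only the first row per (country, timestamp).
kept = {}
for country, timestamp, source, load in rows_sorted:
    kept.setdefault((country, timestamp), (source, load))

order = [row[2] for row in rows_sorted]
chosen_source = kept[("DEU", "2016-01-01T00:00")][0]
print(order)
print(chosen_source)
```

Under these assumptions the transparency label does sort ahead of the power statistics label, as intended, but any label starting with a later letter (the forecast label here, and also `actual_tso`) sorts ahead of both.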

Based on these two points, I would a. choose the power statistics from the OPSD load data, and b. explicitly define an order of gap filling from the remaining data sources. The order I propose is:

  1. actual_entsoe_power_statistics
  2. actual_entsoe_transparency
  3. actual_tso
  4. actual_net_consumption_tso
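
The explicit gap-filling order above could be sketched as follows. This is only an illustration in plain Python, not the implementation in #42: the function name `fill_by_priority`, the dictionary layout, and the example values are all hypothetical, and missing hours are represented by `None`.

```python
from typing import Optional

# Priority order proposed in this issue, highest priority first.
SOURCE_PRIORITY = [
    "actual_entsoe_power_statistics",
    "actual_entsoe_transparency",
    "actual_tso",
    "actual_net_consumption_tso",
]


def fill_by_priority(
    series_by_source: dict[str, list[Optional[float]]],
    priority: list[str] = SOURCE_PRIORITY,
) -> list[Optional[float]]:
    """For each hour, take the value from the highest-priority source that has one."""
    # Keep only the sources that are present, in priority order.
    ordered = [series_by_source[name] for name in priority if name in series_by_source]
    filled = []
    for candidates in zip(*ordered):
        # First non-missing value wins; None if every source is missing.
        filled.append(next((v for v in candidates if v is not None), None))
    return filled


# Made-up hourly values for a single country.
example = {
    "actual_entsoe_transparency": [490.0, 491.0, None, None],
    "actual_entsoe_power_statistics": [500.0, None, None, None],
    "actual_tso": [None, None, 495.0, None],
}
result = fill_by_priority(example)
print(result)
```

Because the order is an explicit list rather than a side effect of sorting labels, adding a new source to the CSV cannot silently change which source is preferred.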

NOTE: as of the 2020 dataset, it seems (perhaps erroneously?) that only the transparency platform data is available from the OPSD timeseries dataset. The documentation for the dataset still suggests there should be multiple sources available.

brynpickering commented 3 years ago

Using this proposed method (see #42), there is a noticeable discrepancy in the hourly data. The table below shows per-country summary statistics of the ratio between hourly load values, using the old method vs the new method.

country min median mean max
ALB 1.000000 1.000000 1.000000 1.000000
AUT 0.647012 0.907906 0.909611 1.094351
BEL 0.813444 1.007763 1.009679 1.176983
BGR 0.809137 0.998001 1.001139 1.265237
BIH 1.000000 1.000000 1.000000 1.000000
CHE 0.669064 0.995464 0.993145 1.427444
CYP 0.696850 1.014709 1.052645 1.972705
CZE 0.890186 0.984160 0.984599 1.080718
DEU 0.778210 0.919918 0.918342 1.121139
DNK 0.469858 0.958940 0.952519 1.178764
ESP 0.927485 1.000276 1.000346 1.124405
EST 0.733867 0.924563 0.944713 1.686290
FIN 0.846991 0.978569 0.977871 1.081461
FRA 0.868517 0.999296 0.998157 1.133234
GBR 0.522319 0.848859 0.856374 1.211727
GRC 0.847037 0.996785 0.999201 1.187146
HRV 0.575200 0.966985 0.957382 1.178138
HUN 0.848482 0.961687 0.961283 1.044214
IRL 0.732848 1.019702 1.025732 1.702914
ISL 1.000000 1.000000 1.000000 1.000000
ITA 0.791862 0.909093 0.908798 1.038467
LTU 0.735993 1.000000 0.998567 1.169579
LUX 0.202616 0.668706 0.673681 0.994927
LVA 0.797281 1.009955 1.009927 1.275214
MKD 0.790731 0.967919 0.969870 1.195522
MNE 0.803747 1.021389 1.023179 1.424066
NLD 0.755741 1.047540 1.047052 1.568504
NOR 0.920563 1.000020 1.000691 1.999944
POL 0.983407 1.065552 1.069370 1.209978
PRT 0.631692 1.003732 1.003854 1.384633
ROU 0.973586 1.085443 1.086479 1.227511
SRB 0.816148 1.003987 1.019776 1.316581
SVK 0.869593 0.999657 1.000265 1.186504
SVN 0.756566 0.925122 0.925372 1.205018
SWE 0.812975 1.002102 1.000409 1.139647
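
For reference, statistics like those in the table can be computed along these lines. This is a sketch only: it assumes the ratio is taken as old divided by new for each hour, which the comment above does not state explicitly, and the function name `ratio_summary` and the input values are made up.

```python
from statistics import mean, median


def ratio_summary(old: list[float], new: list[float]) -> dict[str, float]:
    """Summarise the hourly ratio old/new (assumed direction) for one country."""
    # Assumes both series cover the same hours and `new` contains no zeros.
    ratios = [o / n for o, n in zip(old, new)]
    return {
        "min": min(ratios),
        "median": median(ratios),
        "mean": mean(ratios),
        "max": max(ratios),
    }


# Made-up hourly load values for a single country.
old_load = [45.0, 50.0, 55.0, 60.0]
new_load = [50.0, 50.0, 50.0, 50.0]
summary = ratio_summary(old_load, new_load)
print(summary)
```

A row of exactly 1.000000 in every column, as for ALB, BIH and ISL, is what this gives when both methods pick identical values for every hour.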

ingmars commented 3 years ago

Hey Bryn,

note that ENTSO-E has discontinued Power Statistics, so in the future there will only be the TP data. Not sure if that influences your choice for the historic dataset you are using, but I thought maybe it might matter to you.

Cheers Ingmar

brynpickering commented 3 years ago

Thanks @ingmars, I guess this explains why OPSD now relies only on TP data. A pity that older data sources haven't been kept in the new release.

@timtroendle we could use power statistics preferentially up to and including 2019, then rely on transparency platform data after that. For now, it doesn't matter too much since we work with the 2019 OPSD release, which has all the relevant sources.
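
A rough sketch of what that year-dependent choice might look like, assuming the source is selected per data year. The function name `source_priority` and the constant are hypothetical, and whether the TSO sources should remain as fallbacks after 2019 is left open here.

```python
# Last data year for which power statistics would be preferred (from the comment above).
LAST_POWER_STATISTICS_YEAR = 2019


def source_priority(year: int) -> list[str]:
    """Return the preferred load data sources for a given data year, best first."""
    if year <= LAST_POWER_STATISTICS_YEAR:
        # Up to and including 2019: power statistics first, transparency as fallback.
        return ["actual_entsoe_power_statistics", "actual_entsoe_transparency"]
    # After 2019: rely on transparency platform data.
    return ["actual_entsoe_transparency"]


print(source_priority(2019))
print(source_priority(2020))
```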

ingmars commented 3 years ago

Hey Bryn,

yes, that was by mistake / in a hurry, and we plan to release a new version of the timeseries package that again ships with the historic data.

Cheers Ingmar

timtroendle commented 3 years ago

Do we know why ENTSO-E discontinued power statistics data? Given that they have, shouldn't we rely on TP for future compatibility?

ingmars commented 3 years ago

I think because it caused additional workload for them and because there is no legal requirement to publish it - unlike the TP, which is mandated by EU regulation.

timtroendle commented 7 months ago

@brynpickering is this implemented in #66?