EnergieID / entsoe-py

Python client for the ENTSO-E API (European Network of Transmission System Operators for Electricity)
MIT License

inconsistent format of results of EntsoePandasClient.query_generation() #328

Status: Open. mkaut opened this issue 4 months ago.

mkaut commented 4 months ago

I am testing the EntsoePandasClient, and the results of query_generation() are formatted inconsistently in several ways. In all cases I ask for data from 2019 to 2023, i.e., I call client.query_generation(country_code, start=pd.Timestamp('20190101', tz='Europe/Brussels'), end=pd.Timestamp('20240101', tz='Europe/Brussels')).

For Germany (country_code='DE_LU'), the result has MultiIndex columns:

                                    Biomass Fossil Brown coal/Lignite Fossil Coal-derived gas        Fossil Gas                     ...              Solar             Waste     Wind Offshore      Wind Onshore
                          Actual Aggregated         Actual Aggregated       Actual Aggregated Actual Aggregated Actual Consumption  ... Actual Consumption Actual Aggregated Actual Aggregated Actual Aggregated Actual Consumption
2019-01-01 00:00:00+01:00            4812.0                    6932.0                   273.0            3410.0                1.0  ...                NaN             783.0            3177.0           19366.0                NaN
2019-01-01 00:15:00+01:00            4828.0                    6351.0                   481.0            3295.0                1.0  ...                NaN             772.0            3174.0           20132.0                NaN
2019-01-01 00:30:00+01:00            4834.0                    6221.0                   481.0            3228.0                1.0  ...                NaN             779.0            3167.0           20863.0                NaN

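For reference, this is the shape one would hope every zone returns. A minimal sketch with a hand-built frame (a few values copied from the first rows above; it is not real client output and has only three of the columns) shows how MultiIndex columns let one pick a single series, or all "Actual Aggregated" series at once:

```python
import pandas as pd

# Hand-built stand-in for the DE_LU result above (first three rows, three columns only).
index = pd.date_range("2019-01-01 00:00", periods=3, freq="15min", tz="Europe/Brussels")
columns = pd.MultiIndex.from_tuples([
    ("Biomass", "Actual Aggregated"),
    ("Fossil Gas", "Actual Aggregated"),
    ("Fossil Gas", "Actual Consumption"),
])
df = pd.DataFrame(
    [[4812.0, 3410.0, 1.0],
     [4828.0, 3295.0, 1.0],
     [4834.0, 3228.0, 1.0]],
    index=index,
    columns=columns,
)

# One production type and one measurement: a plain Series.
biomass = df[("Biomass", "Actual Aggregated")]

# One measurement for all production types: the second column level is dropped.
aggregated = df.xs("Actual Aggregated", axis=1, level=1)
print(aggregated)
```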
The same query for Denmark's DK-1 zone (country_code='DK_1') returns single-indexed columns:

                           Biomass  (Biomass, Actual Aggregated)  (Biomass, Actual Consumption)  ...  (Wind Offshore, Actual Consumption)  (Wind Onshore, Actual Aggregated)  (Wind Onshore, Actual Consumption)
2019-01-01 00:00:00+01:00      NaN                          79.0                            NaN  ...                                  NaN                             2330.0                                 NaN
2019-01-01 01:00:00+01:00      NaN                          62.0                            NaN  ...                                  NaN                             2427.0                                 NaN
2019-01-01 03:00:00+01:00      NaN                          62.0                            NaN  ...                                  NaN                             2290.0                                 NaN
2019-01-01 04:00:00+01:00      NaN                          58.0                            NaN  ...                                  NaN                             2229.0                                 NaN

Note that most of the column names are tuples, yet .columns is a plain Index, not a MultiIndex as in the German result.
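The plain Index matches how pandas builds column labels in general: pd.Index only creates a MultiIndex from a list of labels when every label is a tuple, so a single bare string such as 'Biomass' among the tuples is enough to produce an object Index of tuples. Whether this is what actually happens inside query_generation() is only a guess; the sketch below demonstrates the pandas behaviour, not the library's code:

```python
import pandas as pd

all_tuples = [("Biomass", "Actual Aggregated"), ("Biomass", "Actual Consumption")]
mixed = ["Biomass"] + all_tuples  # one bare string, as in the DK_1 output

print(type(pd.Index(all_tuples)).__name__)  # MultiIndex
print(type(pd.Index(mixed)).__name__)       # Index (labels kept as tuples)
```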

Worse still, the column assignment changes when I pass the psr_type argument. To illustrate this, consider all offshore-wind columns from the previous dataframe:

                           Wind Offshore  (Wind Offshore, Actual Aggregated)  (Wind Offshore, Actual Consumption)
2019-01-01 00:00:00+01:00            NaN                               638.0                                  NaN
2019-01-01 01:00:00+01:00            NaN                               686.0                                  NaN
2019-01-01 02:00:00+01:00            NaN                               296.0                                  NaN
2019-01-01 03:00:00+01:00            NaN                               289.0                                  NaN
2019-01-01 04:00:00+01:00            NaN                               283.0                                  NaN
...                                  ...                                 ...                                  ...
2023-12-31 19:00:00+01:00         1129.0                                 NaN                                  NaN
2023-12-31 20:00:00+01:00         1093.0                                 NaN                                  NaN
2023-12-31 21:00:00+01:00         1165.0                                 NaN                                  NaN
2023-12-31 22:00:00+01:00         1191.0                                 NaN                                  NaN
2023-12-31 23:00:00+01:00         1163.0                                 NaN                                  NaN

Here we can see that the values switch columns at some point during the period. EDIT: It turns out the data switch columns several times: they are in (Wind Offshore, Actual Aggregated) in 2019 and 2021, and in Wind Offshore in 2020, 2022, and 2023. Note also the inconsistent naming: the first column is named with a string, while the other two are named with tuples.
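Until this is resolved in the library, one possible user-side workaround is to merge the split columns back into one series. A minimal sketch, assuming (as the output above suggests, but the issue does not establish) that the bare Wind Offshore column and (Wind Offshore, Actual Aggregated) hold the same quantity and are never both populated at the same timestamp; the frame is hand-built from four of the rows shown above:

```python
import numpy as np
import pandas as pd

index = pd.to_datetime(
    ["2019-01-01 00:00", "2019-01-01 01:00", "2023-12-31 22:00", "2023-12-31 23:00"]
).tz_localize("Europe/Brussels")

# Hand-built stand-in for the DK_1 offshore columns above. Mixing a bare label
# with tuples gives a plain object Index, as in the client output.
df = pd.DataFrame(
    [
        [np.nan, 638.0, np.nan],
        [np.nan, 686.0, np.nan],
        [1191.0, np.nan, np.nan],
        [1163.0, np.nan, np.nan],
    ],
    index=index,
    columns=["Wind Offshore", ("Wind Offshore", "Actual Aggregated"), ("Wind Offshore", "Actual Consumption")],
)

tuple_col = df[("Wind Offshore", "Actual Aggregated")]
bare_col = df["Wind Offshore"]

# Assumption check: refuse to merge if both columns hold a value at the same time.
if (tuple_col.notna() & bare_col.notna()).any():
    raise ValueError("both offshore-wind columns hold a value at the same timestamp")

# Take the tuple column where present, otherwise fall back to the bare one.
wind_offshore = tuple_col.combine_first(bare_col).rename("Wind Offshore")
print(wind_offshore)
```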

On the other hand, asking only for offshore wind with psr_type='B18' returns:

                           (Wind Offshore, Actual Aggregated)  (Wind Offshore, Actual Consumption)  Wind Offshore
2019-01-01 00:00:00+01:00                                 NaN                                  NaN          638.0
2019-01-01 01:00:00+01:00                                 NaN                                  NaN          686.0
2019-01-01 02:00:00+01:00                                 NaN                                  NaN          296.0
2019-01-01 03:00:00+01:00                                 NaN                                  NaN          289.0
2019-01-01 04:00:00+01:00                                 NaN                                  NaN          283.0
...                                                       ...                                  ...            ...
2023-12-31 19:00:00+01:00                                 NaN                                  NaN         1129.0
2023-12-31 20:00:00+01:00                                 NaN                                  NaN         1093.0
2023-12-31 21:00:00+01:00                                 NaN                                  NaN         1165.0
2023-12-31 22:00:00+01:00                                 NaN                                  NaN         1191.0
2023-12-31 23:00:00+01:00                                 NaN                                  NaN         1163.0

i.e., the values appear to be in the Wind Offshore column in all years. EDIT: It turns out that in 2021 the values are in the (Wind Offshore, Actual Aggregated) column.

In other words, the values returned with the psr_type argument are not placed consistently with those returned without it, whereas I would expect the former to be a subset of the latter.
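One way to test the subset expectation despite the shifting labels is to normalize both results to MultiIndex columns first and then compare. A minimal sketch on hand-built frames (values from the rows shown above); the helper normalize_generation_columns is hypothetical, not part of entsoe-py, and it assumes that a bare label such as 'Wind Offshore' means ('Wind Offshore', 'Actual Aggregated'), which the data suggest but the issue does not confirm:

```python
import numpy as np
import pandas as pd


def normalize_generation_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: map bare labels to (label, 'Actual Aggregated') and merge duplicates."""
    tuples = [c if isinstance(c, tuple) else (c, "Actual Aggregated") for c in df.columns]
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(tuples)
    # Duplicate labels are merged by keeping the first non-NaN value per timestamp
    # (grouping the transposed frame avoids the deprecated axis=1 groupby).
    return out.T.groupby(level=[0, 1]).first().T


index = pd.to_datetime(
    ["2019-01-01 00:00", "2019-01-01 01:00", "2023-12-31 23:00"]
).tz_localize("Europe/Brussels")

# Without psr_type: 2019 values in the tuple column, 2023 values in the bare one.
full_raw = pd.DataFrame(
    [[np.nan, 638.0, np.nan], [np.nan, 686.0, np.nan], [1163.0, np.nan, np.nan]],
    index=index,
    columns=["Wind Offshore", ("Wind Offshore", "Actual Aggregated"), ("Wind Offshore", "Actual Consumption")],
)
# With psr_type='B18': all values in the bare column, which comes last.
b18_raw = pd.DataFrame(
    [[np.nan, np.nan, 638.0], [np.nan, np.nan, 686.0], [np.nan, np.nan, 1163.0]],
    index=index,
    columns=[("Wind Offshore", "Actual Aggregated"), ("Wind Offshore", "Actual Consumption"), "Wind Offshore"],
)

full = normalize_generation_columns(full_raw)
only_b18 = normalize_generation_columns(b18_raw)

# After normalization, the psr_type result should match the same cells of the full result.
common = full.loc[only_b18.index, only_b18.columns]
print("consistent:", common.equals(only_b18))
```

Normalizing first means the comparison no longer depends on which column the client happened to put a value in; only the assumed mapping of bare labels matters.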

fboerman commented 4 months ago

Please provide some examples with code snippets and a clear explanation.

(Sent by email in reply to Michal Kaut's original report of 2 May 2024, 09:59.)
