EnergieID / entsoe-py

Python client for the ENTSO-E API (european network of transmission system operators for electricity)
MIT License

Length mismatch: Expected axis has 23 elements, new values have 24 elements [related to DST switch?] #331

Closed fgenoese closed 5 months ago

fgenoese commented 5 months ago

The following code raises the error in the title (length mismatch). It only occurs if start and end have different UTC offsets (e.g. due to daylight saving time). I am using the latest entsoe-py. Can somebody confirm?

import pandas as pd
import entsoe

client = entsoe.EntsoePandasClient(api_key='XXXXXXXXXXXXXXXXXXXX')
start=pd.Timestamp('20240331', tz='Europe/Berlin')
end=pd.Timestamp('20240401', tz='Europe/Berlin')
neighbour = 'FR'
origin = 'IT'
df = client.query_net_transfer_capacity_dayahead(neighbour, origin, start=start, end=end)
fgenoese commented 5 months ago

Appears to be linked to the data (see raw output below). I was not expecting to see two periods for a single day. But the actual problem seems to be that the first period has 2 points and the second has 22 points, making it 24 in total. But the library correctly expects that Mar 31 should have had 23 hours due to the shift to DST. Will open a ticket on the TP.

<timeseries>
   <mrid>1</mrid>
   <businesstype>A27</businesstype>
   <in_domain.mrid codingscheme="A01">10YIT-GRTN-----B</in_domain.mrid>
   <out_domain.mrid codingscheme="A01">10YFR-RTE------C</out_domain.mrid>
   <quantity_measure_unit.name>MAW</quantity_measure_unit.name>
   <curvetype>A01</curvetype>
   <period>
      <timeinterval>
         <start>2024-03-31T00:00Z</start>
         <end>2024-03-31T02:00Z</end>
      </timeinterval>
      <resolution>PT60M</resolution>
      <point>
         <position>1</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>2</position>
         <quantity>2400</quantity>
      </point>
   </period>
   <period>
      <timeinterval>
         <start>2024-03-31T03:00Z</start>
         <end>2024-04-01T00:00Z</end>
      </timeinterval>
      <resolution>PT60M</resolution>
      <point>
         <position>1</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>2</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>3</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>4</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>5</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>6</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>7</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>8</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>9</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>10</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>11</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>12</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>13</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>14</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>15</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>16</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>17</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>18</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>19</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>20</position>
         <quantity>2400</quantity>
      </point>
      <point>
         <position>21</position>
         <quantity>2400</quantity>
      </point>
   </period>
</timeseries>
milannnnn commented 5 months ago

I am running into similar issues with the latest version of the library (0.6.7):

from os import getenv

import pandas as pd
from entsoe import EntsoePandasClient

start = pd.Timestamp("2023-03-01T00:00:00+02:00")
end = pd.Timestamp("2023-04-02T00:00:00+02:00")
client = EntsoePandasClient(getenv("ENTSOE_API_KEY"))
df_transferred_capacity = client.query_net_transfer_capacity_dayahead(
    start=start,
    end=end,
    country_code_from="NL",
    country_code_to="DK_1",
)

The error I'm getting:

ValueError: Length mismatch: Expected axis has 23 elements, new values have 24 elements
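
For reference, the message itself can be reproduced without calling the API. The following is a minimal sketch, assuming only pandas: the 23 values and the 24 hourly timestamps are placeholders that mirror the numbers in the error, not data taken from the library or the TP.

```python
import pandas as pd

# 23 values, standing in for the points that were actually parsed
values = pd.Series([2400.0] * 23)

# 24 hourly UTC timestamps, standing in for the index derived from the time interval
index = pd.date_range(start="2024-03-31 00:00", periods=24, freq="60min", tz="UTC")

message = None
try:
    # Assigning an index whose length differs from the data raises ValueError
    values.index = index
except ValueError as exc:
    message = str(exc)

print(message)
```

The "expected axis" is the length of the existing data (23) and the "new values" are the index being assigned (24), so the message says the index is one element longer than the parsed points.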
fboerman commented 5 months ago

@fgenoese did you receive anything back from your ticket?

fgenoese commented 5 months ago

Seems to be an error on their side; they'll try to fix it in the next TP release, which is expected in mid-June. I'll keep this issue open for now, in case we have to adapt our library here as well.

fboerman commented 5 months ago

Great, thanks!

milannnnn commented 5 months ago

Looking at the response, it does appear that there is one data point missing from their end (the first period in the time series has values for 00:00 and 01:00, but the second period starts from 03:00, so the 02:00 value is missing):

import requests

session = requests.Session()
url = 'https://web-api.tp.entsoe.eu/api'
params = {
   "contract_MarketAgreement.Type": "A01",
   "documentType": "A61",
   "in_Domain": "10YIT-GRTN-----B",
   "out_Domain": "10YFR-RTE------C",
   "periodEnd": "202403312200",
   "periodStart": "202403302300",
   "securityToken": ...
}
response = session.get(url=url, params=params)

print(response.text)
# ---------------------------------------------------
<TimeSeries>
    ...
        <Period>
            <timeInterval>
                <start>2024-03-31T00:00Z</start>
                <end>2024-03-31T02:00Z</end>
            </timeInterval>
            <resolution>PT60M</resolution>
                <Point>
                    <position>1</position>
                    <quantity>2400</quantity>
                </Point>
                <Point>
                    <position>2</position>
                    <quantity>2400</quantity>
                </Point>
        </Period>
        <Period>
            <timeInterval>
                <start>2024-03-31T03:00Z</start>
                <end>2024-04-01T00:00Z</end>
            </timeInterval>
            <resolution>PT60M</resolution>
                <Point>
                    <position>1</position>
                    <quantity>2400</quantity>
                </Point>
                ...
        </Period>
</TimeSeries>
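
The gap between the two periods can also be checked programmatically. This is a rough sketch using only the standard library; `find_gaps` and `parse_utc` are hypothetical helper names, and the sample is the response above reduced to its time intervals, with no XML namespace (a real response may declare one, which this sketch does not handle).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The response above, reduced to the two time intervals (points omitted)
SAMPLE = """
<TimeSeries>
  <Period>
    <timeInterval>
      <start>2024-03-31T00:00Z</start>
      <end>2024-03-31T02:00Z</end>
    </timeInterval>
  </Period>
  <Period>
    <timeInterval>
      <start>2024-03-31T03:00Z</start>
      <end>2024-04-01T00:00Z</end>
    </timeInterval>
  </Period>
</TimeSeries>
"""


def parse_utc(text):
    # Timestamps in the response look like 2024-03-31T00:00Z
    return datetime.strptime(text, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def find_gaps(timeseries_xml):
    """Return (end_of_previous, start_of_next) pairs where periods do not touch."""
    root = ET.fromstring(timeseries_xml)
    intervals = sorted(
        (
            parse_utc(period.findtext("timeInterval/start")),
            parse_utc(period.findtext("timeInterval/end")),
        )
        for period in root.iter("Period")
    )
    gaps = []
    for (_, prev_end), (next_start, _) in zip(intervals, intervals[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


gaps = find_gaps(SAMPLE)
for gap_start, gap_end in gaps:
    print("no data between", gap_start.isoformat(), "and", gap_end.isoformat())
```

For this sample it reports a single gap from 02:00Z to 03:00Z, which is the missing hour described above.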

Also, looking at the way the time-series objects are structured (made up of potentially multiple periods), this issue could be handled by parsing the individual periods into pandas series, and then concatenating those (instead of directly parsing the time series objects):

# rename the original _parse_crossborder_flows_timeseries() to _parse_crossborder_flows_period()
def _parse_crossborder_flows_period(soup):
    """
    Parameters
    ----------
    soup : bs4.element.tag

    Returns
    -------
    pd.Series
    """
    positions = []
    flows = []
    for point in soup.find_all('point'):
        positions.append(int(point.find('position').text))
        flows.append(float(point.find('quantity').text))

    # order the values by their position within the period
    series = pd.Series(index=positions, data=flows)
    series = series.sort_index()
    # soup is now a single period, so the index is meant to cover only that period's time interval
    series.index = _parse_datetimeindex(soup)
    return series

# create a new _parse_crossborder_flows_timeseries method (that aggregates individual periods)
def _parse_crossborder_flows_timeseries(soup):
    series = [
        _parse_crossborder_flows_period(soup_period)
        for soup_period in soup.find_all('period')
    ]
    return pd.concat(series)

This approach should work even with missing data (the example stated in this issue), and I believe it should not affect the rest of the data.
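
To illustrate the per-period idea without relying on the library's internal helpers, here is a plain-Python sketch. It is not the entsoe-py implementation: `parse_period` and `parse_timeseries` are hypothetical names, the sample is a shortened and illustrative version of the response (the second period is cut to three points), only the PT60M resolution is handled, and XML namespaces are assumed to be absent.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Shortened, illustrative sample: the second period is truncated to three points
SAMPLE = """
<TimeSeries>
  <Period>
    <timeInterval><start>2024-03-31T00:00Z</start><end>2024-03-31T02:00Z</end></timeInterval>
    <resolution>PT60M</resolution>
    <Point><position>1</position><quantity>2400</quantity></Point>
    <Point><position>2</position><quantity>2400</quantity></Point>
  </Period>
  <Period>
    <timeInterval><start>2024-03-31T03:00Z</start><end>2024-03-31T06:00Z</end></timeInterval>
    <resolution>PT60M</resolution>
    <Point><position>1</position><quantity>2400</quantity></Point>
    <Point><position>2</position><quantity>2400</quantity></Point>
    <Point><position>3</position><quantity>2400</quantity></Point>
  </Period>
</TimeSeries>
"""


def parse_period(period):
    """Return sorted (timestamp, value) pairs for one Period element."""
    start = datetime.strptime(
        period.findtext("timeInterval/start"), "%Y-%m-%dT%H:%MZ"
    ).replace(tzinfo=timezone.utc)
    if period.findtext("resolution") != "PT60M":
        raise ValueError("this sketch only handles hourly resolution")
    step = timedelta(minutes=60)
    rows = []
    for point in period.iter("Point"):
        position = int(point.findtext("position"))
        # Positions are 1-based offsets from the period's own start
        rows.append((start + (position - 1) * step, float(point.findtext("quantity"))))
    return sorted(rows)


def parse_timeseries(xml_text):
    """Parse each period separately, then concatenate the results."""
    root = ET.fromstring(xml_text)
    rows = []
    for period in root.iter("Period"):
        rows.extend(parse_period(period))
    return rows


rows = parse_timeseries(SAMPLE)
for timestamp, value in rows:
    print(timestamp.isoformat(), value)
```

Because every timestamp is derived from its own period's start, the missing 02:00Z hour simply does not appear in the result, and no length check against a single global index is needed.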

fgenoese commented 5 months ago

> Looking at the response, it does appear that there is one data point missing from their end (the first period in the time series has values for 00:00 and 01:00, but the second period starts from 03:00, so the 02:00 value is missing):

Isn't that the expected behaviour for 31st of March 2024? When switching to DST, we skip 1 hour.

milannnnn commented 5 months ago

> Isn't that the expected behaviour for 31st of March 2024? When switching to DST, we skip 1 hour.

It would be in local time, but ENTSO-E reports these values in UTC, which has no DST, so no value should be missing.

print(pd.Timestamp("2024-03-31T00:00Z").tz_convert("Europe/Berlin"))
# 2024-03-31 01:00:00+01:00
print(pd.Timestamp("2024-04-01T00:00Z").tz_convert("Europe/Berlin"))
# 2024-04-01 02:00:00+02:00
print(pd.Timestamp("2024-04-01T00:00Z") - pd.Timestamp("2024-03-31T00:00Z"))
# 1 days 00:00:00 (24 hours)
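
The complementary check, that the local calendar day of 31 March 2024 is only 23 hours long, can be sketched with the standard library alone. This assumes Python 3.9+ with time zone data available for `zoneinfo`; the variable names are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

berlin = ZoneInfo("Europe/Berlin")
local_start = datetime(2024, 3, 31, tzinfo=berlin)
local_end = datetime(2024, 4, 1, tzinfo=berlin)

# Convert to UTC before subtracting: when both operands share the same tzinfo,
# Python subtracts the wall-clock values and ignores the offset change
utc_start = local_start.astimezone(timezone.utc)
utc_end = local_end.astimezone(timezone.utc)

hours_in_local_day = (utc_end - utc_start).total_seconds() / 3600
print(utc_start.isoformat(), utc_end.isoformat(), hours_in_local_day)
```

So a query for the local day covers 23 hourly values, while a full UTC day always covers 24.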
fgenoese commented 5 months ago

The TP applied a fix, and the error no longer occurs on the entsoe-py side. Hence, I will close the issue for now.

This is the raw output after their fix:

    <period.timeInterval>
        <start>2024-03-30T23:00Z</start>
        <end>2024-03-31T22:00Z</end>
    </period.timeInterval>
    <TimeSeries>
        <mRID>1</mRID>
        <businessType>A27</businessType>
        <in_Domain.mRID codingScheme="A01">10YIT-GRTN-----B</in_Domain.mRID>
        <out_Domain.mRID codingScheme="A01">10YFR-RTE------C</out_Domain.mRID>
        <quantity_Measure_Unit.name>MAW</quantity_Measure_Unit.name>
        <curveType>A01</curveType>
        <Period>
            <timeInterval>
                <start>2024-03-30T23:00Z</start>
                <end>2024-03-31T22:00Z</end>
            </timeInterval>
            <resolution>PT60M</resolution>
            <Point>
                <position>1</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>2</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>3</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>4</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>5</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>6</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>7</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>8</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>9</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>10</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>11</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>12</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>13</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>14</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>15</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>16</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>17</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>18</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>19</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>20</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>21</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>22</position>
                <quantity>2400</quantity>
            </Point>
            <Point>
                <position>23</position>
                <quantity>2400</quantity>
            </Point>
        </Period>
    </TimeSeries>
</Publication_MarketDocument>
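
As a quick sanity check on the output above, the number of points can be compared with the number of hourly slots in the time interval. A small sketch, with `expected_points` being a hypothetical helper and the interval strings copied from the output:

```python
from datetime import datetime, timezone


def expected_points(start, end, resolution_minutes=60):
    """Number of slots of the given resolution between two UTC timestamps."""
    fmt = "%Y-%m-%dT%H:%MZ"
    begin = datetime.strptime(start, fmt).replace(tzinfo=timezone.utc)
    finish = datetime.strptime(end, fmt).replace(tzinfo=timezone.utc)
    minutes = (finish - begin).total_seconds() / 60
    return int(minutes // resolution_minutes)


# Interval of the single period in the output after the fix
after_fix = expected_points("2024-03-30T23:00Z", "2024-03-31T22:00Z")
print(after_fix)
```

The interval spans 23 hourly slots, which matches the 23 points listed in the period.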
fboerman commented 5 months ago

Good to hear! Thanks for picking this up!
