Closed fgenoese closed 5 months ago
This appears to be linked to the data (see raw output below). I was not expecting to see two periods for a single day, but the actual problem seems to be that the first period has 2 points and the second has 22 points, making 24 in total, whereas the library correctly expects Mar 31 to have 23 hours because of the shift to DST. I will open a ticket on the TP.
<timeseries>
<mrid>1</mrid>
<businesstype>A27</businesstype>
<in_domain.mrid codingscheme="A01">10YIT-GRTN-----B</in_domain.mrid>
<out_domain.mrid codingscheme="A01">10YFR-RTE------C</out_domain.mrid>
<quantity_measure_unit.name>MAW</quantity_measure_unit.name>
<curvetype>A01</curvetype>
<period>
<timeinterval>
<start>2024-03-31T00:00Z</start>
<end>2024-03-31T02:00Z</end>
</timeinterval>
<resolution>PT60M</resolution>
<point>
<position>1</position>
<quantity>2400</quantity>
</point>
<point>
<position>2</position>
<quantity>2400</quantity>
</point>
</period>
<period>
<timeinterval>
<start>2024-03-31T03:00Z</start>
<end>2024-04-01T00:00Z</end>
</timeinterval>
<resolution>PT60M</resolution>
<point>
<position>1</position>
<quantity>2400</quantity>
</point>
<point>
<position>2</position>
<quantity>2400</quantity>
</point>
<point>
<position>3</position>
<quantity>2400</quantity>
</point>
<point>
<position>4</position>
<quantity>2400</quantity>
</point>
<point>
<position>5</position>
<quantity>2400</quantity>
</point>
<point>
<position>6</position>
<quantity>2400</quantity>
</point>
<point>
<position>7</position>
<quantity>2400</quantity>
</point>
<point>
<position>8</position>
<quantity>2400</quantity>
</point>
<point>
<position>9</position>
<quantity>2400</quantity>
</point>
<point>
<position>10</position>
<quantity>2400</quantity>
</point>
<point>
<position>11</position>
<quantity>2400</quantity>
</point>
<point>
<position>12</position>
<quantity>2400</quantity>
</point>
<point>
<position>13</position>
<quantity>2400</quantity>
</point>
<point>
<position>14</position>
<quantity>2400</quantity>
</point>
<point>
<position>15</position>
<quantity>2400</quantity>
</point>
<point>
<position>16</position>
<quantity>2400</quantity>
</point>
<point>
<position>17</position>
<quantity>2400</quantity>
</point>
<point>
<position>18</position>
<quantity>2400</quantity>
</point>
<point>
<position>19</position>
<quantity>2400</quantity>
</point>
<point>
<position>20</position>
<quantity>2400</quantity>
</point>
<point>
<position>21</position>
<quantity>2400</quantity>
</point>
</period>
</timeseries>
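For reference, the 23-hour expectation for Mar 31 mentioned above can be checked with a few lines of standard-library Python. This is only a sketch: the fixed offsets `CET` and `CEST` are assumptions standing in for the Central European local time on either side of the switch, and are not taken from the library.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for Central European time around the DST switch
CET = timezone(timedelta(hours=1))   # UTC+01:00, before the switch
CEST = timezone(timedelta(hours=2))  # UTC+02:00, after the switch

# Local calendar day 2024-03-31: starts in CET, ends in CEST
local_day_start = datetime(2024, 3, 31, 0, 0, tzinfo=CET)
local_day_end = datetime(2024, 4, 1, 0, 0, tzinfo=CEST)
local_hours = (local_day_end - local_day_start) // timedelta(hours=1)

# The interval covered by the two periods in the raw output, in UTC
utc_start = datetime(2024, 3, 31, 0, 0, tzinfo=timezone.utc)
utc_end = datetime(2024, 4, 1, 0, 0, tzinfo=timezone.utc)
utc_hours = (utc_end - utc_start) // timedelta(hours=1)

print(local_hours)  # 23
print(utc_hours)    # 24
```

The local day is one hour short, while a midnight-to-midnight UTC interval always spans 24 hours.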
I am running into similar issues with the latest version of the library (0.6.7):
from os import getenv

import pandas as pd
from entsoe import EntsoePandasClient

start = pd.Timestamp("2023-03-01T00:00:00+02:00")
end = pd.Timestamp("2023-04-02T00:00:00+02:00")
client = EntsoePandasClient(getenv("ENTSOE_API_KEY"))
df_transferred_capacity = client.query_net_transfer_capacity_dayahead(
    start=start,
    end=end,
    country_code_from="NL",
    country_code_to="DK_1",
)
The error I'm getting:
ValueError: Length mismatch: Expected axis has 23 elements, new values have 24 elements
@fgenoese did you receive anything back from your ticket?
It seems to be an error on their side; they'll try to fix it in the next TP release, which is expected in mid-June. I'll keep this issue open for now, in case we have to adapt our library here as well.
great thanks!
Looking at the response, it does appear that there is one data point missing from their end (the first period in the time series has values for 00:00 and 01:00, but the second period starts from 03:00, so the 02:00 value is missing):
import requests

session = requests.Session()  # assumed: a plain requests session
url = 'https://web-api.tp.entsoe.eu/api'
params = {
    "contract_MarketAgreement.Type": "A01",
    "documentType": "A61",
    "in_Domain": "10YIT-GRTN-----B",
    "out_Domain": "10YFR-RTE------C",
    "periodEnd": "202403312200",
    "periodStart": "202403302300",
    "securityToken": ...
}
response = session.get(url=url, params=params)
print(response.text)
# ---------------------------------------------------
<TimeSeries>
...
<Period>
<timeInterval>
<start>2024-03-31T00:00Z</start>
<end>2024-03-31T02:00Z</end>
</timeInterval>
<resolution>PT60M</resolution>
<Point>
<position>1</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>2</position>
<quantity>2400</quantity>
</Point>
</Period>
<Period>
<timeInterval>
<start>2024-03-31T03:00Z</start>
<end>2024-04-01T00:00Z</end>
</timeInterval>
<resolution>PT60M</resolution>
<Point>
<position>1</position>
<quantity>2400</quantity>
</Point>
...
</Period>
</TimeSeries>
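The gap between the two periods can also be found mechanically from the `timeInterval` values alone. The following is a minimal standard-library sketch; `parse_utc` and `find_gaps` are hypothetical helper names and not part of entsoe-py.

```python
from datetime import datetime


def parse_utc(text):
    # Timestamps in the response look like "2024-03-31T00:00Z"
    return datetime.strptime(text, "%Y-%m-%dT%H:%MZ")


def find_gaps(intervals):
    """Return (gap_start, gap_end) pairs between consecutive intervals."""
    ordered = sorted((parse_utc(s), parse_utc(e)) for s, e in intervals)
    gaps = []
    for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > prev_end:
            gaps.append((prev_end, next_start))
    return gaps


# The two period intervals from the response above
periods = [
    ("2024-03-31T00:00Z", "2024-03-31T02:00Z"),
    ("2024-03-31T03:00Z", "2024-04-01T00:00Z"),
]
gaps = find_gaps(periods)
for gap_start, gap_end in gaps:
    print(f"missing: {gap_start:%Y-%m-%d %H:%M} to {gap_end:%Y-%m-%d %H:%M} UTC")
```

For the response above, this reports a single gap from 02:00 to 03:00 UTC on 2024-03-31.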
Also, looking at the way the time-series objects are structured (made up of potentially multiple periods), this issue could be handled by parsing the individual periods into pandas series, and then concatenating those (instead of directly parsing the time series objects):
# rename the original _parse_crossborder_flows_timeseries() to _parse_crossborder_flows_period()
def _parse_crossborder_flows_period(soup):
    """
    Parameters
    ----------
    soup : bs4.element.tag

    Returns
    -------
    pd.Series
    """
    positions = []
    flows = []
    for point in soup.find_all('point'):
        positions.append(int(point.find('position').text))
        flows.append(float(point.find('quantity').text))
    series = pd.Series(index=positions, data=flows)
    series = series.sort_index()
    series.index = _parse_datetimeindex(soup)  # [:len(series)]
    return series

# create a new _parse_crossborder_flows_timeseries method (that aggregates individual periods)
def _parse_crossborder_flows_timeseries(soup):
    series = [
        _parse_crossborder_flows_period(soup_period)
        for soup_period in soup.find_all('period')
    ]
    return pd.concat(series)
This approach should work even with missing data (the example stated in this issue), and I believe it should not affect the rest of the data.
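To illustrate the per-period idea without the library internals, here is a self-contained sketch that uses only the standard library (`xml.etree.ElementTree` in place of bs4, a dict in place of a pandas series). The sample XML is generated to mirror the two periods from this issue; `period_xml`, `parse_period` and `parse_timeseries` are hypothetical names, and the sample assumes no XML namespace and a fixed PT60M resolution, which may not hold for real responses.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta


def period_xml(start, end, n_points, quantity=2400):
    # Build one <Period> shaped like the response in this issue
    points = "".join(
        f"<Point><position>{i}</position><quantity>{quantity}</quantity></Point>"
        for i in range(1, n_points + 1)
    )
    return (
        f"<Period><timeInterval><start>{start}</start><end>{end}</end></timeInterval>"
        f"<resolution>PT60M</resolution>{points}</Period>"
    )


SAMPLE = (
    "<TimeSeries>"
    + period_xml("2024-03-31T00:00Z", "2024-03-31T02:00Z", 2)
    + period_xml("2024-03-31T03:00Z", "2024-04-01T00:00Z", 21)
    + "</TimeSeries>"
)


def parse_period(period):
    # Each period carries its own start, so positions are resolved locally
    start = datetime.strptime(period.findtext("timeInterval/start"), "%Y-%m-%dT%H:%MZ")
    assert period.findtext("resolution") == "PT60M"  # assumption of this sketch
    step = timedelta(minutes=60)
    return {
        start + (int(p.findtext("position")) - 1) * step: float(p.findtext("quantity"))
        for p in period.findall("Point")
    }


def parse_timeseries(xml_text):
    # Parse every period on its own, then merge the results
    merged = {}
    for period in ET.fromstring(xml_text).findall("Period"):
        merged.update(parse_period(period))
    return dict(sorted(merged.items()))


values = parse_timeseries(SAMPLE)
print(len(values))                            # 23
print(datetime(2024, 3, 31, 2, 0) in values)  # False
```

Because every period is indexed from its own start time, the missing 02:00 value simply stays absent instead of shifting later points or triggering a length mismatch.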
Isn't that the expected behaviour for 31st of March 2024? When switching to DST, we skip 1 hour.
It would be in local time, but ENTSO-E reports these values in UTC (where we have no DST / should not have a missing value).
print(pd.Timestamp("2024-03-31T00:00Z").tz_convert("Europe/Berlin"))
# 2024-03-31 01:00:00+01:00
print(pd.Timestamp("2024-04-01T00:00Z").tz_convert("Europe/Berlin"))
# 2024-04-01 02:00:00+02:00
print(pd.Timestamp("2024-04-01T00:00Z") - pd.Timestamp("2024-03-31T00:00Z"))
# 1 days 00:00:00 (24 hours)
The TP applied a fix, and the error no longer occurs on the entsoe-py side. Hence, I will close the issue for now.
This is the raw output after their fix:
<period.timeInterval>
<start>2024-03-30T23:00Z</start>
<end>2024-03-31T22:00Z</end>
</period.timeInterval>
<TimeSeries>
<mRID>1</mRID>
<businessType>A27</businessType>
<in_Domain.mRID codingScheme="A01">10YIT-GRTN-----B</in_Domain.mRID>
<out_Domain.mRID codingScheme="A01">10YFR-RTE------C</out_Domain.mRID>
<quantity_Measure_Unit.name>MAW</quantity_Measure_Unit.name>
<curveType>A01</curveType>
<Period>
<timeInterval>
<start>2024-03-30T23:00Z</start>
<end>2024-03-31T22:00Z</end>
</timeInterval>
<resolution>PT60M</resolution>
<Point>
<position>1</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>2</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>3</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>4</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>5</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>6</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>7</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>8</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>9</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>10</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>11</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>12</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>13</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>14</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>15</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>16</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>17</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>18</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>19</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>20</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>21</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>22</position>
<quantity>2400</quantity>
</Point>
<Point>
<position>23</position>
<quantity>2400</quantity>
</Point>
</Period>
</TimeSeries>
</Publication_MarketDocument>
Good to hear! Thanks for picking this up!
The following code yields the above-mentioned error (length mismatch). It only occurs if start and end have different UTC offsets (e.g. due to daylight saving time). I am using the latest entsoe-py. Can somebody confirm?