'year_limited' is skipping (sometimes?) the first row of the frames, leading to missing rows in the final results

EnergieID / entsoe-py

Python client for the ENTSO-E API (European Network of Transmission System Operators for Electricity)

MIT License

import pandas
from entsoe import EntsoePandasClient

client = EntsoePandasClient(api_key="397870bf-afdc-4422-ba9e-2d8ef803fa2a")  # API key from Sebastien de Menten (GFJ138), use with care
client.session.verify = False
df = client.query_day_ahead_prices(
    "FR",
    start=pandas.Timestamp("2022", tz="CET"),
    end=pandas.Timestamp("2023/02", tz="CET"),
)
print(df["2022-12-30T22":"2022-12-31T02"])

To clarify, the year_limited decorator tries to enforce the _start and _end timestamps on the result frame. However, other side effects can occur:

To reproduce the bug

import pandas as pd
from entsoe import EntsoePandasClient

client = EntsoePandasClient(api_key="397870bf-afdc-4422-ba9e-2d8ef803fa2a")  # API key from Sebastien de Menten (GFJ138), use with care
client.session.verify = False
# Query six years at once, so the request is split into several blocks
df = client.query_installed_generation_capacity(
    "FR",
    start=pd.Timestamp("2017-01-01", tz="Europe/Paris"),
    end=pd.Timestamp("2023-01-01", tz="Europe/Paris"),
)
print(df.index)

outputs

DatetimeIndex(['2017-01-01 00:00:00+01:00'], dtype='datetime64[ns, Europe/Paris]', freq=None)

This happens because, for the first block, the API returns a single row dated '2017-01-01 00:00:00+01:00', and the decorator does not filter it out since it is the first frame. For the following blocks, the API returns the first timestamp of each year ('2018-01-01 00:00:00+01:00', '2019-01-01 00:00:00+01:00', …), but those rows are filtered out by the condition frame.index > _start, so only the 2017 row survives.
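The mechanism described above can be illustrated with a small, dependency-free sketch. This is a simplified model and not the actual entsoe-py code: fetch_block, yearly_blocks and collect are hypothetical names, the stand-in API returns exactly one row stamped at each block start, and plain naive datetimes replace the pandas index. It only assumes what is stated above, namely that the first frame is kept as-is and later frames are filtered against their block start.

```python
from datetime import datetime


def fetch_block(start, end):
    """Hypothetical stand-in for the API: one row stamped at the block start."""
    return [(start, 100.0)]


def yearly_blocks(start, end):
    """Split [start, end) into consecutive blocks of at most one year.

    Assumes start is not Feb 29, so replace(year=...) is always valid.
    """
    blocks = []
    current = start
    while current < end:
        nxt = min(current.replace(year=current.year + 1), end)
        blocks.append((current, nxt))
        current = nxt
    return blocks


def collect(start, end, keep):
    """Concatenate all blocks; every block after the first is filtered with keep."""
    rows = []
    for i, (b_start, b_end) in enumerate(yearly_blocks(start, end)):
        frame = fetch_block(b_start, b_end)
        if i > 0:  # the first frame is left untouched, as described in the issue
            frame = [(ts, v) for ts, v in frame if keep(ts, b_start)]
        rows.extend(frame)
    return rows


start, end = datetime(2017, 1, 1), datetime(2023, 1, 1)

# Strict comparison, mirroring frame.index > _start: rows at the block start are dropped.
strict = collect(start, end, lambda ts, s: ts > s)
# Inclusive comparison, shown only as one conceivable alternative.
inclusive = collect(start, end, lambda ts, s: ts >= s)

print([ts.year for ts, _ in strict])     # [2017]
print([ts.year for ts, _ in inclusive])  # [2017, 2018, 2019, 2020, 2021, 2022]
```

With the strict comparison only the 2017 row remains, which matches the single-entry index printed above. The inclusive variant is there for contrast and is not necessarily the fix adopted in the project.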

Your solution handles that well too.

'year_limited' is skipping (sometimes?) the first row of the frames, leading to missing rows in the final results #363