alp-yg closed this 12 months ago
@JrtPec could you perhaps provide more info on why this was designed this way?
That sounds like a bug indeed.
I suppose the original design had a simple reason: when combining data of multiple years, an overlap in indices was noticed by someone (me, I suppose), so I added a line that deletes duplicate indices after the merge.
I believe the fix introduces another bug when the data is the same for multiple indices.
I am missing entries with the following code:
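import pandas as pd
from entsoe import EntsoePandasClient
client = EntsoePandasClient(api_key='YOUR_API_KEY')  # placeholder, use your own key
start = pd.Timestamp('20220214', tz='Europe/Copenhagen')
end = pd.Timestamp('20220216', tz='Europe/Copenhagen')
country_code = 'DK_1'
df = client.query_day_ahead_prices(country_code, start=start, end=end)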
I guess the following could solve both cases:
df = df.loc[~df.duplicated(keep='first') | ~df.index.duplicated(keep='first')]
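To make the difference between the filters concrete, here is a minimal sketch on made-up data. It assumes the fix mentioned above de-duplicated on values; the timestamps and prices are hypothetical and not taken from the API.

```python
import pandas as pd

# Hypothetical hourly prices: the 01:00 row is repeated (a true duplicate),
# while 02:00 legitimately has the same price as 00:00.
idx = pd.to_datetime([
    "2022-02-14 00:00", "2022-02-14 01:00",
    "2022-02-14 01:00", "2022-02-14 02:00",
])
df = pd.DataFrame({"price": [50.0, 60.0, 60.0, 50.0]}, index=idx)

# Index-only filter: drops the repeated 01:00 row, keeps 02:00.
by_index = df.loc[~df.index.duplicated(keep="first")]

# Value-only filter: also drops 02:00, because its price equals the 00:00 price.
by_value = df.loc[~df.duplicated(keep="first")]

# Combined filter: a row is dropped only if both its values and its index
# have already been seen in earlier rows.
combined = df.loc[~df.duplicated(keep="first") | ~df.index.duplicated(keep="first")]

print(len(by_index), len(by_value), len(combined))
```

Note that the combined filter keeps rows that share an index but carry different values, so the resulting index is not guaranteed to be unique in general.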
Hi @andreasbrinch, I changed some things and data is no longer missing from query_day_ahead_prices on my side. Could you confirm that you are no longer missing data with the latest version?
Discussion of the duplication issues is being handled in #235.
Hi there,
While looking at the data returned by a call to EntsoePandasClient._query_unavailability, I noticed that some of the data shown on the transparency website was missing.
It turns out that the issue comes from the year_limited decorator, and more precisely from this line (l.839 of entsoe.py in the current version):
df = df.loc[~df.index.duplicated(keep='first')]
What is the intended purpose of this line? I'm asking this because at the moment, data returned by the API for doctypes A77/A80 is split by time periods, not by version or something like that. That means that this line only keeps the first period of the entire actual outage, discarding lots of data in the process.
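The effect can be reproduced on a toy frame. This is only a sketch: the column names and values are hypothetical, and it assumes, for illustration, that all periods of one outage end up sharing the same index value.

```python
import pandas as pd

# Hypothetical outage split into three consecutive periods that all share
# one index value (an assumption made for this illustration).
created = pd.Timestamp("2022-01-10 08:00", tz="UTC")
periods = pd.DataFrame(
    {
        "start": pd.to_datetime(["2022-02-01", "2022-02-02", "2022-02-03"], utc=True),
        "end": pd.to_datetime(["2022-02-02", "2022-02-03", "2022-02-04"], utc=True),
        "avail_qty": [100.0, 50.0, 0.0],
    },
    index=pd.DatetimeIndex([created] * 3),
)

# Same filter as in year_limited: only the first period survives.
deduped = periods.loc[~periods.index.duplicated(keep="first")]

print(len(periods), len(deduped))
```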
In my opinion, what needs to change is the parsing of outages (function _outage_parser in parsers.py): it should concatenate the different time periods of an outage and thus return a dataframe with only one row for it.
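One possible reading of this suggestion, sketched with pandas on made-up data. The column names (mrid, start, end, avail_qty) and the choice of aggregations are assumptions for illustration, not the actual output or behaviour of _outage_parser.

```python
import pandas as pd

# Hypothetical parsed periods: outage-1 spans three periods, outage-2 one.
periods = pd.DataFrame(
    {
        "mrid": ["outage-1", "outage-1", "outage-1", "outage-2"],
        "start": pd.to_datetime(
            ["2022-02-01", "2022-02-02", "2022-02-03", "2022-02-10"], utc=True
        ),
        "end": pd.to_datetime(
            ["2022-02-02", "2022-02-03", "2022-02-04", "2022-02-11"], utc=True
        ),
        "avail_qty": [100.0, 50.0, 0.0, 75.0],
    }
)

# Collapse each outage into a single row covering its full time span.
merged = periods.groupby("mrid").agg(
    start=("start", "min"),
    end=("end", "max"),
    min_avail_qty=("avail_qty", "min"),
)

print(merged)
```

Collapsing to one row loses the per-period quantities, so whether to keep a summary (as here) or the full list of periods is a design choice for the parser.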
Best, Yannick