Closed Impelon closed 2 months ago
For the examples you provided, there actually are no wrong results (except for the fact that the error message are bad/confusing)
s.forecast(start, start + datetime.timedelta(minutes=70), column="Berlin")
works as described in the docstring: "If frequency is not specified, all existing data in the window will be returned."s.forecast(start, start + datetime.timedelta(minutes=70), frequency="10min", column="Berlin")
should result in an error because there is no available data at start + 70 minutes (only 60min forecast)s.forecast(start, start + datetime.timedelta(minutes=70), frequency="10min", resample_method="bfill", column="Berlin")
should also result in an error because the missing data point at start + 70min can also not be determined using back-fill, as there are no data points after start + 70minThere is, however, unexpected behavior that becomes visible, when looking at a function call like:
s.forecast(start, start + datetime.timedelta(minutes=70), frequency="10min", resample_method="ffill", column="Berlin")
This should not result in an error, as the missing point can be forward-filled using available data. Here, the bad handling of the IndexError becomes a problem.
I will try to fix this in a few days. Thank you for notifying :)
s.forecast(start, start + datetime.timedelta(minutes=70), column="Berlin")
works as described in the docstring: "If frequency is not specified, all existing data in the window will be returned."
Agreed.
s.forecast(start, start + datetime.timedelta(minutes=70), frequency="10min", column="Berlin")
should result in an error because there is no available data at start + 70 minutes (only 60min forecast)
As a user I'd find it a bit confusing that I would get an error when I specify frequency
, but it runs fine if I don't.
At the very least the docstring is not clear on this matter.
Other than the missing documentation regarding the inconsistency with the normal behaviour, I think it would be fine to return a custom error as well, yes.
So if I read correctly that your preferred solution would be to raise a custom Exception in the cases that IndexError is raised now (except when resample_method="ffil"
). As I said, I think that is an ok solution, too, but then in my opinion you should also add a way to programmatically get the length of the available forecast.
Either through a new method that can be called with (start_time, end_time)
before making a call to forecast
, or with a custom Exception class that includes a max_end_time
attribute or similar, so that if a call to forecast
fails, it can be repeated with the correct end_time
.
Otherwise, it would be impossible to guarantee correct usage of HistoricalSignal.forecast
in code, having no access to the underlying data. (I.e. if I built a component on top of a user-specified HistoricalSignal
, I won't be able to tell what end_time
is safe so that forecast
does not fail.)
I will try to fix this in a few days.
Thank you :) Just as a side note: This issue is not urgent for me. I'm still using vessim 0.6.0, so I've built a workaround for this anyway.
I'm back again with some more (from a user perspective) unexpected behaviour :relaxed:
Take any
HistoricalSignal
that includes separate forecast information, e.g. one of the solcast datasets.The
IndexError
thus happens if a forecast of length longer than the one available is requested andfrequency
is specified. (Ok, the frequency also needs to be such that the resampled forecast would exceed the original in length - i.e. a frequency of 1h would work in the above, because it will just output one data point for the next hour and skip the next. But those frequency values that work are rather uncommon.) This can probably also happen without separate forecast data, just using the actual signal. It is just easier to trigger it here, as the length of the forecasts available is usually more limited.If one looks into the
new_times_indices
, one notices that this is because the highest indices will be off by 1, becausenp.searchsorted
rightly says the elements would be at an index later than the highest element available. For exampleOne could easily mask the confusing
IndexError
, for instance by dropping any indices that are above the length of thetimes
array (here: 13).