RoyalHaskoningDHV / sam

Python package for time series analysis and machine learning
MIT License

Use `use_diff_of_y` and `predict_ahead == [0]` at the same time #33

Open rubenpeters91 opened 2 years ago

rubenpeters91 commented 2 years ago

When using use_diff_of_y, you apparently can't set predict_ahead = [0] in TimeseriesMLP. There are multiple checks for this in the code, and removing the first error leads to a straight-line prediction. Using use_diff_of_y with any other predict_ahead works as expected.

philiproeleveld commented 1 year ago

I took some time to investigate this. You can't set predict_ahead = [0] with use_diff_of_y because predict_ahead (or lags in make_shifted_target, where the differencing is actually calculated) determines the offset for the differencing. For a predict_ahead of 3, the differencing is calculated using the third next timestamp. A predict_ahead of zero would therefore result in the "differenced" y being all zero, since each value would be subtracted from the value at the 0th next timestamp, which is itself.
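A minimal sketch of why a lag of zero degenerates, using only pandas and a toy series. The helper name `diff_target` is hypothetical and not part of sam; it rewrites the expression `-1 * y.diff(-1 * lag)` as the equivalent `y.shift(-lag) - y`:

```python
import pandas as pd

# Toy series; squares make the differences easy to read.
y = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])


def diff_target(y: pd.Series, lag: int) -> pd.Series:
    """Hypothetical helper: value `lag` steps ahead minus the current value."""
    return y.shift(-lag) - y


# Same quantity as the differencing expression quoted in this thread.
assert diff_target(y, 3).equals(-1 * y.diff(-3))

ahead_3 = diff_target(y, 3)  # y[t + 3] - y[t]; the last 3 rows are NaN
ahead_0 = diff_target(y, 0)  # y[t] - y[t]; zero in every row

print(ahead_3.tolist())
print(ahead_0.tolist())
```

With a target that is zero everywhere, a model has nothing to learn, which is consistent with the straight-line prediction reported above.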

There is, however, still a valid reason to want to use differencing with a nowcast (when predict_ahead is zero): the X features would then be aligned with the nowcast. For example, suppose we have a timeseries with a frequency of 1 minute. Then predict_ahead = [5] would result in a prediction:

Whereas predict_ahead = [0] with a manually configured differencing offset of 5 would result in a prediction:

The last point is the important one: the predict_ahead = [0] case would use data from the last of the two timestamps that are differenced, whereas the predict_ahead = [5] case uses data from the first of the two timestamps.
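To make the alignment concrete, here is a small sketch on a toy 1-minute series. The data and variable names are made up for illustration, and it uses only standard pandas calls rather than anything from sam:

```python
import pandas as pd

# Toy 1-minute series: the value at minute k is k squared.
index = pd.date_range("2024-01-01 00:00", periods=12, freq="min")
y = pd.Series(range(12), index=index, dtype=float) ** 2

offset = 5

# predict_ahead = [5]: row t holds y[t + 5] - y[t]
ahead_target = -1 * y.diff(-1 * offset)

# nowcast with a differencing offset of 5: row t holds y[t] - y[t - 5]
nowcast_target = y.diff(offset)

t = index[5]  # 00:05
print(ahead_target.loc[t])    # built from y at 00:10 and 00:05
print(nowcast_target.loc[t])  # built from y at 00:05 and 00:00

# Both targets contain the same differences, stored `offset` rows apart,
# so each difference is paired with a different row of X.
print(nowcast_target.equals(ahead_target.shift(offset)))
```

In this sketch the difference between 00:00 and 00:05 sits in the 00:00 row for the predict-ahead target and in the 00:05 row for the nowcast target, so the model would see X from the first or the last of the two timestamps respectively.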

To support such a "manually configured differencing offset" for the nowcast, since we can't just use the predict_ahead/lags value of zero, you would change the calculation in make_shifted_target. (It is also necessary to shift y by the same offset in inverse_differenced_target when adding the differences back.) The current implementation is:

result = pd.concat([-1 * y.diff(-1 * lag) for lag in lags], axis=1)

To something like:

result = pd.concat([y.diff(nowcast_offset) if lag == 0 else -1 * y.diff(-1 * lag) for lag in lags], axis=1)

This would at the very least introduce a nowcast_offset parameter in make_shifted_target, and probably also in the __init__ of BaseTimeseriesRegressor.
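As a rough, self-contained sketch of what that change could look like, including the matching shift when the differences are added back: both functions below are hypothetical stand-ins written for this thread. They do not reproduce the real signatures of make_shifted_target or inverse_differenced_target in sam, and the `lag_<n>` column names are an assumption.

```python
from typing import Optional, Sequence

import pandas as pd


def make_shifted_diff_target(
    y: pd.Series, lags: Sequence[int], nowcast_offset: Optional[int] = None
) -> pd.DataFrame:
    """Hypothetical stand-in: one differenced target column per lag."""
    if 0 in lags and nowcast_offset is None:
        raise ValueError("lag 0 needs a nowcast_offset when differencing")
    columns = [
        # lag 0 looks back by nowcast_offset; other lags look ahead by lag
        y.diff(nowcast_offset) if lag == 0 else -1 * y.diff(-1 * lag)
        for lag in lags
    ]
    result = pd.concat(columns, axis=1)
    result.columns = [f"lag_{lag}" for lag in lags]
    return result


def undo_diff(
    y: pd.Series,
    diffs: pd.DataFrame,
    lags: Sequence[int],
    nowcast_offset: Optional[int] = None,
) -> pd.DataFrame:
    """Hypothetical inverse: add the base value back to each column."""
    restored = {}
    for lag in lags:
        # nowcast: the base is y[t - nowcast_offset]; ahead: the base is y[t]
        base = y.shift(nowcast_offset) if lag == 0 else y
        restored[f"lag_{lag}"] = diffs[f"lag_{lag}"] + base
    return pd.DataFrame(restored)


y = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0])
lags = [0, 2]
diffs = make_shifted_diff_target(y, lags, nowcast_offset=3)
restored = undo_diff(y, diffs, lags, nowcast_offset=3)

print(diffs)
print(restored)
```

In this sketch the `lag_0` column of `restored` reproduces y itself (after the first `nowcast_offset` rows), while `lag_2` reproduces y two steps ahead, which is the round trip the inverse step needs to preserve.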

So it might be worth implementing this for some relatively niche application where the desired differencing offset is high and the choice of which timestamp is used from X has a significant impact, but I'm not convinced such a case will ever come up. I think the current behavior of rejecting the combination of use_diff_of_y with predict_ahead = [0] is totally acceptable.