RoyalHaskoningDHV / sam

Python package for time series analysis and machine learning
MIT License

Simplify ConstantTimeseriesRegressor #78

Open philiproeleveld opened 1 year ago

philiproeleveld commented 1 year ago

ConstantTemplate (the underlying sklearn estimator for ConstantTimeseriesRegressor) only uses the input data X to determine the output shape of the predictions. X therefore shouldn't need to contain any data, or even be array-like, as long as it has a length (i.e. implements the __len__() dunder).
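
To illustrate the point, here is a minimal sketch of a constant predictor that only ever calls len(X). The names LengthOnlyConstant and SizedOnly are hypothetical and made up for this example; this is not sam's ConstantTemplate, just an assumption about the smallest interface such an estimator would need.

```python
class LengthOnlyConstant:
    """Hypothetical stand-in (not sam's ConstantTemplate): predicts a fixed value."""

    def __init__(self, constant=0.0):
        self.constant = constant

    def fit(self, X, y=None):
        # Nothing is learned from X; it is accepted only for API compatibility.
        return self

    def predict(self, X):
        # The only thing required of X is a length.
        return [self.constant] * len(X)


class SizedOnly:
    """Hypothetical input that carries no data, only a length."""

    def __init__(self, n):
        self._n = n

    def __len__(self):
        return self._n


model = LengthOnlyConstant(constant=1.5).fit(SizedOnly(10))
predictions = model.predict(SizedOnly(3))
```

Under this sketch, any object with a length works as X, including range objects and dataframes without columns.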

In #75 and #76 I already loosened the validation on X by allowing NaN/Inf values, but the requirements are still needlessly restrictive because of the assumptions in BaseTimeseriesRegressor. At the very least, I would like to be able to pass an empty dataframe such as X = pd.DataFrame(index=range(100)).

Moreover, scikit-learn already provides DummyRegressor, which appears to implement the same logic as ConstantTemplate. I haven't tested it, but ConstantTemplate is probably equivalent to DummyRegressor; if so, it can be removed completely in favor of the latter.
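
As a rough sketch of what that could look like, the snippet below fits scikit-learn's DummyRegressor on a dataframe that has an index but no columns. It assumes a reasonably recent scikit-learn, where DummyRegressor does not validate the contents of X, and it uses the default strategy="mean"; whether this matches ConstantTemplate's behaviour is exactly the part that still needs testing.

```python
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor

# Empty dataframe: 100 rows, no columns (the case described in this issue).
X = pd.DataFrame(index=range(100))
y = np.arange(100.0)  # made-up target values, for illustration only

# Default strategy="mean": always predicts the mean of the training targets.
regressor = DummyRegressor()
regressor.fit(X, y)

# Only the number of rows in X determines the shape of the output.
predictions = regressor.predict(X)
```

DummyRegressor also supports strategy="median", "quantile" and "constant", which may or may not cover everything ConstantTemplate currently does.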