topher-lo opened 1 year ago
Related to #18
Do you think we might want to raise a warning when fitting a forecaster where the number of lags is greater than the number of observations? Explaining this in the documentation is good, but I think it'd be more visible and helpful to surface it at runtime.
Absolutely. This was just backlogged for a bit too long.
We have code for it already: https://github.com/neocortexdb/functime/blob/main/tests/test_benchmarks.py (line 43)
It's quite fast. Just need to put it inside the base forecaster's fit function.
And have some global config option to disable checks.
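A minimal sketch of what such a check could look like. This is illustrative only: the function name `check_min_length`, the panel-as-dict layout, and the warning text are assumptions, not functime's actual API (which operates on polars DataFrames).

```python
import warnings


def check_min_length(panel: dict[str, list[float]], lags: int) -> list[str]:
    """Warn about entity series with fewer observations than `lags`.

    Hypothetical sketch: `panel` maps each entity id to its observed
    values. Returns the ids of series that would be silently dropped
    at predict time.
    """
    short = [entity for entity, values in panel.items() if len(values) < lags]
    if short:
        warnings.warn(
            f"{len(short)} series have fewer than lags={lags} observations "
            "and would be silently dropped at predict time.",
            stacklevel=2,
        )
    return short
```

Example: with `lags=24`, a 10-observation series triggers the warning.

```python
panel = {"a": [1.0] * 30, "b": [1.0] * 10}
check_min_length(panel, lags=24)  # warns; returns ["b"]
```

A global config flag (e.g. an environment variable or module-level setting) could skip the call entirely to keep the fit hot path free of overhead.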
Problem

Time series with fewer observations than the number of lags are silently dropped at predict time. For example, during M5 benchmarking, time series shorter than `lags=24` are dropped. This is intended behavior, but currently undocumented.

Rationale

`functime` is made for high-performance ML forecasting in production. Data engineers, not ML engineers, are responsible for upstream and downstream data quality (including the "no missing values" property). I made the explicit design decision not to include any data quality pre-checks within fit-predict in `functime`.

Solution

Document why `functime` has weaker data quality pre-conditions.

Additional comment

My goal is to eventually create a `checks` module with functions to support more defensive forecasting pipelines. But the choice to run checks will be an explicit pipeline design decision by the user, not part of the `functime` forecasting API.
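To make the "explicit pipeline step" idea concrete, here is a hypothetical sketch of one function such a `checks` module might contain. The name `drop_short_series`, the dict-based panel layout, and the signature are all assumptions for illustration, not a committed design.

```python
import warnings


def drop_short_series(
    panel: dict[str, list[float]], min_length: int
) -> dict[str, list[float]]:
    """Opt-in pipeline step: drop series shorter than `min_length`.

    Hypothetical sketch of a `checks`-module function. The user calls
    it explicitly before fit, so the check is a visible pipeline design
    decision rather than hidden inside the forecasting API.
    """
    kept = {k: v for k, v in panel.items() if len(v) >= min_length}
    dropped = sorted(set(panel) - set(kept))
    if dropped:
        warnings.warn(f"Dropped {len(dropped)} short series: {dropped}")
    return kept
```

Because the user wires this step in themselves (e.g. `panel = drop_short_series(panel, min_length=24)` before calling fit), pipelines that don't want the overhead simply never call it.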