functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
https://docs.functime.ai
Apache License 2.0

Explain why forecasters drop very short time series #17

Open topher-lo opened 1 year ago

topher-lo commented 1 year ago

Problem

Time series with fewer observations than the number of lags are silently dropped at predict time. For example, during M5 benchmarking, time series shorter than lags=24 were dropped. This is intended behavior, but it is currently undocumented.
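
To illustrate, here is a minimal Polars sketch that finds the series a lags=24 forecaster would drop. The entity/time/value column names are just an example panel schema for this snippet:

```python
import polars as pl

# Example panel: entity "a" has 30 observations, entity "b" only 10.
panel = pl.DataFrame({
    "entity": ["a"] * 30 + ["b"] * 10,
    "time": list(range(30)) + list(range(10)),
    "value": [float(i) for i in range(40)],
})

lags = 24

# Entity "b" has fewer than `lags` observations, so its forecasts
# would be silently missing from the predictions.
too_short = (
    panel.group_by("entity")
    .agg(pl.len().alias("n_obs"))
    .filter(pl.col("n_obs") < lags)
)
print(too_short)  # one row: entity "b" with n_obs = 10
```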

Rationale

functime is made for high-performance ML forecasting in production. Data engineers, not ML engineers, are responsible for upstream and downstream data quality (including the property of "no missing values"). I made the explicit design decision not to include any data-quality pre-checks within functime's fit-predict.

Solution

Document why functime enforces weaker data-quality preconditions.

Additional comment

My goal is to eventually create a checks module with functions that support more defensive forecasting pipelines. But including checks will be an explicit pipeline design decision made by the user, not something baked into the functime forecasting API. See the sketch below.
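
For example, one such check could look something like this (the `check_min_obs` name, signature, and `entity` column default are hypothetical, not a committed API):

```python
import polars as pl

def check_min_obs(y: pl.DataFrame, min_obs: int, entity_col: str = "entity") -> pl.DataFrame:
    """Fail fast if any series in the panel has fewer than `min_obs` observations."""
    short = (
        y.group_by(entity_col)
        .agg(pl.len().alias("n_obs"))
        .filter(pl.col("n_obs") < min_obs)
    )
    if short.height > 0:
        raise ValueError(
            f"{short.height} series have fewer than {min_obs} observations: "
            f"{short.get_column(entity_col).to_list()}"
        )
    return y  # pass-through, so the check composes inside a pipeline
```

A pipeline would then opt in explicitly, e.g. `forecaster.fit(check_min_obs(panel, min_obs=24))`.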

topher-lo commented 1 year ago

Related to #18

baggiponte commented 1 year ago

Do you think we might want to raise a warning when fitting a forecaster whose number of lags is greater than the number of observations? Explaining this in the documentation is a good thing, but I think surfacing it at runtime would be more visible and helpful.
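
Something like this, perhaps (the helper name and column names are just a sketch):

```python
import warnings
import polars as pl

def warn_if_too_short(y: pl.DataFrame, lags: int, entity_col: str = "entity") -> None:
    """Warn about series with fewer observations than `lags`."""
    short = (
        y.group_by(entity_col)
        .agg(pl.len().alias("n_obs"))
        .filter(pl.col("n_obs") < lags)
    )
    if short.height > 0:
        warnings.warn(
            f"{short.height} series have fewer than lags={lags} observations "
            "and will be dropped at predict time.",
            stacklevel=2,
        )
```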

topher-lo commented 1 year ago

Absolutely. This was just backlogged for a bit too long.

We have code for it already: https://github.com/neocortexdb/functime/blob/main/tests/test_benchmarks.py (line 43)

It's quite fast. We just need to put it inside the base forecaster's fit method, and add a global config option to disable checks.
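
Roughly like this (the `RUN_CHECKS` flag and the class skeleton below are placeholders, not the actual functime internals):

```python
import warnings
import polars as pl

# Hypothetical global toggle to disable pre-checks, e.g. in production.
RUN_CHECKS: bool = True

class BaseForecaster:
    """Placeholder skeleton; the real functime class hierarchy may differ."""

    def __init__(self, lags: int):
        self.lags = lags

    def fit(self, y: pl.DataFrame, entity_col: str = "entity") -> "BaseForecaster":
        if RUN_CHECKS:
            # Shortest series in the panel; warn if it can't cover the lags.
            shortest = y.group_by(entity_col).agg(pl.len()).get_column("len").min()
            if shortest is not None and shortest < self.lags:
                warnings.warn(
                    f"some series are shorter than lags={self.lags} and will be "
                    "dropped at predict time.",
                    stacklevel=2,
                )
        # ... actual model fitting goes here ...
        return self
```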