Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

[Core] keep_last_n has no effect in MLForecast.preprocess #227

Closed. shchur closed this issue 11 months ago

shchur commented 11 months ago

What happened + What you expected to happen

  1. Passing the keep_last_n argument to the MLForecast.preprocess method has no effect.
  2. I would expect only the last keep_last_n rows of each time series in the dataset to be kept. However, the output of the preprocess method is identical whether or not I pass the keep_last_n argument.

Versions / Dependencies

Reproduction script

import pandas as pd
from mlforecast import MLForecast

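# A single series (unique_id=0) with N constant observations.
N = 1000
df = pd.DataFrame(
    {
        "unique_id": [0] * N,
        "ds": list(range(N)),
        "y": [1] * N,
    }
)

# Lags 0 through 9 produce 10 feature columns on top of the 3 input columns.
mlf = MLForecast(models=[], lags=list(range(10)))
output = mlf.preprocess(df, keep_last_n=10)
print(output.shape)  # shape is (991, 13); expected shape (10, 13)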

Issue Severity

Medium: It is a significant difficulty but I can work around it.

jmoralez commented 11 months ago

Hey @shchur, thanks for using mlforecast. The keep_last_n argument applies to the predict step: it speeds up prediction by recomputing the features using only the last n observations of each series. In your example the max lag is 9, so you only need the last 9 values of each series to perform the updates. (By default, mlforecast keeps the full history so that expanding statistics are computed accurately.)
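To illustrate the idea, here is a minimal sketch in plain Python. It is not mlforecast's implementation: the helpers next_step_lags and expanding_mean are hypothetical names made up for this example, and lags 1 through 9 are assumed. It shows that the lag features for the next step are the same whether they are computed from the full history or from only the last max-lag observations, while an expanding statistic changes once the history is trimmed.

```python
def next_step_lags(history, lags):
    """Return the lag features for the step right after the end of `history`."""
    # Lag k of the next step is the k-th value counting back from the end.
    return {f"lag{k}": history[-k] for k in lags}


def expanding_mean(history):
    """Mean over every stored observation."""
    return sum(history) / len(history)


history = list(range(1000))   # hypothetical series with 1000 observations
lags = list(range(1, 10))     # assumed lags 1..9, so the max lag is 9

full = next_step_lags(history, lags)
trimmed = next_step_lags(history[-max(lags):], lags)
print(full == trimmed)        # True: lag features do not need older history

# An expanding statistic does depend on older history.
print(expanding_mean(history), expanding_mean(history[-max(lags):]))  # 499.5 995.0
```

This is why trimming the stored history is safe for lag-only features but would change the result for features such as expanding means.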

Please let us know if you have further doubts.

shchur commented 11 months ago

Thanks a lot for the quick response @jmoralez and also for maintaining this amazing library!

It seems that I misunderstood the purpose of this argument, so I'm closing the issue.