Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

[Forecast] Forecast Fitted Values Failing with NA #233

Closed NudnikShpilkis closed 10 months ago

NudnikShpilkis commented 11 months ago

What happened + What you expected to happen

Following a conversation with @jmoralez, there appears to be a bug in `forecast_fitted_values`. If `MLForecast` is fit with `dropna=True`, it raises the error `Found different number of groups in fitted differences`, because some groups have been dropped entirely. If we set `dropna=False` instead, fitting fails because many models can't handle NA values. The script below reproduces the issue.
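To illustrate why whole groups can disappear, here is a minimal pandas-only sketch. It does not use mlforecast and is not a description of its internals; it only mimics the feature setup from the script below (a first difference, lags 1 to 6, rolling means of 3 and 4 over each lag) on a single series, then counts the rows left after dropping NAs. `surviving_rows` is a hypothetical helper written for this illustration.

```python
import numpy as np
import pandas as pd


def surviving_rows(length, lags=range(1, 7), windows=(3, 4)):
    """Count rows of one series that remain after differencing, lagging and dropna."""
    y = pd.Series(np.arange(length, dtype=float))
    d = y.diff()  # first difference: the first value becomes NaN
    feats = {}
    for lag in lags:
        shifted = d.shift(lag)  # lag feature: `lag` more leading NaNs
        feats[f"lag{lag}"] = shifted
        for w in windows:
            # rolling mean over the lagged values; needs `w` non-NaN observations
            feats[f"rolling_mean_lag{lag}_w{w}"] = shifted.rolling(w).mean()
    return len(pd.DataFrame(feats).dropna())


counts = {n: surviving_rows(n) for n in (10, 11, 25)}
print(counts)  # {10: 0, 11: 1, 25: 15}
```

Under these assumptions a series of length 10 keeps no rows at all, so its group would vanish from the training frame, which is consistent with the mismatch in group counts reported by the error.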

Versions / Dependencies

mlforecast = 0.9.3

Reproduction script

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import FunctionTransformer
from sklearn.linear_model import LinearRegression
from mlforecast import MLForecast
from mlforecast.utils import generate_daily_series
from mlforecast.target_transforms import Differences, LocalStandardScaler
from mlforecast.target_transforms import GlobalSklearnTransformer
from window_ops.rolling import rolling_mean

horizon = 10

# Create a basic series
df = generate_daily_series(
    n_series=10,
    min_length=10,
    max_length=25,
    equal_ends=True,
)
df.groupby(['unique_id'])['ds'].agg(['min', 'max'])

test_end = df['ds'].max()
train_end = test_end - pd.Timedelta(days=10)
train_df = (
    df
    .query("ds <= @train_end")
)

fcst = MLForecast(
    models={'lr': LinearRegression()},
    freq='D',
    lags=range(1, 7),
    lag_transforms={
        i: [(rolling_mean, 3), (rolling_mean, 4)]
        for i in range(1, 7)
    },
    target_transforms=[Differences([1])],
    date_features=['day', 'dayofweek', 'week', 'month', 'quarter', 'year'],
)

# THIS WILL FAIL WHEN FITTING THE FORECASTED VALUE
fcst.fit(
    df,
    fitted=True,
    dropna=True,
)

# THIS WILL FAIL WHEN TRAINING
fcst.fit(
    df,
    fitted=True,
    dropna=False,
)
```
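As a diagnostic, it may help to list which series are too short to produce any complete feature row. The sketch below is pandas-only and works on a toy frame with the same `unique_id` / `ds` / `y` layout as the script's `df`. The helper `short_series_ids` and the threshold arithmetic are assumptions made for illustration, not part of mlforecast.

```python
import pandas as pd


def short_series_ids(df, min_rows, id_col="unique_id"):
    """Return the ids of series with fewer than `min_rows` observations."""
    sizes = df.groupby(id_col).size()
    return sorted(sizes[sizes < min_rows].index)


# Toy data: series "a" has 10 rows, series "b" has 25 rows.
toy = pd.DataFrame({
    "unique_id": ["a"] * 10 + ["b"] * 25,
    "ds": list(pd.date_range("2000-01-01", periods=10))
    + list(pd.date_range("2000-01-01", periods=25)),
    "y": range(35),
})

# Assumed minimum length for one complete row with the features above:
# 1 (first difference) + 6 (largest lag) + 3 (largest window minus one) + 1 (the row itself)
min_rows = 1 + 6 + (4 - 1) + 1

at_risk = short_series_ids(toy, min_rows)
print(at_risk)  # ['a']
```

Series flagged this way are the ones that, under the stated assumptions, would lose every row when `dropna=True`.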

Issue Severity

High: It blocks me from completing my task.