Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

ValueError when trying to use prediction_intervals on multivariate forecasting task #185

Closed m-janyell0w closed 1 year ago

m-janyell0w commented 1 year ago

Hello team,

I really appreciate the work you do; I wish I had found this library earlier :-D. Anyway, I am following this tutorial to create forecasts for a multivariate forecasting task. My dataset consists of the date column 'ds', 'unique_id', the target 'y', some lags 'y_lag_N', and multiple regressor columns of type float. I cannot share the data due to confidentiality, but here is what my code looks like (where dataset_train is the training dataframe described above):

from mlforecast import MLForecast
from mlforecast.target_transforms import Differences
from mlforecast.utils import PredictionIntervals
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

models = [
    KNeighborsRegressor(),
    Lasso(),
    LinearRegression(),
    MLPRegressor(),
    Ridge(),
]

mlf = MLForecast(
    models=models,
    target_transforms=[Differences([1])],
)

H = 12

# convert ds to int
dataset_train['ds'] = dataset_train['ds'].apply(lambda x: int(x.timestamp()))

mlf.fit(
    dataset_train,
    id_col='unique_id',
    time_col='ds',
    target_col='y',
    prediction_intervals=PredictionIntervals(
        n_windows=len(dataset_train)//H, window_size=H),
)

levels = [50, 80, 95]
forecasts = mlf.predict(H, level=levels)
forecasts.head()

This results in:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_20272\4290687288.py in ()
     27 # dataset_train['ds'] = dataset_train['ds'].apply(lambda x: int(x.timestamp()))
     28 
---> 29 mlf.fit(
     30     dataset_train_2,
     31     id_col='unique_id',

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\utils.py in inner(*args, **kwargs)
    184                             new_args.append(kwargs.pop(arg_names[i]))
    185                         new_args.append(kwargs.pop(old_name))
--> 186             return f(*new_args, **kwargs)
    187 
    188         return inner

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\forecast.py in fit(self, df, id_col, time_col, target_col, static_features, dropna, keep_last_n, max_horizon, prediction_intervals, data)
    374         if prediction_intervals is not None:
    375             self.prediction_intervals = prediction_intervals
--> 376             self._cs_df = self._conformity_scores(
    377                 df=df,
    378                 id_col=id_col,

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\forecast.py in _conformity_scores(self, df, id_col, time_col, target_col, static_features, dropna, keep_last_n, max_horizon, n_windows, h)
    306         is the same for all the forecasting horizon (`h=1`).
    307         """
--> 308         cv_results = self.cross_validation(
    309             df=df,
    310             n_windows=n_windows,

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\utils.py in inner(*args, **kwargs)
    184                             new_args.append(kwargs.pop(arg_names[i]))
    185                         new_args.append(kwargs.pop(old_name))
--> 186             return f(*new_args, **kwargs)
    187 
    188         return inner

c:\Users\user\Anaconda3\envs\endash\lib\site-packages\mlforecast\forecast.py in cross_validation(self, df, n_windows, h, id_col, time_col, target_col, step_size, static_features, dropna, keep_last_n, refit, max_horizon, before_predict_callback, after_predict_callback, prediction_intervals, level, input_size, fitted, data, window_size)
    696             )
    697             if result.shape[0] < valid.shape[0]:
--> 698                 raise ValueError(
    699                     "Cross validation result produced less results than expected. "
    700                     "Please verify that the frequency set on the MLForecast constructor matches your series' "

ValueError: Cross validation result produced less results than expected. Please verify that the frequency set on the MLForecast constructor matches your series' and that there aren't any missing periods.

By the way, I had to convert the timestamp column to integers; otherwise I got the error: "TypeError: Addition/subtraction of integers and integer-arrays with DatetimeArray is no longer supported. Instead of adding/subtracting n, use n * obj.freq"

I am working in an Anaconda environment with Python 3.9.16 on Windows 10, using the following packages: mlforecast 0.9.0, statsforecast 1.5.0, pandas 2.0.1, scikit-learn 1.2.2.

Any advice on what could be the problem here is highly appreciated!

Cheers, Micha

jmoralez commented 1 year ago

Hey @m-janyell0w, thanks for the detailed report.

Regarding the TypeError with the time column, you need to set the frequency in the constructor. For example:

mlf = MLForecast(
    freq='D', # this would mean daily frequency
    models=models,
    target_transforms=[Differences([1])],
)
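To illustrate why the frequency matters, here is a minimal pandas-only sketch. It is not mlforecast's internal code, and the variable names are made up for illustration. Adding a bare integer to datetimes raises the TypeError quoted in the report, while adding a multiple of a frequency offset is well defined:

```python
import pandas as pd

# Hypothetical daily timestamps standing in for the 'ds' column.
ds = pd.DatetimeIndex(["2023-07-08", "2023-07-09"])

# A bare integer has no unit, so pandas refuses to add it to datetimes.
try:
    ds + 1
    int_add_failed = False
except TypeError:
    int_add_failed = True

# With a frequency, "one step ahead" is unambiguous.
offset = pd.tseries.frequencies.to_offset("D")
next_ds = ds + 1 * offset

print(int_add_failed)
print(list(next_ds.strftime("%Y-%m-%d")))
```

This suggests that once `freq` is set, the 'ds' column can likely stay as datetimes instead of being converted to integers.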

About the CV error, you probably have gaps in your series. For example, suppose you have daily series and for one of your series the data jumps from 2023-07-09 to 2023-07-11 (it skips 2023-07-10). In this case when building the CV results it will fail because one period will be missing from the actuals.
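A minimal sketch for locating such gaps with pandas, assuming 'ds' is a datetime column (not converted to integers) and the series are daily. The helper name `find_missing_periods` and the toy data are hypothetical and not part of mlforecast:

```python
import pandas as pd


def find_missing_periods(df, freq, id_col="unique_id", time_col="ds"):
    """Return one row per (series, timestamp) missing between each series' first and last date."""
    rows = []
    for uid, grp in df.groupby(id_col):
        # Every timestamp the series should contain at the given frequency.
        expected = pd.date_range(grp[time_col].min(), grp[time_col].max(), freq=freq)
        # Timestamps that are expected but not present in the data.
        absent = expected.difference(pd.DatetimeIndex(grp[time_col]))
        rows.extend((uid, ts) for ts in absent)
    return pd.DataFrame(rows, columns=[id_col, time_col])


# Toy data: series "a" skips 2023-07-10, series "b" is complete.
df = pd.DataFrame(
    {
        "unique_id": ["a", "a", "a", "b", "b", "b"],
        "ds": pd.to_datetime(
            ["2023-07-08", "2023-07-09", "2023-07-11",
             "2023-07-08", "2023-07-09", "2023-07-10"]
        ),
        "y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    }
)

gaps = find_missing_periods(df, freq="D")
print(gaps)
```

An empty result would mean every series is complete at the given frequency. Otherwise, the listed periods would need to be filled in or otherwise handled before fitting.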

github-actions[bot] commented 1 year ago

This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one.