Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0
Static Features+Dynamic Features+MLForecast #453

Open saad1912 opened 1 week ago

saad1912 commented 1 week ago

What happened + What you expected to happen

I have a dataset with 'unique_id', 'ds', and 'y' columns, containing many distinct 'unique_id' values. It also has many other features, some static and some dynamic. How can I fit MLForecast models on this dataset so that all existing features (static and dynamic) are used?

Versions / Dependencies

import mlforecast
print(mlforecast.__version__)
0.15.0

Reproduction script

features = [col for col in df2.columns if col not in ['unique_id', 'ds', 'y']]
models = [lgb.LGBMRegressor(verbosity=-1)]
fcst = MLForecast(
    models=models, lags=range(1, 3),
    freq='W'
)
fcst.fit(df2, static_features=features)

---------------------------Error-------------------------


ValueError                                Traceback (most recent call last)
in <cell line: 7>()
      5     freq='W'
      6 )
----> 7 fcst.fit(df2, static_features=features)

3 frames
/usr/local/lib/python3.10/dist-packages/mlforecast/core.py in _fit(self, df, id_col, time_col, target_col, static_features, keep_last_n, weight_col)
    321             for feat in static_features:
    322                 if (statics_on_starts[feat] != statics_on_ends[feat]).any():
--> 323                     raise ValueError(
    324                         f"{feat} is declared as a static feature but its values change "
    325                         "over time. Please set the static_features argument to "

ValueError: row_hash is declared as a static feature but its values change over time. Please set the static_features argument to indicate which features are static. If all of your features are dynamic please set static_features=[].

Issue Severity

High: It blocks me from completing my task.

jmoralez commented 1 week ago

Just do what the error message says. If row_hash is dynamic then don't declare it as static, i.e. remove it from that list. All features in the dataframe are used; the static_features argument only distinguishes the static features from the dynamic ones.
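One way to find out which columns can safely go into static_features is to check, per series, whether a column ever changes. The following is a minimal sketch using only pandas; the helper name `split_static_dynamic` and the toy data are hypothetical and not part of mlforecast, and the check (at most one distinct value per unique_id) is stricter than the start-versus-end comparison visible in the traceback above.

```python
import pandas as pd


def split_static_dynamic(df, id_col="unique_id", time_col="ds", target_col="y"):
    """Return (static, dynamic) feature names based on per-series constancy."""
    features = [c for c in df.columns if c not in (id_col, time_col, target_col)]
    if not features:
        return [], []
    # Count distinct values of every feature within each series (NaN counts too).
    n_unique = df.groupby(id_col)[features].nunique(dropna=False)
    # A feature is static when no series ever sees more than one value.
    is_static = (n_unique <= 1).all()
    static = [c for c in features if is_static[c]]
    dynamic = [c for c in features if not is_static[c]]
    return static, dynamic


# Hypothetical toy data: 'site' is constant per series, 'row_hash' is not.
toy = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": pd.to_datetime(["2024-01-07", "2024-01-14"] * 2),
    "y": [1.0, 2.0, 3.0, 4.0],
    "site": ["x", "x", "z", "z"],
    "row_hash": [11, 12, 13, 14],
})
static, dynamic = split_static_dynamic(toy)
```

The resulting `static` list could then be passed as fcst.fit(df, static_features=static); every remaining feature column is treated as dynamic.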

saad1912 commented 3 days ago

I did the above and was able to train the model, but I am now getting an error during prediction.

lgb_params = {
    'verbose': -1,
    'num_leaves': 31
}
fcst = MLForecast(
    models=lgb.LGBMRegressor(**lgb_params),
    freq='W'
)
fcst.fit(df, static_features=['product_hierarchy_01', 'site'])

MLForecast(models=[LGBMRegressor], freq=W, lag_features=[], date_features=[], num_threads=1)

fcst.ts.features_order_

['product_hierarchy_01', 'site',
 'day', 'week', 'month', 'quarter', 'year',
 'is_month_start', 'is_month_end', 'is_quarter_start', 'is_quarter_end',
 'shipped_qty_lag_week_1', 'shipped_qty_lag_week_2', 'shipped_qty_lag_week_3', 'shipped_qty_lag_week_4',
 'shipped_qty_lag_month_1', 'shipped_qty_lag_month_2', 'shipped_qty_lag_month_3',
 'shipped_qty_lag_month_6', 'shipped_qty_lag_month_9', 'shipped_qty_lag_month_12',
 'shipped_qty_lag_quarter_1', 'shipped_qty_lag_quarter_2', 'shipped_qty_lag_quarter_3', 'shipped_qty_lag_quarter_4',
 'shipped_qty_roll_sum_week_1', 'shipped_qty_roll_mean_week_1', 'shipped_qty_roll_median_week_1', 'shipped_qty_roll_stddev_week_1',
 'shipped_qty_roll_sum_week_2', 'shipped_qty_roll_mean_week_2', 'shipped_qty_roll_median_week_2', 'shipped_qty_roll_stddev_week_2',
 'shipped_qty_roll_sum_week_3', 'shipped_qty_roll_mean_week_3', 'shipped_qty_roll_median_week_3', 'shipped_qty_roll_stddev_week_3',
 'shipped_qty_roll_sum_week_4', 'shipped_qty_roll_mean_week_4', 'shipped_qty_roll_median_week_4', 'shipped_qty_roll_stddev_week_4',
 'shipped_qty_roll_sum_month_1', 'shipped_qty_roll_mean_month_1', 'shipped_qty_roll_median_month_1', 'shipped_qty_roll_stddev_month_1',
 'shipped_qty_roll_sum_month_2', 'shipped_qty_roll_mean_month_2', 'shipped_qty_roll_median_month_2', 'shipped_qty_roll_stddev_month_2',
 'shipped_qty_roll_sum_month_3', 'shipped_qty_roll_mean_month_3', 'shipped_qty_roll_median_month_3', 'shipped_qty_roll_stddev_month_3',
 'shipped_qty_roll_sum_month_6', 'shipped_qty_roll_mean_month_6', 'shipped_qty_roll_median_month_6', 'shipped_qty_roll_stddev_month_6',
 'shipped_qty_roll_sum_month_9', 'shipped_qty_roll_mean_month_9', 'shipped_qty_roll_median_month_9', 'shipped_qty_roll_stddev_month_9',
 'shipped_qty_roll_sum_month_12', 'shipped_qty_roll_mean_month_12', 'shipped_qty_roll_median_month_12', 'shipped_qty_roll_stddev_month_12',
 'shipped_qty_roll_sum_quarter_1', 'shipped_qty_roll_mean_quarter_1', 'shipped_qty_roll_median_quarter_1', 'shipped_qty_roll_stddev_quarter_1',
 'shipped_qty_roll_sum_quarter_2', 'shipped_qty_roll_mean_quarter_2', 'shipped_qty_roll_median_quarter_2', 'shipped_qty_roll_stddev_quarter_2',
 'shipped_qty_roll_sum_quarter_3', 'shipped_qty_roll_mean_quarter_3', 'shipped_qty_roll_median_quarter_3', 'shipped_qty_roll_stddev_quarter_3',
 'shipped_qty_roll_sum_quarter_4', 'shipped_qty_roll_mean_quarter_4', 'shipped_qty_roll_median_quarter_4', 'shipped_qty_roll_stddev_quarter_4',
 'shipped_qty_cumsum_week', 'shipped_qty_cumsum_month', 'shipped_qty_cumsum_quarter', 'shipped_qty_cumsum_year',
 'shipped_qty_min', 'shipped_qty_max', 'shipped_qty_mean', 'shipped_qty_median', 'shipped_qty_variance', 'shipped_qty_stddev']

forecast_horizon = 1

predictions = fcst.predict(h=forecast_horizon)

KeyError                                  Traceback (most recent call last)
in <cell line: 3>()
      1 forecast_horizon = 1
      2
----> 3 predictions = fcst.predict(h=forecast_horizon)

6 frames
/usr/local/lib/python3.10/dist-packages/pandas/core/indexes/base.py in _raise_if_missing(self, key, indexer, axis_name)
   6250
   6251         not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
-> 6252         raise KeyError(f"{not_found} not in index")
   6253
   6254     @overload

KeyError: "['day', 'week', 'month', 'quarter', 'year', 'is_month_start', 'is_month_end', 'is_quarter_start', 'is_quarter_end', 'shipped_qty_lag_week_1', 'shipped_qty_lag_week_2', 'shipped_qty_lag_week_3', 'shipped_qty_lag_week_4', 'shipped_qty_lag_month_1', 'shipped_qty_lag_month_2', 'shipped_qty_lag_month_3', 'shipped_qty_lag_month_6', 'shipped_qty_lag_month_9', 'shipped_qty_lag_month_12', 'shipped_qty_lag_quarter_1', 'shipped_qty_lag_quarter_2', 'shipped_qty_lag_quarter_3', 'shipped_qty_lag_quarter_4', 'shipped_qty_roll_sum_week_1', 'shipped_qty_roll_mean_week_1', 'shipped_qty_roll_median_week_1', 'shipped_qty_roll_stddev_week_1', 'shipped_qty_roll_sum_week_2', 'shipped_qty_roll_mean_week_2', 'shipped_qty_roll_median_week_2', 'shipped_qty_roll_stddev_week_2', 'shipped_qty_roll_sum_week_3', 'shipped_qty_roll_mean_week_3', 'shipped_qty_roll_median_week_3', 'shipped_qty_roll_stddev_week_3', 'shipped_qty_roll_sum_week_4', 'shipped_qty_roll_mean_week_4', 'shipped_qty_roll_median_week_4', 'shipped_qty_roll_stddev_week_4', 'shipped_qty_roll_sum_month_1', 'shipped_qty_roll_mean_month_1', 'shipped_qty_roll_median_month_1', 'shipped_qty_roll_stddev_month_1', 'shipped_qty_roll_sum_month_2', 'shipped_qty_roll_mean_month_2', 'shipped_qty_roll_median_month_2', 'shipped_qty_roll_stddev_month_2', 'shipped_qty_roll_sum_month_3', 'shipped_qty_roll_mean_month_3', 'shipped_qty_roll_median_month_3', 'shipped_qty_roll_stddev_month_3', 'shipped_qty_roll_sum_month_6', 'shipped_qty_ro...

jmoralez commented 2 days ago

Have you read our documentation on exogenous features?
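For readers hitting the same KeyError: the columns it lists are the dynamic features the model was trained on, and predict(h=...) appears to have been called without any future values for them. Below is a minimal sketch, using only pandas, of how a frame of future dynamic features might be assembled. The helper name `future_exog_frame` and the toy data are hypothetical, the dates assume series that are exactly one week apart, and only calendar features are derived because lags and rolling statistics of the target are not known in advance for future dates.

```python
import pandas as pd


def future_exog_frame(df: pd.DataFrame, h: int) -> pd.DataFrame:
    """Build one row per (unique_id, future week) with calendar features."""
    # Last observed timestamp of each series; groupby sorts the ids.
    last_ds = df.groupby("unique_id")["ds"].max()
    rows = [
        {"unique_id": uid, "ds": last + pd.Timedelta(weeks=step)}
        for uid, last in last_ds.items()
        for step in range(1, h + 1)
    ]
    future = pd.DataFrame(rows)
    # Calendar features depend only on the date, so they are known ahead of time.
    future["month"] = future["ds"].dt.month
    future["quarter"] = future["ds"].dt.quarter
    future["year"] = future["ds"].dt.year
    future["is_month_start"] = future["ds"].dt.is_month_start
    return future


# Hypothetical weekly history for two series.
history = pd.DataFrame({
    "unique_id": ["a", "a", "b", "b"],
    "ds": pd.to_datetime(["2024-01-07", "2024-01-14", "2024-01-07", "2024-01-14"]),
    "y": [1.0, 2.0, 3.0, 4.0],
})
X_future = future_exog_frame(history, h=2)
```

With a fitted model, such a frame would be passed as fcst.predict(h=2, X_df=X_future), and it has to contain every dynamic column used in training. Lags, rolling statistics, and date features of the target are usually easier to declare through the lags, lag_transforms, and date_features arguments of MLForecast, so that the library computes them at prediction time instead of expecting them as input columns.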