Nixtla / mlforecast

Scalable machine 🤖 learning for time series forecasting.
https://nixtlaverse.nixtla.io/mlforecast
Apache License 2.0

MLForecast - Support historic covariates out of the box #328

Closed Vitorbnc closed 3 months ago

Vitorbnc commented 3 months ago

Description

It would be nice to support historic covariates out of the box, like the Darts API or the Neuralforecast API do. We could supply them either in the same dataframe (as mlforecast already supports for static covariates) or in a separate one. I recently asked about this in the Slack channel and was told we could generate them from lagged versions of the future covariates, but it would be nicer and simpler to use if we could just supply them in the training dataframe and also in the predict method.

Use case

The use case is predicting future target values while using historic exogenous data. The initially intended models are the LightGBM and XGBoost regressors.

kkckk1110 commented 3 months ago

I have the same question. Can I construct these features before training, so that I can incorporate them into the feature engineering process?
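One way to construct such features yourself before training is a per-series shift with pandas. This is only a sketch with made-up data: the `temp` column is a hypothetical historic exogenous variable, and the lag is chosen to be at least the forecast horizon so the feature is still available at predict time.

```python
import pandas as pd

# long-format frame: one row per (unique_id, ds); 'temp' is a hypothetical
# historic exogenous column we want to lag before training
df = pd.DataFrame({
    'unique_id': ['A'] * 6 + ['B'] * 6,
    'ds': pd.date_range('2024-01-01', periods=6).tolist() * 2,
    'y': range(12),
    'temp': range(100, 112),
})

# shift within each series so the model only sees values that are at
# least `horizon` steps old, keeping them usable at predict time
horizon = 2
df['temp_lag2'] = df.groupby('unique_id')['temp'].shift(horizon)

# drop the raw column and the rows made incomplete by shifting
train = df.drop(columns='temp').dropna()
```

The resulting `train` frame can then be passed to the regular fit API, with the shifted frame (dates moved forward) serving as `X_df` for predict.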

jmoralez commented 3 months ago

In my opinion those APIs are more restrictive, since historic means "use a specific lag of these features" (output_chunk_length in darts and h in neuralforecast).

If you want to turn a feature into a historic one in mlforecast you can just use a lag that is at least as long as your forecasting horizon, for example:

import pandas as pd
from mlforecast import MLForecast
from mlforecast.utils import generate_series, generate_prices_for_series
from sklearn.linear_model import LinearRegression

# generate data. the prices span the full range of training dates (so it's a future exog)
series = generate_series(2, equal_ends=True)
prices = generate_prices_for_series(series, horizon=0)

# turn to historic
prices_lag5 = prices.rename(columns={'price': 'price_lag5'})
prices_lag5['ds'] += 5 * pd.offsets.Day()
prices_lag10 = prices.rename(columns={'price': 'price_lag10'})
prices_lag10['ds'] += 10 * pd.offsets.Day()
historic_prices = prices_lag5.merge(prices_lag10, on=['unique_id', 'ds'])

# merge with training set. this drops some rows but if you have more history you wouldn't need to
train = series.merge(historic_prices, on=['unique_id', 'ds'])

# use the regular API for training and forecasting
mlf = MLForecast(
    models=[LinearRegression()],
    freq='D',
)
mlf.fit(train, static_features=[])
mlf.predict(h=5, X_df=historic_prices)

Since h=5 here we can use any lag>=5 as "historic" (5 and 10 in this example).
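The lag >= h requirement can be checked with a small self-contained sketch (made-up data, single series): shifting the exog dates forward by k days means the shifted frame covers exactly k dates beyond the end of training, which is what `predict(h=k)` needs to find in `X_df`.

```python
import pandas as pd

# a future exog covering the training dates of one series
prices = pd.DataFrame({
    'unique_id': 'A',
    'ds': pd.date_range('2024-01-01', periods=10, freq='D'),
    'price': range(10),
})
train_end = prices['ds'].max()

# shift forward by k days: the value observed at t becomes a feature at t + k
k = 5
shifted = prices.assign(ds=prices['ds'] + k * pd.offsets.Day())

# the shifted frame now covers k dates past the training end,
# so a forecast of up to h = k steps has the exog values it needs
future_cover = (shifted['ds'] > train_end).sum()
```

Any k smaller than the horizon would leave the last few forecast dates without a feature value.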

This doesn't seem that hard. It would be harder to add a new argument to determine which lag to take, a new argument to provide the historic features, etc.

Vitorbnc commented 3 months ago

Thanks @jmoralez! I will close this as the procedure seems simple enough. We may reopen later if needed.