facebookresearch / Kats

Kats is a kit to analyze time series data: a lightweight, easy-to-use, generalizable, and extendable framework to perform time series analysis, from understanding the key statistics and characteristics, to detecting change points and anomalies, to forecasting future trends.
MIT License

Can Kats multivariate forecasting predict only one target (Y) variable? #96

Closed young-hun-jo closed 3 years ago

young-hun-jo commented 3 years ago

Hi, thanks for your awesome project! I am following your tutorial (201_forecasting.ipynb) to learn how to use the Kats library for multivariate forecasting, and I have a question about the VAR model it covers. As I understand it, multivariate forecasting usually means predicting a single target value (the Y variable) from multiple features (X variables). The tutorial does not show that kind of forecasting: it presents the VAR model as multivariate forecasting, but the model takes two X features and predicts two Y target values. How can I do the kind of multivariate forecasting I want? Or can the VAR model not be used to predict a single Y target value from more than two X features?

[Screenshot: 2021-07-25, 1:24:16 PM]

Axemen commented 3 years ago

Hey @young-hun-jo,

The Vector Auto-Regression (VAR) model is designed to take multiple inputs and produce multiple outputs.

It follows the traditional Auto-Regression (AR) model's formula, but replaces the scalar weights with matrices and the lagged values with vectors.

You can read more about the math itself on Wikipedia.
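
To make the "multiple inputs, multiple outputs" point concrete, here is a minimal NumPy sketch of a first-order VAR recursion. It is only an illustration under stated assumptions: the function name `var1_forecast` and the coefficient values are made up for this example, the noise term is omitted, and this is not how Kats or statsmodels actually implements or fits the model.

```python
import numpy as np

def var1_forecast(last_obs, intercept, coef, steps):
    """Iterate y_t = c + A @ y_{t-1} for `steps` steps (noise term omitted)."""
    y = np.asarray(last_obs, dtype=float)
    forecasts = []
    for _ in range(steps):
        # Every series is updated from the previous values of all series.
        y = intercept + coef @ y
        forecasts.append(y)
    return np.vstack(forecasts)

# Hypothetical coefficients for a two-variable system (illustrative values only).
c = np.array([0.1, 0.2])
A = np.array([[0.5, 0.1],
              [0.2, 0.3]])

fcst = var1_forecast(last_obs=[1.0, 2.0], intercept=c, coef=A, steps=3)
print(fcst.shape)  # (3, 2): one row per step, one column per series
```

Because each step needs the previous values of every series, the recursion has to forecast all of them together. If only one series matters, you can read off its column (for example `fcst[:, 0]`), but the model still forecasts the others along the way.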

As for other models in Kats that can perform multivariate regression with a single output: luckily, some can. However, it may not be in a format that you are used to (I know it wasn't for me the first time I saw it).

For some of the models Kats supports, you can use endogenous and exogenous variables.

The endogenous (endog) variable is the historical data from the time series you are trying to predict.

An exogenous (exog) variable is a separate, independent variable that has some effect on the endogenous variable.

A better explanation of endog and exog variables
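
As a rough sketch of how the two roles differ, consider a one-lag autoregression with exogenous regressors. The function `arx_one_step` and all of the numbers below are hypothetical and only meant to show the shape of the idea (intercept and noise are omitted); a real ARIMA fit estimates its coefficients from data.

```python
import numpy as np

def arx_one_step(y_prev, x_now, phi, beta):
    """One-step forecast: phi * y_{t-1} + beta . x_t (intercept and noise omitted)."""
    return phi * y_prev + np.dot(beta, x_now)

y_prev = 10.0                  # endog: last observed value of the target series
x_now = np.array([2.0, 3.0])   # exog: two external variables at the forecast time
phi = 0.5                      # hypothetical weight on the target's own history
beta = np.array([1.0, -1.0])   # hypothetical weights on the exogenous variables

y_hat = arx_one_step(y_prev, x_now, phi, beta)
print(y_hat)  # a single number: one target, several inputs
```

The forecast is a single value for the target, while the exogenous variables only appear as inputs. That is the "many X, one Y" setup asked about in the question.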

Here's an example of using the ARIMA model with an exogenous variable:

import numpy as np
import pandas as pd
from kats.consts import TimeSeriesData
from kats.models.arima import ARIMAModel, ARIMAParams

# Load the air passengers data, indexed by its date column 'ds'.
df = pd.read_csv("../data/air_passengers.csv", parse_dates=True, index_col='ds')

# Setting the exogenous variable to be the month of the year.
exog = df.index.month.values

# Creating the ts
air_ts = TimeSeriesData(time=df.index, value=df.y)

# Setting the exogenous variable in the model params
params = ARIMAParams(2, 1, 1, exog=exog, freq='MS')

model = ARIMAModel(air_ts, params)
model.fit()

# Exogenous values are also required for the forecast horizon, so we have to compute them again for the future steps
steps = 5
start = exog[-1]

# Generating the next months for the forecast, starting from the month after the last observed one
fcst_exog = np.array([(i % 12 + 1) for i in range(start, start + steps)])

model.predict(steps, exog=fcst_exog)

young-hun-jo commented 3 years ago

Thanks for explaining this! Your example uses one exogenous variable, but my data has more than two. Can I pass several exogenous variables to the exog argument? If so, what type should it be (e.g. a list, exog=[exog1, exog2, exog3, ...])?

Axemen commented 3 years ago

You absolutely can have multiple variables in your exog argument!

I modified the example to include multiple exogenous variables:

df = pd.read_csv("../data/air_passengers.csv", parse_dates=True, index_col='ds')

# Two exogenous variables: the month and the year of each observation
exog = pd.DataFrame({
    "month": df.index.month,
    "year": df.index.year
}).values

# Creating the ts
air_ts = TimeSeriesData(time=df.index, value=df.y)

# Setting the exogenous variables in the model params
params = ARIMAParams(2, 1, 1, exog=exog, freq='MS')

model = ARIMAModel(air_ts, params)
model.fit()

# Exogenous values are also required for the forecast horizon, so we have to compute them again for the forecasting steps
steps = 5

# Dates of the forecast steps: the `steps` months after the last observation
future = pd.date_range(df.index[-1], periods=steps + 1, freq='MS')[1:]

# Derive month and year from the future dates, so the year only changes when the calendar year does
fcst_exog = pd.DataFrame({
    "month": future.month,
    "year": future.year
}).values

model.predict(steps, exog=fcst_exog)

Axemen commented 3 years ago

The key is to make sure that exog is a NumPy array with the shape (length_of_data, number_of_variables).

In this example the length of the training data is 144 and the number of exog variables is 2, so the shape of exog needs to be (144, 2).
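
A quick way to sanity-check these shapes before fitting is to build both arrays from dates and print them. The sketch below does not load the CSV; it assumes a monthly index of 144 points starting in January 1949 (matching the length quoted above), and the helper name `calendar_exog` is made up for illustration. By the same logic, the array passed at forecast time would be expected to have shape (steps, number_of_variables).

```python
import pandas as pd

def calendar_exog(dates):
    """Return a (len(dates), 2) array with a month column and a year column."""
    return pd.DataFrame({"month": dates.month, "year": dates.year}).to_numpy()

# Assumed monthly index with 144 points, standing in for the training data.
train_dates = pd.date_range("1949-01-01", periods=144, freq="MS")

steps = 5
# Future dates start one month after the last training date.
future_dates = pd.date_range(train_dates[-1], periods=steps + 1, freq="MS")[1:]

train_exog = calendar_exog(train_dates)
fcst_exog = calendar_exog(future_dates)

print(train_exog.shape)  # (144, 2)
print(fcst_exog.shape)   # (5, 2)
```

Building the forecast columns from actual future dates, rather than incrementing each column by hand, keeps the month and year consistent with each other across a year boundary.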

young-hun-jo commented 3 years ago

Thanks for your awesome explanation and example! I was able to do this thanks to you! I will close this issue 😆