bashtage / arch

ARCH models in Python

Rolling window forecast with rolling demean #633

Open gavincyi opened 1 year ago

gavincyi commented 1 year ago

I saw in the documentation that a rolling window forecast can be produced with the parameters first_obs and last_obs. I am looking for an approach with minimal runtime overhead to

  1. demean the series on a rolling basis, i.e. returns[first_obs:last_obs] - mean(returns[first_obs:last_obs]), and
  2. fit the GARCH model

I wonder whether the constant mean is estimated on a rolling basis, or on the whole time series passed as the argument y.

model = arch_model(tseries, vol="GARCH", mean="Constant", ...)
for i in range(len(tseries) - rolling_window):
  model.fit(first_obs=i, last_obs=i + rolling_window - 1, ...)

I tried to look into the source code but could not work out the answer at a glance. Could you help clarify this?

bashtage commented 1 year ago

The mean is jointly estimated with the variance parameters. If you want the exact in-sample mean, you would need to first demean the data using the rolling mean, and then fit a model with ZeroMean. This would involve recreating the model for each sample.
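The demean-then-ZeroMean approach can be sketched roughly as follows. This is a minimal sketch, not part of arch: the helper names `demeaned_windows` and `rolling_zero_mean_garch` are hypothetical, and the warm start from the previous window's parameters is an optional speed-up rather than a requirement.

```python
import numpy as np


def demeaned_windows(returns, window):
    """Yield (start, demeaned_window) for each rolling window of length `window`.

    Hypothetical helper: each window has its own sample mean subtracted.
    """
    values = np.asarray(returns, dtype=float)
    for start in range(len(values) - window + 1):
        chunk = values[start : start + window]
        yield start, chunk - chunk.mean()


def rolling_zero_mean_garch(returns, window):
    """Fit a zero-mean GARCH model to each demeaned window (hypothetical helper).

    Needs the arch package; it is imported here so that the demeaning
    helper above can be used without it.
    """
    from arch import arch_model

    params, last = [], None
    for _, chunk in demeaned_windows(returns, window):
        # The demeaned data differ per window, so the model is recreated each time.
        model = arch_model(chunk, mean="Zero", vol="GARCH")
        # Warm-start from the previous window's estimates (None on the first pass).
        res = model.fit(disp="off", starting_values=last)
        last = res.params
        params.append(last)
    return params
```

For example, `rolling_zero_mean_garch(returns, 1000)` would return one parameter vector per 1000-observation window.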

If you call fit with first_obs and last_obs, then it will jointly estimate everything.
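To make "jointly" concrete, here is the estimation problem written out, assuming the default specification (constant mean, GARCH(1,1), normal errors). The symbols are notation introduced here for illustration, and the sum runs over the selected sample.

```latex
\begin{aligned}
r_t &= \mu + \varepsilon_t, \qquad \varepsilon_t = \sigma_t z_t, \quad z_t \sim N(0, 1), \\
\sigma_t^2 &= \omega + \alpha\,\varepsilon_{t-1}^2 + \beta\,\sigma_{t-1}^2, \\
(\hat\mu, \hat\omega, \hat\alpha, \hat\beta) &= \arg\max_{\mu,\,\omega,\,\alpha,\,\beta}\;
  -\frac{1}{2} \sum_{t} \left[ \log(2\pi) + \log \sigma_t^2 + \frac{(r_t - \mu)^2}{\sigma_t^2} \right].
\end{aligned}
```

Because each squared residual is weighted by $1/\sigma_t^2$, the maximizing $\mu$ is generally not the plain sample mean, which is why demeaning first and fitting with ZeroMean can give different estimates.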

The fastest way is to use the previous fit values for starting values. Here is a demo:

import arch
from arch.data import sp500
import datetime as dt

r = 100 * sp500.load().iloc[:, -2].pct_change().dropna()

last_obs = 1100
now = dt.datetime.now()
for i in range(1000, last_obs):
    res = arch.arch_model(r.iloc[i - 1000 : i]).fit(disp="off")
print(f"{(dt.datetime.now() - now).total_seconds()} (new model, no starting values)")

now = dt.datetime.now()
for i in range(1000, last_obs):
    arch.arch_model(r).fit(disp="off", first_obs=i - 1000, last_obs=i)
print(f"{(dt.datetime.now() - now).total_seconds()} (no starting values)")

last = None
now = dt.datetime.now()
for i in range(1000, last_obs):
    res = arch.arch_model(r.iloc[i - 1000 : i]).fit(disp="off", starting_values=last)
    last = res.params
print(f"{(dt.datetime.now() - now).total_seconds()} (starting values)")

On my machine I see

1.941473 (new model, no starting values)
1.928971 (no starting values)
1.236577 (starting values)

One final option is to update the parameters only occasionally. The example below re-estimates the parameters every 10 observations and otherwise reuses the most recent estimates.

last = None
now = dt.datetime.now()
for i in range(1000, last_obs):
    mod = arch.arch_model(r.iloc[i - 1000 : i])
    if i % 10 == 0 or last is None:
        res = mod.fit(disp="off", starting_values=last)
        last = res.params
    mod.forecast(res.params, horizon=1)
print(
    f"{(dt.datetime.now() - now).total_seconds()} (starting values, occasionally update)"
)

0.224142 (starting values, occasionally update)

bashtage commented 1 year ago

One final answer -- when using first_obs and last_obs, the parameters are estimated only using the selected sample.

gavincyi commented 1 year ago

Thanks for your prompt response. All the above makes perfect sense to me.

Do you think it would be a good idea to add an argument that demeans on a rolling basis, i.e. fit(demean=True, ...) would demean the selected sample before fitting the model? If so, I can create a PR for it.