bashtage / arch

ARCH models in Python

Apply model to new data? #605

Open msquaredds opened 2 years ago

msquaredds commented 2 years ago

I was wondering if there's a way to create a model and then apply it to new data.

What I'm trying to do is: bootstrap data, create/fit a model for a bootstrap sample, then apply that model to the original data. At first I thought the new data could be passed to "forecast", as it is in some other Python packages, but I didn't see anything in the docs for that.

I did see that params can be passed in, so I'm wondering if that's the correct approach. How I was visualizing it: bootstrap data, create/fit a model, then create/fit a new model and forecast with that model while passing the first model's params. This seems a little convoluted, since conceptually I'm not sure why I'd need to fit the new model, but .forecast is a method on the result of .fit, so it seems necessary there.

I tried to build up to seeing if the params approach would work, but I'm running into an issue. I'll share the general idea/code here to see if that sheds light on anything obvious, but if not I can create a full example too. Basically I was creating/fitting a model and then forecasting with it (not on other data, just on the same data), which gave me a set of parameters and forecasts. I then re-attempted the forecast and set the params to be the same as the original model, but was getting different forecast values (everything else was kept the same).

Below I'm getting different values for self.fcst and test_fcst:

    # Run the model
    garch_model = arch_model(self.depen_data, x=self.indep_data, mean=mean, vol="Garch",
                             p=p, o=o, q=q, power=power, dist=dist)
    self.res = garch_model.fit()
    params = self.res.params

    # Create the forecast twice: once with the fitted params used implicitly,
    # once with the same params passed explicitly
    predict_length = self.horizon + self.lag
    self.fcst = self.res.forecast(horizon=predict_length,
                                  method="bootstrap", reindex=False, x=self.fcst_indep_data)
    test_fcst = self.res.forecast(params=params, horizon=predict_length,
                                  method="bootstrap", reindex=False, x=self.fcst_indep_data)

So my question is whether either of these approaches (or any other pre-existing approach) is correct and, if the params approach is correct, whether someone has insight on what I'm doing wrong.

bashtage commented 2 years ago

I think you could do something like:

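from arch import arch_model

# Simulate two samples from the same constant-mean GARCH(1,1) process
sim = arch_model(None)
actual = sim.simulate([0, 0.1, 0.1, 0.8], nobs=1000)
bootstrap = sim.simulate([0, 0.1, 0.1, 0.8], nobs=1000)

# Estimate on the bootstrap sample
bs_mod = arch_model(bootstrap.data)
bs_res = bs_mod.fit()

# Forecast from the actual data using the bootstrap estimates (no refit needed)
actual_mod = arch_model(actual.data)
fcasts = actual_mod.forecast(bs_res.params, reindex=False)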

Essentially you forecast using the parameters from the bootstrap sample. You can use the same actual_mod with different parameter estimates.

The only slight caveat has to do with the backcast value in the actual_mod. This is based on actual.data. If the sample size is reasonably long, then it should not make any difference.
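
To illustrate why the backcast stops mattering in a long sample, here is a toy NumPy sketch of a GARCH(1,1) variance recursion started from two very different backcast values. It is not arch's implementation: the helper `garch_variance`, the two backcast values, and the rule used for the first variance are all assumptions made for illustration. With the residuals held fixed, the gap between the two variance paths shrinks by a factor of beta at every step.

```python
import numpy as np


def garch_variance(resid, omega, alpha, beta, backcast):
    """Toy GARCH(1,1) variance recursion (hypothetical helper, not arch's code)."""
    sigma2 = np.empty(len(resid))
    # Assumption: the backcast stands in for both the pre-sample squared
    # residual and the pre-sample variance.
    sigma2[0] = omega + (alpha + beta) * backcast
    for t in range(1, len(resid)):
        sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2


rng = np.random.default_rng(0)
resid = rng.standard_normal(1000)  # stand-in for the residuals of actual.data

# Same residuals and parameters, two very different starting values
low = garch_variance(resid, 0.1, 0.1, 0.8, backcast=0.5)
high = garch_variance(resid, 0.1, 0.1, 0.8, backcast=5.0)
gap = np.abs(high - low)

# The gap starts at (alpha + beta) * (5.0 - 0.5) and decays geometrically
print(gap[0], gap[10], gap[-1])
```

With beta = 0.8 and 1000 observations, the starting value has no visible effect on the last fitted variance, which is the one a forecast from the end of the sample builds on.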

msquaredds commented 2 years ago

Got it, I think that makes sense. I will play around with your setup with my data, appreciate it!

msquaredds commented 2 years ago

@bashtage I have one more question to make sure I'm understanding how this works, particularly how passing existing params works.

I set up an example similar to my code above (see below for the new code). I would've thought that passing the original model's params to itself when creating a forecast would give the same results as not passing params, but I'm getting different forecasts for all periods > 1 step ahead. So I'm trying to figure out what I don't understand here.

garch_model = arch_model(self.depen_data, vol="Garch", p=1, o=0, q=0)
self.res = garch_model.fit()
params = self.res.params

predict_length = 2
self.fcst1 = self.res.forecast(params=params, horizon=predict_length, method="bootstrap", reindex=False)
self.fcst2 = self.res.forecast(horizon=predict_length, method="bootstrap", reindex=False)

For h=1 both forecasts are 0.880092, but for h=2 fcst1 is 1.163483 while fcst2 is 1.160607.

bashtage commented 2 years ago

The bootstrap method uses random numbers, so you need to reset the seed of NumPy's global (singleton) random generator before each forecast call. The random draws only matter for horizon 2 or larger. For horizon 1 the forecasts are always the same.
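
The horizon-1 versus horizon-2 behaviour can be reproduced with a toy ARCH(1) bootstrap forecast written in plain NumPy. This is only a sketch of the idea, not arch's code: the helper `bootstrap_variance_forecast`, the parameter values, and the crude starting variance are assumptions, and it takes an explicit `Generator` instead of relying on the global seed. The first step depends only on the last observed residual, while every later step needs a shock resampled from past standardized residuals.

```python
import numpy as np


def bootstrap_variance_forecast(resid, omega, alpha, horizon, simulations, rng):
    """Toy ARCH(1) bootstrap variance forecast (hypothetical helper)."""
    # In-sample variances: sigma2[t] = omega + alpha * resid[t - 1] ** 2
    sigma2 = np.empty(len(resid))
    sigma2[0] = resid.var()  # crude starting value (assumption)
    sigma2[1:] = omega + alpha * resid[:-1] ** 2
    std_resid = resid / np.sqrt(sigma2)

    paths = np.empty((simulations, horizon))
    # Step 1 is fully determined by the last observed residual.
    paths[:, 0] = omega + alpha * resid[-1] ** 2
    for h in range(1, horizon):
        # Later steps need a simulated shock, resampled from the past.
        z = rng.choice(std_resid, size=simulations, replace=True)
        eps = np.sqrt(paths[:, h - 1]) * z
        paths[:, h] = omega + alpha * eps ** 2
    return paths.mean(axis=0)


resid = np.random.default_rng(0).standard_normal(500)  # stand-in data

a = bootstrap_variance_forecast(resid, 0.5, 0.3, 2, 1000, np.random.default_rng(1))
b = bootstrap_variance_forecast(resid, 0.5, 0.3, 2, 1000, np.random.default_rng(2))
c = bootstrap_variance_forecast(resid, 0.5, 0.3, 2, 1000, np.random.default_rng(1))

print(a[0] == b[0])          # step 1 agrees regardless of the seed
print(a[1] == b[1])          # step 2 differs when the seeds differ
print(np.array_equal(a, c))  # same seed, same forecast at every step
```

The same reasoning applies to the comparison above: two bootstrap forecasts with identical parameters only match beyond the first step when both calls start from the same random state.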

msquaredds commented 2 years ago

Ahh that makes sense, tried that out and it works as expected now. Thanks!

mihailyanchev commented 1 year ago

Hi @bashtage @msquaredds, I think I am running into a similar issue, but with two minor differences. Would you be able to give me some insight into how to handle it?

I would like to estimate a GARCH(1,1) model on training data and generate one-step-ahead (out-of-sample) forecasts on a separate testing sample. One difference from the example above is that I am using an AR(1) for the mean model and think it is preferable to go for the analytical forecasts. Also, on the testing sample I would like to apply a rolling window scheme where I use Y_{t-1} to predict the mean and variance for Y_t over the whole sample.

I came up with something along these lines, which should give me the first forecast in the test sample, which uses the 2022-06-28 data point to generate a prediction for 2022-06-29. I assume I need to iterate over the sample with a for loop following this pattern (similar to how it is shown here), correct?

am = arch.arch_model(dataset.loc[:"2022-06-28"], mean='AR', lags=1, vol="Garch", p=1, o=0, q=1, dist="SkewStudent") 
res = am.fit()
params = res.params

test_model = arch.arch_model(dataset.loc["2022-06-28":])
forecasts = test_model.forecast(params, reindex=False, horizon=1)