bashtage / arch

ARCH models in Python

volatility forecast in comparison with realized volatility #701

Open maryam1986safari opened 7 months ago

maryam1986safari commented 7 months ago

I'm trying to model returns with an ARIMA-GARCH. When I compare the one-step forecasts for 30 days (the test set) with realized volatility, I find there is a drift between the two line plots:

```python
frcst = result.forecast(horizon=1, start=splitDate, method="simulation", simulations=30 * 50)
frcst_variance = frcst.variance.squeeze()
realized_volatility = realized_volatility(n_period=30)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(np.sqrt(frcst_variance.sort_index()), "red",
        realized_volatility.sort_index(), "blue", linewidth=0.5)
```

[plot: forecast volatility (red) vs. realized volatility (blue)]

It looks like the forecast is accurate except for the drift that has occurred. Why has this happened?
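
For reference, here is a minimal, self-contained sketch of this kind of comparison on synthetic data (not the poster's data). The names `returns` and `split_date`, and the use of a 30-day trailing standard deviation as the realized-volatility proxy, are assumptions for illustration, since the poster's `realized_volatility(n_period=30)` helper is not shown:

```python
# Sketch: one-step GARCH(1,1) variance forecasts over a hold-out window,
# compared against a simple realized-volatility proxy (assumed setup).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from arch import arch_model

# Synthetic daily returns standing in for the ARIMA residuals
rng = np.random.default_rng(0)
idx = pd.bdate_range("2020-01-01", periods=1000)
returns = pd.Series(rng.standard_t(df=8, size=len(idx)) * 0.01, index=idx)

split_date = idx[-30]  # the last 30 business days form the test set

# Estimate on the training sample only; last_obs keeps the test set out
mod = arch_model(returns, mean="zero", vol="GARCH", p=1, o=0, q=1, dist="t")
res = mod.fit(disp="off", last_obs=split_date)

# One-step-ahead variance forecasts for each test-set date
fc = res.forecast(horizon=1, start=split_date)
forecast_vol = np.sqrt(fc.variance["h.1"].dropna())

# Assumed realized-volatility proxy: 30-day trailing std of returns
realized_vol = returns.rolling(30).std()

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(forecast_vol, "red", linewidth=0.5, label="one-step forecast")
ax.plot(realized_vol.loc[forecast_vol.index], "blue", linewidth=0.5, label="realized proxy")
ax.legend()
plt.show()
```

Note that a trailing-window proxy like this is smoother and lags the conditional volatility it is compared against, so some offset between the two lines can come from the proxy itself rather than from the forecasts.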

bashtage commented 7 months ago

Could you post more of your code? It isn't clear where `result` is coming from.

maryam1986safari commented 7 months ago

```python
resid_predict = returns[-splitDate:] - ARIMA_model.predict(start=splitDate, end=end_date)

y = pd.concat([ARIMA_model.resid.dropna(), resid_predict])

model_set = arch_model(y=y, mean='zero', vol='GARCH', dist='t', p=1, o=0, q=1)

result = model_set.fit(disp='off', last_obs=splitDate)

frcst = result.forecast(horizon=1, start=splitDate, method="simulation", simulations=30 * 50)
frcst_variance = frcst.variance.squeeze()
realized_volatility = realized_volatility(n_period=30)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(result.conditional_volatility, "green", np.sqrt(frcst_variance), "red",
        realized_volatility, "blue")
```

[plot: in-sample conditional volatility (green), one-step forecast volatility (red), realized volatility (blue)]

If I set the GARCH orders to (7, 7), then:

[plot: the same comparison with GARCH(7, 7)]
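
One detail that may be worth noting here: for `horizon=1`, the GARCH variance forecast is available in closed form from the estimated parameters and the observed data, so a simulation-based forecast should reproduce the analytic one and the number of simulations should not affect the `h.1` column. A minimal sketch on synthetic data (assumed names, not the poster's setup):

```python
# Sketch: analytic vs. simulation-based one-step forecasts (assumed setup).
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
idx = pd.bdate_range("2020-01-01", periods=1000)
returns = pd.Series(rng.standard_t(df=8, size=len(idx)) * 0.01, index=idx)
split_date = idx[-30]

res = arch_model(returns, mean="zero", vol="GARCH",
                 p=1, o=0, q=1, dist="t").fit(disp="off", last_obs=split_date)

analytic = res.forecast(horizon=1, start=split_date, method="analytic")
simulated = res.forecast(horizon=1, start=split_date,
                         method="simulation", simulations=1500)

# The one-step-ahead variances should agree up to numerical precision
diff = (analytic.variance["h.1"] - simulated.variance["h.1"]).abs().dropna()
print(diff.max())
```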

maryam1986safari commented 7 months ago

I have another related question; it would be great if this could be cleared up. I changed the length of the out-of-sample (test) set: it was 30 days, and I changed it to 5. The in-sample prediction of the model, which is in fact the conditional variance, is again shown by the green line. It seems there has been no change in the in-sample prediction over the last 25 days of the training set (compared with when the test-set length was 30). Can this be considered evidence that the modeling process takes the test set into account when the model is built?

[plot: the same comparison with a 5-day test set]
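
One way to check this concern directly is to compare a fit that uses `last_obs` with a fit on the explicitly truncated series: if observations after the split are ignored during estimation, the two parameter vectors should agree (up to optimizer tolerance). A minimal sketch on synthetic data, with assumed names; note that the inclusive/exclusive convention of `last_obs` versus the slice endpoint may need a one-observation adjustment:

```python
# Sketch: does data after the split affect estimation? (assumed setup)
import numpy as np
import pandas as pd
from arch import arch_model

rng = np.random.default_rng(0)
idx = pd.bdate_range("2020-01-01", periods=1000)
returns = pd.Series(rng.standard_t(df=8, size=len(idx)) * 0.01, index=idx)
split = len(returns) - 30  # hold out the last 30 observations

spec = dict(mean="zero", vol="GARCH", p=1, o=0, q=1, dist="t")
res_lastobs = arch_model(returns, **spec).fit(disp="off", last_obs=split)
res_truncated = arch_model(returns.iloc[:split], **spec).fit(disp="off")

# Side-by-side parameter estimates; they should be essentially identical if
# the hold-out observations play no role in the fit.
print(pd.concat({"last_obs": res_lastobs.params,
                 "truncated": res_truncated.params}, axis=1))
```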