bashtage / arch

ARCH models in Python

simulate function in GARCH bugged? #62

Closed fermionportal closed 9 years ago

fermionportal commented 9 years ago

In volatility.py, in the GARCH class, the simulate function contains the following code:

sigma2[:max_lag] = initial_value ** inv_power
data[:max_lag] = sqrt(sigma2[:max_lag]) * sign(errors[:max_lag])

This seems to be taking the square root of the initial_value twice, because inv_power == 0.5 by default.

In any case, it shouldn't work like this: it implicitly assumes that the previous error was equal to 1 or -1, so the full sqrt of sigma2 enters data and the first return (residual) will on average be huge. Instead, the first max_lag members of the errors variable should be replaced with the previous returns (residuals).

This becomes significant if burn=0, which is the case for several practical applications of the simulate function.

I think this is a bug, but apologies if I misunderstood the logic.
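For concreteness, here is a minimal standalone NumPy sketch of the double square root (the initial_value and variable names are illustrative, not the package's actual code path):

```python
import numpy as np

power = 2.0
initial_value = 0.04  # target variance, e.g. a 20% vol

# Buggy path: inv_power == 1.0 / power == 0.5, so the variance is
# square-rooted once here ...
inv_power_buggy = 1.0 / power
sigma2_buggy = initial_value ** inv_power_buggy   # 0.2, already a vol
# ... and again when converting sigma2 to a return scale.
first_abs_return_buggy = np.sqrt(sigma2_buggy)    # ~0.447, far too large

# Intended path: for power == 2, sigma2 should remain a variance.
inv_power_fixed = 2.0 / power                     # 1.0
sigma2_fixed = initial_value ** inv_power_fixed   # 0.04
first_abs_return_fixed = np.sqrt(sigma2_fixed)    # 0.2, the intended vol
```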

bashtage commented 9 years ago

I think you have it right.

https://github.com/bashtage/arch/blob/9da8ea9028842e717db0aa2ced54545f8399ad49/arch/univariate/volatility.py#L520

It also shows that it should be 2.0/power, not 1.0/power, when making this transformation.

fermionportal commented 9 years ago

What about the line

data[:max_lag] = sqrt(sigma2[:max_lag]) * sign(errors[:max_lag])

should there really be the sign() function on errors?

bashtage commented 9 years ago

This is to backcast the data to +/- sigma2. It isn't possible to directly use errors since they do not follow a GARCH process. Essentially this means that in a GJR-GARCH(1,1,1) the evolution of the first fake data point will just be

sigma2[1] = omega + alpha * sigma2[0] + gamma * sigma2[0] * (errors[0] < 0) + beta * sigma2[0]

You should always burn at least max([p,o,q]) data points, and in practice many more. I don't understand a case where you would want to burn 0.
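To illustrate the recursion above, here is a self-contained sketch of a GJR-GARCH(1,1,1) simulation with the sign-based initialization and a burn-in; the parameter values are made up for illustration and this is not the package's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
omega, alpha, gamma, beta = 0.05, 0.05, 0.1, 0.85
nobs, burn = 1000, 500
errors = rng.standard_normal(nobs + burn)

sigma2 = np.empty(nobs + burn)
data = np.empty(nobs + burn)
# Backcast-style start: unconditional variance, sign taken from the error,
# so data[0] ** 2 == sigma2[0] and the first update reduces to the
# recursion quoted above.
sigma2[0] = omega / (1 - alpha - 0.5 * gamma - beta)
data[0] = np.sqrt(sigma2[0]) * np.sign(errors[0])
for t in range(1, nobs + burn):
    sigma2[t] = (omega
                 + alpha * data[t - 1] ** 2
                 + gamma * data[t - 1] ** 2 * (data[t - 1] < 0)
                 + beta * sigma2[t - 1])
    data[t] = np.sqrt(sigma2[t]) * errors[t]

# Discard the burn-in so the initialization is immaterial.
data = data[burn:]
```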

fermionportal commented 9 years ago

You burn 0 when you want to simulate returns/sigmas over the next N days, starting from today.

Also, the errors in the above case are supposed to be i.i.d., not to follow a GARCH process. For example, the EGARCH simulate() function correctly initializes the data to sqrt(sigma2) * errors. In the case of a pure GARCH model, the above equation will give massively outsized first returns.
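A small standalone sketch of the difference between the two initializations (illustrative values, not the package's code):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma2_0 = 1.0
errors = rng.standard_normal(5)

# EGARCH-style initialization: scale the actual i.i.d. error, so the
# first return has a typical magnitude.
data_egarch = np.sqrt(sigma2_0) * errors[0]

# GARCH simulate initialization: only the sign of the error survives,
# so the first return always has magnitude exactly sqrt(sigma2_0).
data_garch = np.sqrt(sigma2_0) * np.sign(errors[0])
```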

bashtage commented 9 years ago

simulate does not do what you want. In some special cases, e.g. a GARCH(1,1), you can almost get it to do this by correctly choosing initial_value, but for more general models, e.g. a GARCH(2,2), you cannot use simulate to simulate out-of-sample values, since the simulation code would need both initial values for sigma2 as well as the final two values of epsilon to simulate this correctly. I suppose this is more of a feature request, since it is not what is intended and would require non-trivial modification.

It would probably make more sense to add a new method that could simulate starting from some point in the actual sample, i.e. simulate_from_data, which would use the fitted model. This is closer to a forecast method IMO than straight simulation of a process, which is what these methods were designed for.

This said, there is no reason not to accept your suggestion since both

sqrt(sigma2[:max_lag]) * errors[:max_lag]

and

sqrt(sigma2[:max_lag]) * sign(errors[:max_lag])

do not matter when burn is reasonably large, so there is no reason not to have the former, since it can be used in either ARCH(1) or GARCH(1,1) models to simulate from the end of a sample (I believe these are the only two where it would be correct).

fermionportal commented 9 years ago

Thanks for the comments Kevin. Indeed, simulate_from_data is what I am more used to, and the current simulate is not too familiar to me. MATLAB's garchsim() follows the simulate_from_data approach. I'm close to finishing simulate_from_data for a few GARCH models, but I'm not yet sure it would fit into the current arch package design, so I don't know if I should submit the code.
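For what it's worth, the core of a simulate_from_data-style recursion for a GARCH(1,1) might look like the following sketch; the end-of-sample state (last_resid, last_sigma2) and parameters are hypothetical stand-ins, not output from a fitted model:

```python
import numpy as np

rng = np.random.default_rng(1)
omega, alpha, beta = 0.05, 0.08, 0.9
# Hypothetical end-of-sample state from a fitted GARCH(1,1)
last_resid, last_sigma2 = -0.7, 1.4
horizon = 10

sigma2 = np.empty(horizon)
resid = np.empty(horizon)
e = rng.standard_normal(horizon)

# First step is seeded with the actual final residual and variance,
# not a backcast, so no burn-in is needed.
sigma2[0] = omega + alpha * last_resid ** 2 + beta * last_sigma2
resid[0] = np.sqrt(sigma2[0]) * e[0]
for t in range(1, horizon):
    sigma2[t] = omega + alpha * resid[t - 1] ** 2 + beta * sigma2[t - 1]
    resid[t] = np.sqrt(sigma2[t]) * e[t]
```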