TommasoBelluzzo / HistoricalVolatility

A framework for historical volatility estimation and analysis.
Apache License 2.0
34 stars 15 forks source link

Correctness of Yang-Zhang volatility calculation #2

Closed nateaff closed 6 years ago

nateaff commented 6 years ago

Hello Tommaso,

I was testing your implementation of Yang-Zhang volatility against the TTR implementation and found a discrepancy between the two outputs. For test data, I used daily SPY OHLC data from 2015-11-25 to 2015-12-11. The results of the TTR and HistoricalVolatility implementation are included below.

HistoricalVolatility:

estimate_volatility(SPY,'YZ',10);
>>
    0.1541
    0.1749

TTR:

yz_vol <- TTR::volatility(SPY, N = 252, calc = "yang.zhang")
yz_vol[11:12]

> 
[1] 0.1455561 0.1603981

For the following I'm using the notation and reference equations for Yang-Zhang volatility included in the TTR reference documents here: https://www.rdocumentation.org/packages/TTR/versions/0.23-3/topics/volatility

The calculations in your estimate_volatility function diverge from the reference equations in two places. Yang-Zhang volatility is the weighted composition of three variance components:

sigma_rs = sigma_o^2 + k*sigma_c^2 + (1-k)*sigma_rs^2.

The three vectors used to calculate each variance component occurs on line 116:

res = [(log(data.Open ./ [NaN; data.Close(1:end-1)]) .^ 2) (data.Return .^ 2) ((ho .* (ho - co)) + (lo .* (lo - co)))];

The values (data.Return.^2) corresponds to the values used to calculatesigma_c^2. The Yang-Zhang volatility functions in the TTR documentation, however, use the log Close/Open instead of returns. Specifically,

sigma_c^2 = N/(n-1) * sum( log( C_i/O_i - mu_c )^2)

where C_i is the closing price and O_i is the opening price.

The other discrepancy from the reference equations occurs at line 118 of estimate_volatility:

param1 = sqrt(252 / (bw - 1));

Here bw corresponds to bandwidth or the size of the rolling window used to compute volatility and 252 is the number of active trading days.

The parameter param1 is used in the summary function @fun to compute each variance component used in the Yang-Zhang formula: sigma_o^2, sigma_c^2, sigma_{rs}^2. However, the Roger-Satchell formula uses 1/n not 1/(n-1). The value n corresponds to the bandwidth parameter,bw, in estimate_volatility function.

I forked this repository and updated the code to match the TTR reference equations. These changes are located at lines 125-198 in my version of the _estimatevolatility.m file. The output of the updated estimate_volatility function now matches the TTR::volatility() output.

estimate_volatility(SPY,'YZ',10);
yz_vol =
>>
    0.1456
    0.1604

I just wanted to let you know about the discrepancy and open the issue as a reference in case anyone else might find it useful. My fork of the repository with the changes described above is located here.

Regards,

Nathanael

TommasoBelluzzo commented 6 years ago

Dear Nathanael,

it's with great pleasure I read this comment to my code. I apologyze for my late reply, but my job kept far from my laptop for a while. I totally agree with you when you say that my YZ estimator implementation was faulty and I released a little fix into the new version of my code.

Despite your version being a huge improvement over my previous version, after reviewing the formulation of the YZ estimator from different sources, I think that the code should be rewritten differently.

Sticking to the formulation proposed in the link you provided, I see that there are three parameters derived from the scalar k (1, k and 1-k) and three weighting parameters (N / (n - 1), N / (n - 1) again and (N / n)) for, respectively, sigma_o^2, sigma_c^2 and sigma_rs^2. I implemented both parameters into a one-shot anonymous function call instead of using two function handlers.

For what concerns the sigma_rs^2, I decided not to change my original formulation (ho .* (ho - co)) + (lo .* (lo - co)) because, when compared to your formulation (ho .* hc) + (lo .* lc), I noticed it basically leads to the same output. So both formulations are totally correct and yet divergent from the one you normally find in most academic papers (which is also the one described in the R documentation).

For what concerns sigma_o^2 and sigma_c^2, I tried to avoid using your var implementation and I stick on a more standard approach.

On the top of that, I updated my data fetching from the web which was completely broken due to a downgrade in the Quandl service.

The results between your implementation and mine are now almost identical. Please, try it and report back to me in case your are detecting further issues.

Best regards, Tommaso