GGiecold-zz / pyRMT

Python for Random Matrix Theory: cleaning schemes for noisy correlation matrices.
MIT License

Inverse Wishart regularisation #1

Closed: lionel75013 closed this issue 6 years ago.

lionel75013 commented 6 years ago

After applying the algorithm to a very small portfolio of about 15 assets, I noticed that the covariance was severely underestimated. For instance, the correlation between MSCI WLD and the S&P 500 came out negative (!!).

The mean absolute error (MAE) under cross-validation on my data was 0.41.

After reviewing the literature, most notably https://arxiv.org/pdf/1610.08104.pdf (see Section 8.1.2, "Regularizing the empirical RIE"), I found a regularization technique called Inverse Wishart (IW) that corrects the estimation error on the smallest eigenvalues:

    kappa = 2 * lambda_N / ((1 - q - lambda_N) ** 2 - 4 * q * lambda_N)
    alpha_s = 1 / (1 + 2 * q * kappa)
    denom = x / (1 + alpha_s * (x - 1.))
    Gamma /= denom
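The IW recipe quoted above can be wrapped into a small standalone helper to see what each line does. This is a sketch under assumptions the thread does not state: `q = N / T`, `lambda_N` is the smallest empirical eigenvalue of the correlation matrix, and `x` is taken to be the vector of empirical eigenvalues. The name `iw_regularize_gamma` is hypothetical and not part of pyRMT. The guard makes explicit that `kappa` is only positive when `lambda_N` lies below the Marchenko-Pastur lower edge `(1 - sqrt(q)) ** 2`; note also that `1 + alpha_s * (x - 1)` has the form of a linear shrinkage of `x` toward 1.

```python
import numpy as np


def iw_regularize_gamma(gamma, eigvals, q):
    """Divide a debiasing factor Gamma by the inverse-Wishart (IW) term.

    Hypothetical helper for illustration. Assumes `eigvals` are the empirical
    eigenvalues of a correlation matrix, q = N / T, and that `x` in the
    four-line recipe denotes those eigenvalues.
    """
    x = np.asarray(eigvals, dtype=float)
    lambda_N = x.min()                     # smallest empirical eigenvalue
    disc = (1 - q - lambda_N) ** 2 - 4 * q * lambda_N
    if disc <= 0:
        # lambda_N is not below the Marchenko-Pastur lower edge (1 - sqrt(q))**2,
        # so this calibration yields no positive kappa.
        raise ValueError("smallest eigenvalue must lie below (1 - sqrt(q))**2")
    kappa = 2 * lambda_N / disc            # IW parameter calibrated on lambda_N
    alpha_s = 1 / (1 + 2 * q * kappa)      # shrinkage intensity, in (0, 1)
    denom = x / (1 + alpha_s * (x - 1.0))
    return np.asarray(gamma, dtype=float) / denom, kappa, alpha_s
```

For example, `iw_regularize_gamma(np.ones(4), [0.3, 0.9, 1.1, 1.7], q=0.1)` gives `kappa = 2.5` and `alpha_s = 2/3`.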

After applying this technique, the MAE on my sample fell to 0.381, and the correlation between MSCI WLD and the S&P 500 went back to 0.73. The Markowitz optimization also improved.

However, the errors are still much larger than those of a plain empirical estimator or of scikit-learn's MinCovDet estimator.
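The thread does not say exactly how the cross-validated MAE was computed. One plausible protocol, sketched here purely as an assumption, is to estimate a correlation matrix on a training window and compare its off-diagonal entries with the sample correlation of a held-out window. `cov_to_corr`, `oos_correlation_mae` and `empirical` are hypothetical helper names.

```python
import numpy as np


def cov_to_corr(cov):
    """Rescale a covariance matrix to unit diagonal."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)


def oos_correlation_mae(returns, estimator, split=0.5):
    """MAE between an in-sample correlation estimate and out-of-sample correlations.

    returns:   T x N array of asset returns (rows are dates).
    estimator: callable mapping a T_train x N array to an N x N covariance matrix.
    """
    returns = np.asarray(returns, dtype=float)
    cut = int(len(returns) * split)
    train, test = returns[:cut], returns[cut:]
    est = cov_to_corr(estimator(train))
    realized = np.corrcoef(test, rowvar=False)
    off_diag = ~np.eye(returns.shape[1], dtype=bool)   # skip the trivial diagonal
    return np.mean(np.abs(est[off_diag] - realized[off_diag]))


def empirical(X):
    """Baseline: the plain sample covariance."""
    return np.cov(X, rowvar=False)
```

Other estimators plug in the same way; for instance, scikit-learn's robust estimator can be passed as `lambda X: MinCovDet().fit(X).covariance_`.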

Please advise.

GGiecold-zz commented 6 years ago

Hello.

The RIE estimator implemented in PyRMT is debiased for a typical stock market and for time series with a low recording frequency. For other instruments, e.g. futures, another heuristic correction would have to be explicitly evaluated and incorporated. Besides, with strong correlations, say between rolled futures contracts, a more sophisticated approach would be required, given that some eigenvalues in the correlation matrix are expected to be small and should not be washed away as noise.

Correlation cleaning is a dark art. Contributions to PyRMT are welcome.

With kind regards,

Gregory

(In reply to lionel75013's comment of Wed, Jan 3, 2018, 7:55 AM.)

lionel75013 commented 6 years ago

Hi Gregory,

Thanks for your reply. It is nice to discuss this subject, as it is still not very well known.

"Contributions to PyRMT are welcome."

I would not be comfortable pushing anything, as I have no experience with GitHub. Besides, you are the owner here :-)

The correction for small eigenvalues (called 'IW') is prescribed by the document I linked previously, which is an extension of their 2016 Risk.net paper (https://www.cfm.fr/assets/ResearchPapers/Cleaning-Correlation-Matrices.pdf). I would suggest integrating it into the code (perhaps as an optional parameter?), as it is only four lines.

But even after integrating this numerical regularization, I still had unrealistic covariance and precision matrices. After some more digging, it appears that the shrinkage of the "market mode" eigenvalue, which is about ten times larger than the others, is to blame: the algorithm crushes it so that it stays within the bulk of the other eigenvalues. As a smoke test, I tried a version that preserves the market mode, and it removed much of the introduced bias. This was somewhat of a disappointment, because it makes the method inapplicable to my problem. I have found no further research from Joel Bun on this subject, and I fear further development might become proprietary.
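The smoke test described above, keeping the market mode out of the cleaning, can be sketched as follows. This is a minimal illustration under assumptions: `clean_fn` is a hypothetical parameter standing for any function that maps ascending eigenvalues to cleaned eigenvalues in the same order, the top `k` empirical eigenvalues are restored after cleaning, and the remaining eigenvalues are rescaled so that the trace (equal to N for a correlation matrix) is preserved. The trace rescaling is a design choice of this sketch, not something stated in the thread.

```python
import numpy as np


def clean_keep_top_modes(corr, clean_fn, k=1):
    """Clean a correlation matrix's spectrum but keep its k largest eigenvalues.

    clean_fn: hypothetical callable mapping ascending eigenvalues to cleaned
    eigenvalues, returned in the same (ascending) order.
    """
    eigvals, eigvecs = np.linalg.eigh(corr)          # ascending eigenvalues
    cleaned = np.array(clean_fn(eigvals), dtype=float)
    cleaned[-k:] = eigvals[-k:]                      # restore the market mode(s)
    # Rescale the bulk so that the total variance (trace) is unchanged
    bulk_target = eigvals.sum() - eigvals[-k:].sum()
    cleaned[:-k] *= bulk_target / cleaned[:-k].sum()
    return eigvecs @ np.diag(cleaned) @ eigvecs.T
```

The result has the cleaned spectrum exactly, so its diagonal is not guaranteed to be 1; rescaling it back to unit diagonal would perturb the eigenvalues slightly.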

I do have some success to report in this area, though, as your code is a very useful basis for other cleaning schemes of the empirical spectral distribution. Ledoit & Wolf have proposed a much simpler cleaning scheme than their QuEST algorithm in their September 2017 paper "Direct Nonlinear Shrinkage Estimation of Large-Dimensional Covariance Matrices" (https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3047302).

They provide MATLAB code, which I have ported to Python:

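    import numpy as np

    # Defined elsewhere: eigvals (sample covariance eigenvalues, ascending order),
    # N assets, T observations, q = N / T, and pav (isotonic regression, see below).
    lmbda = eigvals[max(0, N - T):]        # keep the min(N, T) non-zero eigenvalues
    h = T ** -0.35                         # global bandwidth
    h2 = h ** 2
    L = lmbda[:, None]                     # L[i, j] = lambda_i (evaluation point)
    Lj = lmbda[None, :]                    # Lj[i, j] = lambda_j (kernel centre)
    square_lj = h2 * Lj ** 2               # squared local bandwidth (h * lambda_j) ** 2

    # Semicircle-kernel estimates of the spectral density and of its Hilbert transform
    ftilde = np.mean(np.sqrt(np.maximum(4 * square_lj - (L - Lj) ** 2, 0)) / (2 * np.pi * square_lj), axis=1)
    Hftilde = np.mean((np.sign(L - Lj) * np.sqrt(np.maximum((L - Lj) ** 2 - 4 * square_lj, 0)) - L + Lj) / (2 * np.pi * square_lj), axis=1)

    if N <= T:
        dtilde = lmbda / ((np.pi * q * lmbda * ftilde) ** 2 + (1 - q - np.pi * q * lmbda * Hftilde) ** 2)
    else:
        Hftilde0 = (1 - np.sqrt(1 - 4 * h2)) / (2 * np.pi * h2) * np.mean(1. / lmbda)
        dtilde0 = 1 / (np.pi * (N - T) / T * Hftilde0)
        dtilde1 = lmbda / ((np.pi ** 2) * (lmbda ** 2) * (ftilde ** 2 + Hftilde ** 2))
        dtilde = np.concatenate((np.full(N - T, dtilde0), dtilde1))

    dhats = pav(dtilde)                    # non-decreasing fit of the shrunk eigenvalues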

The last line uses a port of the PAV (pool-adjacent-violators) algorithm from Alexandre Gramfort (https://gist.github.com/fabianp/3081831).
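For reference, the whole pipeline, from a returns matrix to a cleaned covariance matrix, can be assembled into one self-contained function. This is a sketch of the semicircle-kernel variant discussed in the thread (bandwidth `T ** -0.35` followed by isotonic regression), not Ledoit and Wolf's reference code. It assumes zero-mean returns (no demeaning, so the sample covariance has rank min(N, T)) and replaces the external PAV port with a small inline implementation; the names `pav` and `direct_nonlinear_shrinkage` are illustrative.

```python
import numpy as np


def pav(y):
    """Pool-Adjacent-Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum, count]
    for v in y:
        blocks.append([float(v), 1])
        # Merge while the previous block's mean exceeds the last block's mean
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    return np.concatenate([np.full(c, s / c) for s, c in blocks])


def direct_nonlinear_shrinkage(X):
    """Nonlinear shrinkage of the sample covariance of a T x N zero-mean returns array."""
    T, N = X.shape
    q = N / T
    sample = X.T @ X / T
    eigvals, eigvecs = np.linalg.eigh(sample)           # ascending order
    lmbda = eigvals[max(0, N - T):]                     # the min(N, T) non-zero eigenvalues
    h = T ** -0.35
    h2 = h ** 2
    L = lmbda[:, None]                                  # lambda_i (evaluation point)
    Lj = lmbda[None, :]                                 # lambda_j (kernel centre)
    sq = h2 * Lj ** 2                                   # (h * lambda_j) ** 2
    ftilde = np.mean(np.sqrt(np.maximum(4 * sq - (L - Lj) ** 2, 0)) / (2 * np.pi * sq), axis=1)
    Hftilde = np.mean((np.sign(L - Lj) * np.sqrt(np.maximum((L - Lj) ** 2 - 4 * sq, 0))
                       - L + Lj) / (2 * np.pi * sq), axis=1)
    if N <= T:
        dtilde = lmbda / ((np.pi * q * lmbda * ftilde) ** 2
                          + (1 - q - np.pi * q * lmbda * Hftilde) ** 2)
    else:
        Hftilde0 = (1 - np.sqrt(1 - 4 * h2)) / (2 * np.pi * h2) * np.mean(1.0 / lmbda)
        dtilde0 = 1 / (np.pi * (N - T) / T * Hftilde0)  # value for the null-space directions
        dtilde1 = lmbda / (np.pi ** 2 * lmbda ** 2 * (ftilde ** 2 + Hftilde ** 2))
        dtilde = np.concatenate((np.full(N - T, dtilde0), dtilde1))
    dhats = pav(dtilde)                                 # enforce monotonicity in lambda
    return eigvecs @ np.diag(dhats) @ eigvecs.T
```

Because eigenvectors are kept and only eigenvalues are replaced, the output is symmetric positive definite whenever all shrunk eigenvalues are positive; the bandwidth formula requires `T ** -0.35 < 0.5`, i.e. T of at least 8.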

This performs extremely well in my setup, and it could easily be included in pyRMT if you are willing to vet this proposal and review the code.

Anyway, thanks again for providing this open-source code. It is very much appreciated!

GGiecold-zz commented 6 years ago

Hello Joel,

Thank you for the thoughtful message.

PyRMT is an open-source project, with no claim of ownership. I have veered off to different endeavors, and your contribution would be most welcome, all the more so as you are clearly very well acquainted with the latest research literature.

The shrinkage of the market mode is indeed expected. I would recommend subtracting a relevant index from each of the time series representing the instruments underlying your portfolio; a broad market index would be a first step. It is also important to check for non-stationarity or covariate shift. Fitting a regime-switching model and applying correlation cleaning separately to each identified period is an avenue worth considering (and lots of fun!).
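The first suggestion, removing a broad market index from each instrument before cleaning, can be sketched as below. Plain subtraction corresponds to a beta of 1 for every asset (`use_beta=False`); by default the sketch instead estimates each asset's beta by ordinary least squares with an intercept, which is a swapped-in variant rather than something prescribed in the thread. `remove_index` is a hypothetical helper, and `returns` (T x N) and `index` (length T) are assumed to be aligned return series.

```python
import numpy as np


def remove_index(returns, index, use_beta=True):
    """Strip a market index from each column of a T x N returns array.

    use_beta=False: subtract the index as-is (beta = 1 for every asset).
    use_beta=True:  return OLS residuals of each asset on [1, index].
    """
    returns = np.asarray(returns, dtype=float)
    index = np.asarray(index, dtype=float)
    if not use_beta:
        return returns - index[:, None]
    design = np.column_stack([np.ones_like(index), index])   # intercept + index
    coef, *_ = np.linalg.lstsq(design, returns, rcond=None)   # shape (2, N)
    return returns - design @ coef                            # OLS residuals
```

The residual correlation matrix, e.g. `np.corrcoef(remove_index(returns, index), rowvar=False)`, is then what would be passed to the cleaning step.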

PyRMT is awaiting your commit of a Python implementation of the direct nonlinear shrinkage estimator by Ledoit and Wolf.

All the best wishes,

Gregory

(In reply to lionel75013's comment of Tue, Jan 9, 2018, 9:03 AM.)

GGiecold-zz commented 6 years ago

Sorry, I meant lionel75013, instead of Joel! :-/

Gregory

(In reply to lionel75013's comment of Tue, Jan 9, 2018, 9:03 AM.)

lionel75013 commented 6 years ago

Hi Gregory,

I understand you are busy and can't spend time on the project, but I have checked out the Git repository with SVN and made some changes that I would be willing to commit. I am new to this, but it seems I don't have the rights to make any changes. If you have the time, could you grant me those rights?

lionel75013 commented 6 years ago

Spoke too soon. After forking the repository, I have made a pull request. Looking forward to your insight.

GGiecold-zz commented 6 years ago

I have merged your pull request. Thank you for your contribution!

GGiecold-zz commented 6 years ago

Hello Lionel,

I have made a few minor changes to your merged pull request, namely:

Your name has also been added to the author field in pyRMT.py and to the README.md file.

With best regards,

Gregory

(In reply to Lionel Ouaknin's comment of Fri, Jan 12, 2018, 6:04 AM.)

lionel75013 commented 6 years ago

Thanks Gregory! I will continue to work with this code and do some more testing. I am sure that, with time, some improvements and fixes will be needed.

Regards,