Unsupervised machine learning: Principal Component Analysis (PCA) on the Dow Jones Industrial Average index and its respective 30 stocks to construct an optimized, diversified, intelligent portfolio.
In your ml.py file, line 67 you have the following:
```python
# Creating covariance matrix and training data on PCA.
cov_matrix = X_train.loc[:, X_train.columns != 'DJIA'].cov()
pca = PCA()
pca.fit(cov_matrix)
```
I am wondering why you fit PCA on the covariance matrix instead of on the original returns. According to the original paper (page 7), the author applies an eigendecomposition to the covariance matrix. PCA already centers its input and decomposes that input's covariance internally, so calling `pca.fit(cov_matrix)` effectively computes a covariance twice (the covariance of the covariance matrix) before the eigendecomposition, which I think is a problem.
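To make the concern concrete, here is a minimal sketch on synthetic data. It is an illustration, not the repository's code: the random `returns` array, its shape, and the variable names are assumptions, and `np.cov(..., rowvar=False)` stands in for the pandas `.cov()` call. It compares three things: an eigendecomposition of the covariance matrix (what the paper describes), PCA fitted on the returns, and PCA fitted on the covariance matrix (what `ml.py` does).

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for the stock returns: 500 days x 5 assets (hypothetical data).
rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
returns = rng.normal(size=(500, 5)) @ mixing

# Sample covariance of the returns (columns are assets), as pandas .cov() would give.
cov_matrix = np.cov(returns, rowvar=False)

# (a) Eigendecomposition of the covariance matrix, sorted by descending eigenvalue.
eigvals, eigvecs = np.linalg.eigh(cov_matrix)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# (b) PCA fitted on the returns themselves.
pca_returns = PCA().fit(returns)

# (c) PCA fitted on the covariance matrix, as in the quoted snippet.
pca_cov = PCA().fit(cov_matrix)

# (b) reproduces the eigenvalues of (a); (c) does not.
match_returns = np.allclose(pca_returns.explained_variance_, eigvals)
match_cov = np.allclose(pca_cov.explained_variance_, eigvals)

# Components of (b) align with the eigenvectors of (a) up to sign.
alignment = np.abs(np.sum(pca_returns.components_ * eigvecs.T, axis=1))

print("PCA on returns matches eigendecomposition:", match_returns)
print("PCA on covariance matches eigendecomposition:", match_cov)
```

If the intent is to follow the paper, either `PCA().fit(returns)` or `np.linalg.eigh(cov_matrix)` should give the same principal directions; fitting PCA on `cov_matrix` treats each row of the covariance matrix as a sample and decomposes a different matrix.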