JiaxiangBU / tutoring2

The collection of Python and R code scripts to tutor others.
https://jiaxiangbu.github.io/tutoring2/

factor rotations #41

Closed: superlianer closed this issue 4 years ago

superlianer commented 4 years ago

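In PCA, what is a normalized squared loading? Does rotation amount to finding the composition of each principal component (component 1, for example): which variables make it up, and what coefficient each one has? The coefficients I get after rotation are similar to the unrotated ones, but there is a noticeable gap between them.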

JiaxiangBU commented 4 years ago

@superlianer

The third step involves the rotation of factors. Rotation is a standard step in factor analysis. It provides a criterion for eliminating the indeterminacy implicit in factor analysis results. (Capelle-Blancard et al. 2019)

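This step is called factor rotation. Its purpose is to drop the weaker variables within a PC (for example, those whose loadings are small in absolute value).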

The rotation changes the factor loadings and consequently the interpretation of the factors, but the different factor analytical solutions are mathematically equivalent in that they explain the same portion of the sample variance.

Of course, this step necessarily changes how the principal components are interpreted.
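The claim that rotated and unrotated solutions explain the same portion of the variance can be checked numerically. The following is a minimal sketch, assuming a made-up 10 x 3 loading matrix and an arbitrary orthogonal rotation (neither comes from the paper or from this thread): each variable's communality is unchanged by the rotation, while the split of variance across individual components shifts.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 10 x 3 loading matrix (variables x components); values are made up.
loadings = rng.uniform(-1.0, 1.0, size=(10, 3))

# An arbitrary orthogonal matrix, taken from the QR decomposition of a random matrix.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

rotated = loadings @ q

# Communality of each variable: sum of its squared loadings across components.
communality_before = (loadings ** 2).sum(axis=1)
communality_after = (rotated ** 2).sum(axis=1)

# Variance attributed to each component: column sums of squared loadings.
per_component_before = (loadings ** 2).sum(axis=0)
per_component_after = (rotated ** 2).sum(axis=0)

print(np.allclose(communality_before, communality_after))
print(per_component_before.round(3), per_component_after.round(3))
```

The individual loadings differ after rotation, which is why the interpretation of each component changes even though the total explained variance does not.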

| var_name | PC1_unrotated | PC2_unrotated | PC3_unrotated | PC1_rotated | PC2_rotated | PC3_rotated |
|---|---|---|---|---|---|---|
| X_7 | 0.5281055 | 0.2460877 | 0.5440660 | 0.0000000 | 0.0000000 | 0.5440660 |
| X_4 | 0.8830174 | 0.5726334 | 0.9942698 | 0.0000000 | 0.0000000 | 0.9942698 |
| X_3 | 0.4089769 | 0.6775706 | 0.6405068 | 0.0000000 | 0.6775706 | 0.0000000 |
| X_6 | 0.0455565 | 0.8998250 | 0.7085305 | 0.0000000 | 0.8998250 | 0.0000000 |
| X_10 | 0.4566147 | 0.9545036 | 0.1471136 | 0.0000000 | 0.9545036 | 0.0000000 |
| X_1 | 0.2875775 | 0.9568333 | 0.8895393 | 0.0000000 | 0.9568333 | 0.0000000 |
| X_9 | 0.5514350 | 0.3279207 | 0.2891597 | 0.5514350 | 0.0000000 | 0.0000000 |
| X_2 | 0.7883051 | 0.4533342 | 0.6928034 | 0.7883051 | 0.0000000 | 0.0000000 |
| X_8 | 0.8924190 | 0.0420595 | 0.5941420 | 0.8924190 | 0.0000000 | 0.0000000 |
| X_5 | 0.9404673 | 0.1029247 | 0.6557058 | 0.9404673 | 0.0000000 | 0.0000000 |
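In the table, every rotated row keeps the variable's largest unrotated loading and sets the other two to zero. Below is a minimal sketch of that selection rule, using the first three rows of the table. The rule is an assumption inferred from the numbers, not the code that produced them, and `keep_largest_per_row` is a hypothetical helper name. An actual varimax rotation would generally give small but non-zero values instead of exact zeros.

```python
import numpy as np

# Unrotated loadings copied from the table rows X_7, X_4 and X_3.
unrotated = np.array([
    [0.5281055, 0.2460877, 0.5440660],
    [0.8830174, 0.5726334, 0.9942698],
    [0.4089769, 0.6775706, 0.6405068],
])


def keep_largest_per_row(loadings):
    """Zero every entry except the largest absolute loading in each row."""
    out = np.zeros_like(loadings)
    rows = np.arange(loadings.shape[0])
    cols = np.abs(loadings).argmax(axis=1)
    out[rows, cols] = loadings[rows, cols]
    return out


simplified = keep_largest_per_row(unrotated)
print(simplified)
```

For these three rows the result matches the rotated columns of the table: X_7 and X_4 stay on PC3, X_3 stays on PC2.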

Factor rotation is obtained using the varimax method, which attempts to minimize the number of variables that have high loadings (so-called salient loadings) on the same factor. (Capelle-Blancard et al. 2019)

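The table above shows the effect of the varimax step: after it, the variables with small coefficients have been removed from each PC and only the highest loadings are kept. The name varimax refers to maximizing the variance of the squared loadings within each factor, which pushes each loading toward either zero or a large absolute value.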
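For readers who want to see what a varimax rotation computes, here is a minimal NumPy sketch of the SVD-based iteration that is commonly used for it. The function names `varimax` and `varimax_criterion`, the default parameters, and the 6 x 2 example data are all illustrative assumptions; this is not the code behind the table, and it implements raw varimax without Kaiser normalization.

```python
import numpy as np


def varimax_criterion(loadings):
    """Sum over components of the variance of the squared loadings."""
    return (loadings ** 2).var(axis=0).sum()


def varimax(loadings, max_iter=500, tol=1e-8):
    """Return (rotated loadings, orthogonal rotation matrix)."""
    n_vars, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    objective = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        # Gradient of the raw varimax criterion (up to a constant factor).
        col_ss = (rotated ** 2).sum(axis=0)
        grad = loadings.T @ (rotated ** 3 - rotated * col_ss / n_vars)
        # The closest orthogonal matrix to the gradient becomes the new rotation.
        u, s, vt = np.linalg.svd(grad)
        rotation = u @ vt
        new_objective = s.sum()
        # Stop once the sum of singular values no longer grows.
        if new_objective < objective * (1 + tol):
            break
        objective = new_objective
    return loadings @ rotation, rotation


# Hypothetical simple structure: three variables per factor, then mixed by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
theta = np.deg2rad(30)
mix = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
mixed = simple @ mix

rotated, rotation = varimax(mixed)
print(varimax_criterion(mixed), varimax_criterion(rotated))
print(np.round(rotated, 3))
```

The rotation matrix is orthogonal, so each variable's communality is preserved. Only the distribution of loadings across components changes, and columns may come back permuted or with flipped signs.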

This is a transformation of factorial axes that makes it possible to approximate a “simple structure” of the factors, in which each indicator is “loaded” exclusively on one of the retained factors. This enhances the interpretability of these factors. (Capelle-Blancard et al. 2019)

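Much like Lasso, this leaves each PC with fewer variables and yields a simple structure. Each variable also appears in only one PC, which reinforces the "orthogonality" of PCA and makes the PCs easier to interpret. A single variable no longer contributes to several PCs, a situation that would be hard to explain.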

Capelle-Blancard, Gunther, Patricia Crifo, Marc-Arthur Diaye, Rim Oueghlissi, and Bert Scholtens. 2019. “Sovereign Bond Yield Spreads and Sustainability: An Empirical Analysis of OECD Countries.” *Journal of Banking & Finance* 98: 156–69.