Closed jpfontenelle closed 2 years ago
Yes, visualizing both the scree plot and the score plots is the way I would recommend for choosing K. Choosing K programmatically is a hard problem.
When p > n, I would recommend reading this paper, whose method is implemented in the R package hdpca. I've tried it in the past; it doesn't give the perfect K, but it often gives something close enough. However, you can't have too many individuals in the PCA, because this method requires computing all eigenvalues.
Hello. Thank you for the reply. I will definitely check out the reference. It might work in my case. Cheers
Hello everyone.
I have a question that is similar to previous issues posted here that are already closed.
Using pcadapt to identify the optimal number of principal components (K) is part of a pipeline of simulations I am running. Scree plots and score plots work well; it is "easy" to choose K based on the graphical representation.
However, my simulations have many replicates, which would mean thousands of scree/score plots to inspect.
I wonder if there is any threshold that could be used to "select" K values without the need of the graphical interface.
I ran into an approach called Angle Distribution of Loading Subspaces (ADLS), which seems promising. However, my math skills are not good enough to code it for a pcadapt object.
Along that line of thought, I wonder if I could use the differences between consecutive singular.values. For example, would it be adequate to compute pcadapt.obj$singular.values[i] - pcadapt.obj$singular.values[i+1], pcadapt.obj$singular.values[i+1] - pcadapt.obj$singular.values[i+2], etc., and treat the first difference that is smaller than some threshold as the "elbow" of the scree plot? Or is that too far off?
Any ideas?
Thank you very much
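For readers who want to try the consecutive-difference idea from the question, here is a minimal sketch in R. It is only an illustration, not something pcadapt provides: the helper name choose_k_by_gap, the scaling by the first singular value, and the default tolerance of 0.05 are all assumptions made here, and the tolerance would need tuning on replicates where K is already known from the plots.

```r
# Hypothetical helper (not part of pcadapt): pick K as the number of
# components that come before the first "small" drop between consecutive
# singular values.
choose_k_by_gap <- function(sv, tol = 0.05) {
  stopifnot(is.numeric(sv), length(sv) >= 2)
  # drops[i] = sv[i] - sv[i + 1]; non-negative when sv is sorted decreasingly
  drops <- -diff(sv)
  # Scale by the first singular value so that `tol` is unit-free
  # (an assumption of this sketch, other scalings are possible)
  rel_drops <- drops / sv[1]
  small <- which(rel_drops < tol)
  if (length(small) == 0) {
    # No flat region found among the supplied components
    return(length(sv))
  }
  # The first small drop lies between components i and i + 1, so the plateau
  # starts at component i; keep the i - 1 components before it.
  small[1] - 1
}

# Toy singular values: three clear components, then a plateau
toy_sv <- c(10, 6, 3, 1.0, 0.9, 0.8)
k <- choose_k_by_gap(toy_sv)
print(k)  # 3

# With a fitted object one would pass its singular values instead, e.g.
# choose_k_by_gap(pcadapt.obj$singular.values)
```

A result equal to the number of supplied singular values means no plateau was detected, which suggests refitting with more components before trusting the answer. Note also that a single threshold is sensitive to noise in the tail of the scree plot, so spot-checking a handful of replicates graphically remains advisable.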