Closed: mbrieuc closed this issue 3 years ago
Are you using some clumping? The variants removed from clumping won't be in PCA then.
Thanks for the quick reply. I am using clumping, but I would then expect the removed variants not to have a p-value either. In my case there is a mismatch between the loci with a p-value and the loci with loadings.
You do get the final stats for the ones removed from clumping. The last step is basically running a GWAS (for all variants) on the PC scores and combining the Z-Scores you get for each PC.
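To make the mismatch concrete, here is a minimal Python sketch with made-up numbers (not pcadapt code). It assumes the p-values and the loadings are stored in the same variant order and that variants removed by clumping appear as NaN rows in the loadings. Both are assumptions to check on your own output, and the names `pvalues`, `loadings` and `alpha` are hypothetical.

```python
import math

nan = float("nan")

# Hypothetical per-variant results, ASSUMED to share the same variant order.
pvalues = [0.50, 1e-8, 0.03, 2e-10, 0.70]

# One row per variant, one column per PC.
# ASSUMPTION: a NaN row marks a variant that clumping removed before the PCA.
loadings = [
    [0.12, -0.05],
    [nan, nan],
    [0.30, 0.01],
    [-0.44, 0.20],
    [nan, nan],
]

alpha = 1e-5  # illustrative significance threshold, not a recommendation

# Outliers are selected on p-values, which exist for every variant.
outliers = [i for i, p in enumerate(pvalues) if p < alpha]

# Split outliers by whether they still have loadings.
with_loadings = [i for i in outliers
                 if not any(math.isnan(v) for v in loadings[i])]
without_loadings = [i for i in outliers if i not in with_loadings]

print("outliers:", outliers)
print("outliers with loadings:", with_loadings)
print("outliers removed by clumping (NaN loadings):", without_loadings)
```

If the lengths of the two objects match and the missing loadings line up with the clumped variants, the NA values come from clumping rather than from a difference in ordering.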
Okay great. Then I should look at the zscores for each locus and not the loadings, correct?
What information do you want?
I'm trying to figure out which loci are associated with which PC and the relative importance of each locus on the different PCs.
If you want the correlation for each PC / each variant, I think you can use the fact that Z^2 = n R^2.
Thanks. Here n=? Sorry for all the questions, but your answers are really helpful :)
Sample size / number of individuals.
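As a rough illustration of using Z^2 = n R^2, the Python sketch below converts per-PC Z-scores for one locus into absolute correlations. The Z-scores and the sample size are made up, and the helper name `abs_correlation_from_z` is hypothetical. The relation only gives the magnitude of R, and it is best read as an approximation for small R, so the result is capped at 1.

```python
import math

def abs_correlation_from_z(z, n):
    """Return |R| implied by Z^2 = n * R^2, capped at 1."""
    return min(abs(z) / math.sqrt(n), 1.0)

n_individuals = 400  # hypothetical sample size

# Hypothetical Z-scores of a single locus, one per PC.
z_by_pc = {"PC1": 6.0, "PC2": -1.5, "PC3": 0.3}

abs_r_by_pc = {pc: abs_correlation_from_z(z, n_individuals)
               for pc, z in z_by_pc.items()}

for pc, r in abs_r_by_pc.items():
    # r * r is the share of variance of that PC's scores explained by the locus.
    print(pc, "|R| =", round(r, 4), "R^2 =", round(r * r, 6))

# The PC with the largest |R| is the one this locus is most associated with.
top_pc = max(abs_r_by_pc, key=abs_r_by_pc.get)
print("strongest association:", top_pc)
```

Comparing |R| or R^2 across PCs for one locus gives its relative importance on each PC, which is the information asked for above.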
Am I right in thinking that in the latest version of the software the loadings are not used to search for outliers?
Loadings are first transformed to Z-Scores, which are then combined into one vector of chi-squared statistics with a robust Mahalanobis distance. To the best of my knowledge, it has worked this way for quite some time (years). You can look at the paper describing version 4 (https://doi.org/10.1093/molbev/msaa053) to see what we have (not) changed.
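The idea of combining per-PC Z-scores into one statistic per variant can be sketched as follows. This is not the package's implementation: it uses the classical (non-robust) mean and covariance instead of a robust estimator, fixes K = 2 PCs so the 2x2 inverse can be written by hand, and assumes the squared distances can be compared to a chi-squared distribution with 2 degrees of freedom. All data and names are hypothetical.

```python
import math

# Hypothetical Z-scores: one (PC1, PC2) pair per variant.
zscores = [
    (0.1, -0.2),
    (-0.5, 0.4),
    (0.3, 0.1),
    (-0.2, -0.3),
    (0.4, 0.2),
    (5.0, -4.0),  # a variant with extreme Z-scores on both PCs
]

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point.

    Uses the CLASSICAL sample mean and covariance; a robust estimator
    would be swapped in here to limit the influence of outliers.
    """
    n = len(points)
    m1 = sum(a for a, _ in points) / n
    m2 = sum(b for _, b in points) / n
    s11 = sum((a - m1) ** 2 for a, _ in points) / (n - 1)
    s22 = sum((b - m2) ** 2 for _, b in points) / (n - 1)
    s12 = sum((a - m1) * (b - m2) for a, b in points) / (n - 1)
    det = s11 * s22 - s12 ** 2
    dists = []
    for a, b in points:
        da, db = a - m1, b - m2
        # d^T S^{-1} d with the closed-form inverse of a 2x2 matrix.
        dists.append((s22 * da * da - 2 * s12 * da * db + s11 * db * db) / det)
    return dists

d2 = mahalanobis_sq_2d(zscores)

# Chi-squared survival function with 2 degrees of freedom is exp(-x / 2).
pvals = [math.exp(-x / 2) for x in d2]

most_extreme = max(range(len(d2)), key=d2.__getitem__)
print("variant with largest distance:", most_extreme)
```

With a classical covariance, the extreme variant itself inflates the estimate and shrinks its own distance, which is one reason a robust estimate is preferable for outlier detection.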
Thanks! Sorry for such a simple question!
Hi, I'm new to PCAdapt. I've been running it on a large dataset and was able to identify outlier loci. However, when I try to look at the loadings for these outliers, most of them are NA. What could be causing that? Are the loci in pcadapt_object$pvalue not in the same order as in pcadapt_object$loadings? Thanks a lot for your help. Best, Marine