bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4

https://bcm-uga.github.io/pcadapt

Loadings #59

Closed mbrieuc closed 3 years ago

mbrieuc commented 3 years ago

Hi,

I'm new to PCAdapt. I've been running it on a large dataset and was able to identify outlier loci. However, when I try to look at the loadings for these outliers, most of them are NA. What could be causing that? Are the loci in pcadapt_object$pvalue not in the same order as in pcadapt_object$loadings?

Thanks a lot for your help.

Best,
Marine

privefl commented 3 years ago

Are you using clumping? If so, the variants removed by clumping are not included in the PCA, so they have no loadings.

mbrieuc commented 3 years ago

Thanks for the quick reply. I am using clumping, but I would have expected the removed variants not to have a p-value either. In my case, there is a mismatch between the loci with a p-value and the loci with loadings.

privefl commented 3 years ago

You do get the final stats for the variants removed by clumping. The last step is basically running a GWAS (for all variants) on the PC scores and combining the Z-scores you get for each PC.
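As a rough illustration of that last step, here is a minimal sketch of one such per-variant test: regressing a single genotype vector on a single PC score and reporting the slope's t-statistic as the Z-score. This is not pcadapt's actual code. The function name `regression_z` and the toy data are made up for the example, and the exact regression pcadapt runs may differ.

```python
import math

def regression_z(genotypes, pc_scores):
    """t-statistic of the slope when regressing genotypes on one PC score.

    Hypothetical helper for illustration only; not part of pcadapt.
    """
    n = len(genotypes)
    mean_x = sum(pc_scores) / n
    mean_y = sum(genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in pc_scores)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(pc_scores, genotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line
    rss = sum((y - intercept - slope * x) ** 2
              for x, y in zip(pc_scores, genotypes))
    # Standard error of the slope, with n - 2 residual degrees of freedom
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope / se

# Toy data (made up): 8 individuals, one PC, one variant coded 0/1/2
pc = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
genotype = [0, 0, 1, 0, 1, 2, 1, 2]
z = regression_z(genotype, pc)
print(round(z, 2))
```

Because this test only needs the PC scores and the genotypes, it can be run on every variant, including those that clumping excluded from the PCA itself.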

mbrieuc commented 3 years ago

Okay, great. Then I should look at the Z-scores for each locus and not the loadings, correct?

privefl commented 3 years ago

What information do you want?

mbrieuc commented 3 years ago

I'm trying to figure out which loci are associated with which PC, and the relative importance of each locus for the different PCs.

privefl commented 3 years ago

If you want the correlation for each PC / each variant, I think you can use the fact that Z^2 = n R^2.

mbrieuc commented 3 years ago

Thanks. What is n here? Sorry for all the questions, but your answers are really helpful :)

privefl commented 3 years ago

n is the sample size, i.e. the number of individuals.
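To make the relation Z^2 = n R^2 concrete, here is a minimal sketch of converting Z-scores into per-PC correlations. It treats the relation as a large-sample approximation. For a simple-regression t-statistic the exact form is t^2 = (n - 2) R^2 / (1 - R^2), which is close to n R^2 when R^2 is small. Both helper names and the example values are hypothetical, and the sketch assumes the Z-scores behave like regression t-statistics.

```python
import math

def z_to_correlation(z, n):
    """Approximate signed correlation from a Z-score via Z^2 = n * R^2."""
    r = z / math.sqrt(n)
    # Clip, since the approximation can overshoot for very large |Z|
    return max(-1.0, min(1.0, r))

def t_to_correlation(t, n):
    """Exact signed correlation from a simple-regression t-statistic,
    obtained by solving t^2 = (n - 2) * r^2 / (1 - r^2) for r."""
    return t / math.sqrt(t * t + n - 2)

n = 400                    # number of individuals (made-up value)
zscores = [2.0, -5.0, 0.5] # Z-scores of one variant on three PCs (made up)

for k, z in enumerate(zscores, start=1):
    print(f"PC{k}: approx r = {z_to_correlation(z, n):.4f}, "
          f"exact r = {t_to_correlation(z, n):.4f}")
```

The sign of the Z-score gives the direction of the association, and comparing |R| or R^2 across PCs for one locus gives a rough sense of which PC that locus is most associated with.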

akhrunin commented 3 years ago

Am I right in thinking that, in the latest version of the software, the loadings are not used to search for outliers?

privefl commented 3 years ago

Loadings are first transformed to Z-scores, which are then combined into one vector of chi-squared statistics using a robust Mahalanobis distance. To the best of my knowledge, it has been this way for quite some time (years). You can look at the paper describing version 4 (https://doi.org/10.1093/molbev/msaa053) to see what we have (not) changed.
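For readers unfamiliar with the combining step, here is a minimal sketch of a squared Mahalanobis distance over per-variant Z-score vectors with K = 2 PCs. It is a simplification, not pcadapt's implementation: it uses the ordinary sample mean and covariance rather than a robust estimator, and the function name and data are made up.

```python
def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean,
    using the ordinary (non-robust) sample covariance matrix."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # (dx, dy) * inverse(covariance) * (dx, dy)^T for a 2x2 matrix
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances

# Made-up Z-scores on two PCs for six variants; the last one is an outlier
zscores = [(0.1, -0.3), (1.2, 0.4), (-0.8, 0.9),
           (0.3, -1.1), (-0.5, 0.2), (6.0, -4.0)]
d2 = mahalanobis_sq_2d(zscores)
print([round(d, 2) for d in d2])
```

With an ordinary covariance, an outlier inflates the covariance estimate and so shrinks its own distance, which is the usual motivation for a robust estimate. Under the null, such a distance is compared against a chi-squared distribution with K degrees of freedom to obtain p-values.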

akhrunin commented 3 years ago

Thanks! Sorry for such a simple question!