Rosemeis / pcangsd

Framework for analyzing low depth NGS data in heterogeneous populations using PCA.
GNU General Public License v3.0

PC loadings? #43

Closed: TeresaPegan closed this issue 3 years ago

TeresaPegan commented 3 years ago

Hello, is there a way to get some kind of summary of PC loadings from PCANGSD, or information about which sites contribute most strongly to the patterns? When I run PCANGSD on some of my data, I get an unexpected strong pattern in PC1 and I am trying to figure out what is causing it. It would be very helpful to know whether the pattern is driven by particular sites and/or a particular region of the chromosome. Thanks! -Teresa

Rosemeis commented 3 years ago

Hi Teresa,

Yeah, there is an undocumented feature in PCAngsd that you can turn on with the "-snp_weights" option. :-)

Best, Jonas
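For readers wondering what per-SNP weights are, here is a minimal sketch under simplifying assumptions. It is not PCAngsd's implementation: it uses hard-called genotype dosages (0/1/2) instead of the genotype likelihoods PCAngsd works from, standardizes each SNP by its binomial standard deviation, and estimates the PC1 loading of every SNP by power iteration. The function names are hypothetical, and PCAngsd may scale its weights differently, so only the relative sizes and signs are comparable.

```python
import math


def standardize(genotypes):
    # genotypes: one row per individual, one 0/1/2 dosage per SNP
    n, m = len(genotypes), len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in genotypes) / (2 * n)  # allele frequency of SNP j
        scale = math.sqrt(2 * p * (1 - p))  # binomial standard deviation
        if scale == 0:
            continue  # monomorphic SNP: leave its column as zeros
        for i in range(n):
            out[i][j] = (genotypes[i][j] - 2 * p) / scale
    return out


def pc1_snp_weights(genotypes, iterations=100):
    # Power iteration on M^T M converges to the top right singular vector of M,
    # i.e. one loading (weight) per SNP for PC1.
    M = standardize(genotypes)
    n, m = len(M), len(M[0])
    v = [1.0 / (j + 1) for j in range(m)]  # arbitrary non-uniform start vector
    for _ in range(iterations):
        u = [sum(M[i][j] * v[j] for j in range(m)) for i in range(n)]  # u = M v
        w = [sum(M[i][j] * u[i] for i in range(n)) for j in range(m)]  # w = M^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no variation along the start vector; cannot estimate PC1")
        v = [x / norm for x in w]
    return v


# Toy data: 4 individuals x 3 SNPs; SNPs 1 and 3 split the samples into two groups,
# SNP 2 is heterozygous in everyone and carries no group signal.
genotypes = [[0, 1, 2], [0, 1, 2], [2, 1, 0], [2, 1, 0]]
print(pc1_snp_weights(genotypes))
```

On this toy data SNPs 1 and 3 get weights of roughly +0.707 and -0.707 and SNP 2 gets 0. The overall sign of a PC is arbitrary, so flipping every weight's sign describes the same axis.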

TeresaPegan commented 3 years ago

Amazing! Thank you!

Just to make sure I understand the numbers properly: if I have 3 example SNPs with weights 0.04, 0, and -0.04, where 0.04 and -0.04 are close to the max and min weights in the whole dataset, my interpretation is that SNP 1 and SNP 3 are both contributing strongly to the pattern, but in opposite directions, and that SNP 2, with a weight of 0, is not contributing to the pattern. Is that right?

Also, how do these weights apply to the different PCs? If a SNP has a strong weight, does that mean that it is contributing strongly to PC1, or might it be responsible for something in PC2 or one of the other PCs?

Thanks! -Teresa
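The interpretation above (magnitude = strength of contribution, sign = direction along the PC) can be turned into a quick ranking. This is a minimal sketch assuming you already have the weights for one PC as a plain Python list; the function names and the 10% "negligible" cutoff are hypothetical and chosen only for illustration.

```python
def top_snps(weights, k=3):
    # Return (index, weight) pairs for the k SNPs with the largest |weight|.
    # sorted() is stable, so ties keep their original SNP order.
    ranked = sorted(enumerate(weights), key=lambda t: abs(t[1]), reverse=True)
    return ranked[:k]


def describe(weight, max_abs):
    # Label a weight relative to the largest |weight| in the dataset.
    # The 10% cutoff for "negligible" is an arbitrary illustrative choice.
    if max_abs == 0 or abs(weight) < 0.1 * max_abs:
        return "negligible"
    direction = "positive" if weight > 0 else "negative"
    return f"{direction}, {abs(weight) / max_abs:.0%} of max"


weights = [0.04, 0.0, -0.04, 0.01]  # the three example SNPs plus one weak SNP
max_abs = max(abs(w) for w in weights)
for idx, w in top_snps(weights, k=len(weights)):
    print(f"SNP {idx + 1}: {w:+.3f} ({describe(w, max_abs)})")
```

SNPs with opposite signs pull individuals toward opposite ends of the same PC; since the sign of a PC as a whole is arbitrary, only the relative signs carry meaning.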

Rosemeis commented 3 years ago

Yeah exactly!

In the output, each column corresponds to a different PC. The number of columns is either determined by PCAngsd or set manually with the "-e" parameter. So if you only have one column, those are the SNP weights for PC1 only.

Best, Jonas
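Following the column-per-PC layout described above, here is a minimal sketch for pulling out one PC's weights and checking whether the signal clusters in a region of the chromosome. It assumes the weights are available as a plain-text table with one row per SNP and one whitespace-separated column per PC; check the actual output format of your PCAngsd version (some versions may write a binary NumPy file instead). PCAngsd's weights output is not assumed to contain positions, so the `positions` list is hypothetical and would have to come from your own sites list, in the same SNP order as the input.

```python
def parse_weights(text):
    # One row per SNP, one whitespace-separated column per PC (assumed layout).
    return [[float(x) for x in line.split()] for line in text.splitlines() if line.strip()]


def weights_for_pc(table, pc):
    # pc is 1-based: column 0 holds PC1, column 1 holds PC2, and so on.
    if not 1 <= pc <= len(table[0]):
        raise ValueError(f"table has {len(table[0])} PC column(s); PC{pc} is not available")
    return [row[pc - 1] for row in table]


def window_scores(positions, weights, window_size):
    # Sum of squared weights per fixed-size window along one chromosome;
    # windows with large totals are where the PC's signal is concentrated.
    scores = {}
    for pos, w in zip(positions, weights):
        start = (pos // window_size) * window_size
        scores[start] = scores.get(start, 0.0) + w * w
    return dict(sorted(scores.items()))


table = parse_weights("0.04 0.001\n0.0 0.03\n-0.04 -0.002\n0.002 0.0\n")
pc1 = weights_for_pc(table, 1)
positions = [1_000, 5_000, 12_000, 150_000]  # hypothetical SNP positions
print(window_scores(positions, pc1, window_size=100_000))
```

Squaring the weights before summing keeps SNPs with opposite signs from cancelling out, which matters when asking where a PC's signal comes from rather than which direction it points.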

TeresaPegan commented 3 years ago

Thanks!