Closed mehakmadhura closed 7 months ago
Could you give actual numbers and plots?
Hii! Thank you for your reply. Here are the scree plots with and without clumping.
The proportion of explained variance is lower in the first plot with clumping.
I guess we expect some minor drop in variance explained, just because fewer variables are used. However, I do not think we expect such a large drop, especially if population structure is captured.
I think it keeps around 0.6 of the variants. Here are the PC scores for clumping and no clumping.
Also, when we decide on a k, for example k=3, does pcadapt perform a PCA again with k=3, or just select the first 3 PCs for computation?
It keeps 60% of variants? Or 0.6%?
The PC scores seem highly similar, which is expected. So I don't get why there is such a large difference in the proportion of variance explained; it may simply be due to an error in the way it is computed.
pcadapt does the computation from scratch every time, but it should be quite fast.
Thank you for your replies. It keeps 60% of the variants. Also, does it show such a decrease in the variance explained for other datasets as well?
These are the results from the tutorial:
So, I guess, yes.
I'll try to see if there is a better way to estimate these.
Thank you so much!
I've implemented a better estimate of the total variance to get better estimates of the proportions of explained variance. But I still get very similar results for the tutorial.
Can you try the latest GitHub version on your data?
Hii! Thanks for the update. I tried the new version on my dataset. I have attached the screeplots with and without clumping.
What do you get for sum(!is.na(x$loadings[, 1])) / length(x$pass)?
0.03581412 with clumping and 1.050497 without clumping. Also, this is with k=10. I haven't chosen a K yet.
Okay, this should be the proportion of variants (that passed the MAF threshold) kept after clumping. So only a very small percentage (about 3.6%) is kept. This may explain the results. However, I would have expected you to get 1 without clumping.
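For readers following along, here is a minimal base-R sketch of what that expression measures. It does not call pcadapt; `x` is a mock list with made-up sizes, built on the assumption that variants dropped by clumping have `NA` loadings and that `pass` holds the indices of variants that passed the MAF filter.

```r
# Mock object standing in for a pcadapt result (all values are hypothetical).
# Assumption: variants removed by clumping get NA loadings, and `pass`
# holds the indices of variants that passed the MAF threshold.
n_variants <- 1000
pass <- seq_len(900)                       # 900 variants pass the MAF filter
loadings <- matrix(NA_real_, nrow = n_variants, ncol = 2)
kept <- seq(1, 900, by = 30)               # 30 variants survive clumping
loadings[kept, ] <- 0.1                    # any non-NA value marks "kept"
x <- list(loadings = loadings, pass = pass)

# Same expression as in the thread: fraction of MAF-passing variants kept
frac_kept <- sum(!is.na(x$loadings[, 1])) / length(x$pass)
frac_kept
```

In this toy setup the result is 30/900; on real output, a value near 0.036 would mean that only about 3.6% of the MAF-passing variants are used to compute the PCs.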
Oh! So is the fraction of variance explained by the PCs computed with respect to the variance of the whole dataset, and not just the subset left after clumping?
Yes, the total variance is computed from the data after the MAF threshold.
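To illustrate why this choice of denominator lowers the reported proportions, here is a small base-R sketch on simulated data. It is only an illustration under stated assumptions, not pcadapt's internal code: `G` is a hypothetical standardized matrix, and `keep` pretends that clumping retains every tenth variant.

```r
set.seed(1)
n <- 50    # individuals (hypothetical)
p <- 200   # variants that passed the MAF filter (hypothetical)

# Hypothetical standardized genotype-like matrix (individuals x variants)
G <- scale(matrix(rnorm(n * p), n, p))
keep <- seq(1, p, by = 10)        # pretend clumping keeps 10% of the variants

# PCs are computed on the clumped subset only
d <- svd(G[, keep])$d

# Two candidate denominators for "total variance"
total_all  <- sum(G^2)            # all MAF-filtered variants
total_kept <- sum(G[, keep]^2)    # clumped subset only

pve_all  <- d[1:3]^2 / total_all  # proportions relative to all variants
pve_kept <- d[1:3]^2 / total_kept # proportions relative to the subset

pve_all / pve_kept                # equals length(keep) / p for each PC
```

Because every column is standardized, the two denominators differ by exactly the fraction of variants kept, so the proportions relative to the full data shrink by that factor even when the PC scores look nearly the same.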
Okay. Thank you so much for your prompt replies!
Are you happy with this? Should we close this issue?
Yes, sure! Thank you so much.
Hii, I had a question about LD clumping. I observed that in a PCA done with LD clumping, the percentage of variance explained by the PCs is much lower than that explained by PCs without clumping. Can you please explain the reason?