Closed marce-herrera closed 2 months ago
What do you see on the first two PCs?
Hi,
Thank you for getting back so fast!
As you can see below, there's no structure..
This is the screeplot:
Then I guess you won't get much from pcadapt. Is the Manhattan plot flat?
Hi there,
See the plots below for k = 10
Thanks!
The PCA here probably captures LD structure only; you need to use some clumping.
Hello,
I did the LD clumping (as shown in the tutorial) and this is what I get:
This is for k=10 (same as before):
I'm not really sure how to interpret this. Any advice?
Thank you!
I would interpret this as: you need even more clumping (smaller r2 threshold). If you still get something similar, I think you can give up.
Hi,
So this time I used LD.clumping = list(size = 200, thr = 0.1)
Do you think doing size=500 or size=1000 will make a difference? by smaller r2 threshold you mean ...? thr = 0.05, thr = 0.01?
Thanks!
It seems you have very long-range LD regions. You can try e.g. size = 1e5 and thr = 0.02.
Hello,
Here's an update:
After reviewing my progress, I realized that I had not performed LD pruning on my SNP dataset. I went back and completed this step using PLINK.
Afterward, I ran PCAdapt and obtained the following results:
x <- pcadapt(input = filename, K = 20) plot(x, option = "screeplot")
poplist.names <- c(rep("lagoon", 20), rep("ocean", 20)) plot(x, option = "scores", pop = poplist.names)
par(mfrow = c(2, 2)) for (i in 1:4) plot(x$loadings[, i], pch = 19, cex = .3, ylab = paste0("Loadings PC", i))
Based on the first screeplot I chose k = 5: x <- pcadapt(filename, K = 5) plot(x , option = "manhattan") (You can still see a peak, but it's much better than it was before I did the pruning.)
plot(x , option = "qqplot")
hist(x$pvalues, xlab = "p-values", main = NULL, breaks = 50, col = "orange")
plot(x, option = "stat.distribution")
Because I still have high LD I did LD clumping as follows: newx <- pcadapt(filename, K = 20, LD.clumping = list(size = 200, thr = 0.1)) plot(newx, option = "screeplot")
newx <- pcadapt(filename, K = 5, LD.clumping = list(size = 200, thr = 0.1))
par(mfrow = c(2, 2)) for (i in 1:4) plot(newx$loadings[, i], pch = 19, cex = .3, ylab = paste0("Loadings PC", i))
plot(newx , option = "manhattan")
plot(newx , option = "qqplot")
hist(newx$pvalues, xlab = "p-values", main = NULL, breaks = 50, col = "orange")
plot(newx, option = "stat.distribution")
I then get ~1200 outliers: padj <- p.adjust(newx$pvalues,method="BH") alpha <- 0.05 bh_outliers <- which(padj < alpha) length(bh_outliers) [1] 1216
Do you think this approach is correct? Am I right in understanding that these 1,216 outliers are adaptive loci? What does this imply even if I don't initially observe population structure? How can I retrieve these outliers? I am interested in analyzing them further with SNP effect to see the functional effect...
Thank you again!!
No, I'm afraid you're still capturing only LD. There is no visible population structure.
Right, okay. Just to clarify, is PCAdapt only useful if there is population structure? If so, I cannot use it with this dataset... correct?
Yes x2
Ok, thank you!
One last question (sorry if it seems simple, I'm new to population genetics). Can you recommend other tools for detecting outliers? Is it worth pursuing this if I don't have population structure data? My reasoning is that I might not have enough differences to detect population structure, but I could still find differences at certain loci, which I would be interested in exploring.
Thanks again!
I'm not really an expert in popgen either, sorry.
Hello,
I have a dataset consisting of 40 individuals, with 20 individuals from each of two putative populations. Preliminary analyses suggest that these two populations are not significantly different from each other. However, I am interested in determining whether it is still possible to identify any adaptive loci despite the lack of population structure. Would it make sense to pursue this analysis, and if so, can this be achieved using PCAadapt?
Thank you!