bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt

Adding fake variants when you have too few. #56

Closed: privefl closed this issue 3 years ago

privefl commented 4 years ago

pcadapt needs many "null" (neutral) variants to compute the Mahalanobis distance and detect outliers.
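To see why the number of null variants matters, here is a deliberately simplified sketch of a Mahalanobis-based outlier score. It assumes per-SNP z-scores on K principal components are already available. It uses the plain sample mean and covariance plus a median-based genomic inflation factor. pcadapt itself uses a robust covariance estimate, so treat this as an illustration of the idea rather than pcadapt's exact computation. The 20 "outliers" and their mean shift of 6 are made-up values.

```r
# Simplified outlier score: z is a p x K matrix of per-SNP z-scores on K PCs.
# Plain (non-robust) mean and covariance are used here for illustration only.
mahalanobis_pvalues <- function(z) {
  K <- ncol(z)
  d2 <- stats::mahalanobis(z, center = colMeans(z), cov = stats::cov(z))
  # Genomic inflation factor: rescale so the median statistic matches chi2(K)
  gif <- stats::median(d2) / stats::qchisq(0.5, df = K)
  stats::pchisq(d2 / gif, df = K, lower.tail = FALSE)
}

set.seed(1)
K <- 2
n_out <- 20
z_out <- matrix(rnorm(n_out * K, mean = 6), ncol = K)  # made-up outlier SNPs

# Same outliers, combined with few vs many null SNPs (z-scores ~ N(0, 1))
p_few  <- mahalanobis_pvalues(rbind(z_out, matrix(rnorm(100 * K), ncol = K)))
p_many <- mahalanobis_pvalues(rbind(z_out, matrix(rnorm(50e3 * K), ncol = K)))

# Median p-value of the outliers in each setting
c(few = median(p_few[1:n_out]), many = median(p_many[1:n_out]))
```

With only 100 null SNPs, the outliers make up a large share of the data. They pull the center, the covariance and the inflation factor toward themselves, so their p-values come out much less extreme than with 50,000 null SNPs.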

If you have too few variants, you can try adding 50,000 fake (null) ones:

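```r
library(pcadapt)

# Genotype matrix (individuals x SNPs) from the object returned by read.pcadapt()
mat <- pcadapt::bed2matrix(obj_read_pcadapt)

# Sanity checks: mean genotype per SNP (2 x allele frequency),
# missingness per SNP, and missingness per individual
hist(colMeans(mat, na.rm = TRUE))
hist(colMeans(is.na(mat)))
hist(rowMeans(is.na(mat)))

# Add M simulated null SNPs: each gets a random allele frequency in [0.05, 0.5]
# and genotypes drawn as Binomial(2, af) for all N individuals
N <- nrow(mat)
M <- 50e3
mat_null <- sapply(runif(M, min = 0.05, max = 0.5), function(af) {
  rbinom(N, size = 2, prob = af)
})
mat2 <- cbind(mat, mat_null)

# Re-read the augmented matrix (individuals in rows, SNPs in columns)
obj_read_pcadapt <- read.pcadapt(mat2, type = "lfmm")
```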
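The simulated SNPs are appended after the real ones, so you will probably want to drop their results again before looking for outlier candidates. The sketch below assumes the same allele-frequency range (0.05 to 0.5) as the snippet. The helper name `add_null_snps()` and the `is_real` index are hypothetical and not part of pcadapt. Drawing all genotypes in a single `rbinom()` call should be equivalent to the `sapply()` loop, just vectorized. In the commented pcadapt call, `K = 2` is only a placeholder.

```r
# Hypothetical helper (not part of pcadapt): append M simulated null SNPs to an
# individuals x SNPs genotype matrix and remember which columns are real.
add_null_snps <- function(mat, M = 50e3, min_af = 0.05, max_af = 0.5) {
  N <- nrow(mat)
  af <- runif(M, min = min_af, max = max_af)
  # Column j holds N draws from Binomial(2, af[j]); rep(af, each = N) matches
  # the column-major fill order of matrix()
  mat_null <- matrix(rbinom(N * M, size = 2, prob = rep(af, each = N)),
                     nrow = N, ncol = M)
  list(mat = cbind(mat, mat_null),
       is_real = rep(c(TRUE, FALSE), times = c(ncol(mat), M)))
}

# Toy usage with a made-up 10 x 5 genotype matrix
set.seed(42)
toy <- matrix(rbinom(10 * 5, size = 2, prob = 0.3), nrow = 10)
aug <- add_null_snps(toy, M = 100)

# With real data (requires pcadapt; assumes res$pvalues has one entry per
# column of aug$mat, with NA for SNPs filtered out):
# aug <- add_null_snps(mat, M = 200e3)
# res <- pcadapt(read.pcadapt(aug$mat, type = "lfmm"), K = 2)
# pvals_real <- res$pvalues[aug$is_real]
```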

You'll get a more uniform p-value histogram (it's even better with 200,000 null variants).
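One rough way to check this is to look at the p-values of the simulated SNPs alone. If they are truly neutral, their histogram should be close to flat, whereas the real SNPs may include genuine outliers. The sketch below is a hypothetical helper, not a pcadapt function. It assumes a vector of p-values with one entry per column of the augmented matrix, plus a logical vector `is_real` marking the original (non-simulated) columns. With hundreds of thousands of SNPs, the KS test will flag even tiny deviations, so the bin counts are likely the more useful output.

```r
# Hypothetical helper (not part of pcadapt): summarize how uniform the p-values
# of the simulated null SNPs are.
# pvalues: one p-value per column of the augmented matrix (NA allowed)
# is_real: TRUE for original SNPs, FALSE for simulated null SNPs
null_pvalue_summary <- function(pvalues, is_real, n_bins = 20) {
  p_null <- pvalues[!is_real]
  p_null <- p_null[!is.na(p_null)]
  # Counts in n_bins equal-width bins over [0, 1]; roughly equal counts = flat
  counts <- hist(p_null, breaks = seq(0, 1, length.out = n_bins + 1),
                 plot = FALSE)$counts
  # Kolmogorov-Smirnov test against Uniform(0, 1); ties only trigger a warning
  ks <- suppressWarnings(ks.test(p_null, "punif"))
  list(counts = counts, expected = length(p_null) / n_bins,
       ks_pvalue = ks$p.value)
}

# Toy usage with made-up p-values: 100 "real" SNPs, then 5,000 null SNPs
set.seed(1)
p <- c(runif(100), runif(5000))
s <- null_pvalue_summary(p, is_real = rep(c(TRUE, FALSE), c(100, 5000)))
```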