Closed Lin012345 closed 5 months ago
I'm not sure what I am looking at. Could you give more explanation please?
I apologize for not providing detailed information. I have 679,602 loci, and after calculating the p-values with pcadapt, I found that there are a large number of p-values that were not calculated. plot_m_clean<- na.omit(plot_m)
removed 299,224 rows with NA values.
With the most sincere thanks
and the code I used
setwd("D:/data/10_pcadapt/all")
library(pcadapt)
library(data.table)
path_to_file <- "3group.bed"
filename <- read.pcadapt(path_to_file, type = "bed")
poplist.names <- c(rep("1", 104),rep("2", 100),rep("3", 31))
print(poplist.names)
y <- pcadapt(input = filename, K = 20)
plot(y, option = "scores",i=1,j=2,pop = poplist.names)
plot(y, option = "screeplot")
y <- pcadapt(input = filename, K = 2)
plot(y, option = "scores", pop = poplist.names)
plot(y,option = "manhattan")
par(mfrow = c(2, 2))
for (i in 1:2)
plot(y$loadings[, i], pch = 19, cex = .3, ylab = paste0("Loadings PC", i))
library(qvalue)
y$qvalue <- qvalue(y$pvalues)
hist(y$pvalues, xlab = "p-values", main = NULL, breaks = 50, col = "#FFD369")
plot(y, option = "qqplot")
plot(y, option = "stat.distribution")
library(qvalue)
qval <- qvalue(y$pvalues)$qvalues
alpha <- 0.01
outliers <- which(qval < alpha)
length(outliers)
significant_markers <- logical(length(pca$pvalues))
significant_markers[outliers] <- TRUE
y$significant_markers <- significant_markers
pos <- fread("3group.bim", header = F)
plot_m <- data.frame("SNP" = pos$V2, "CHR" = pos$V1, "BP" = pos$V4, "pvalues" = y$pvalues)
n_before <- nrow(plot_m)
plot_m_clean <- na.omit(plot_m)
n_after <- nrow(plot_m_clean)
cat("Removed", n_before - n_after, "rows with NA values\n")
write.table(plot_m_clean, file = "3group.txt", sep = "\t", quote = FALSE, row.names = FALSE, col.names = TRUE)
The ones for which you do not get p-values should be the ones with a MAF that is too low.
The ones for which you do not get p-values should be the ones with a MAF that is too low.
After we use the MAF>0.05 standard for filtering, the SNP of 46W still has 11W of NA, how can we solve this situation?
Could you send me the resulting pcadapt object (as an RDS file).
Could you send me the resulting pcadapt object (as an RDS file).
Thank you very much for your reply. I have discovered that I cannot upload an RDS file here. so I've compressed it instead, but I am not sure if that is what you want. If there is anything else you need, please let me know. Again, thank you very much for your patience. group_maf.zip
Can't download it for some reason. Send me the RDS file to florian.prive.21@gmail.com.
> res <- readRDS("tmp-data/group_maf.rds")
> str(res)
List of 11
$ scores : num [1:235, 1:2] -0.00335 -0.04187 -0.04297 -0.03627 -0.03708 ...
$ singular.values: num [1:2] 0.1026 0.0817
$ loadings : num [1:676053, 1:2] 0.00155 -0.000789 NA NA 0.001033 ...
$ zscores : num [1:676053, 1:2] 1.569 -0.75 NA NA 0.993 ...
$ af : num [1:676053] 0.94 0.756 0.972 1 0.915 ...
$ maf : num [1:676053] 0.0596 0.2436 0.0277 0 0.0851 ...
$ chi2.stat : num [1:676053] 4.468 0.989 NA NA 1.569 ...
$ stat : num [1:676053] 6.06 1.34 NA NA 2.13 ...
$ gif : num 1.36
$ pvalues : num [1:676053] 0.107 0.61 NA NA 0.456 ...
$ pass : int [1:379265] 1 2 5 8 12 15 16 20 21 22 ...
- attr(*, "K")= num 2
- attr(*, "method")= chr "mahalanobis"
- attr(*, "min.maf")= num 0.05
- attr(*, "class")= chr "pcadapt"
From what I can see, e.g. variants 3 & 4 have indeed been removed due to the MAF threshold.
> res <- readRDS("tmp-data/group_maf.rds") > str(res) List of 11 $ scores : num [1:235, 1:2] -0.00335 -0.04187 -0.04297 -0.03627 -0.03708 ... $ singular.values: num [1:2] 0.1026 0.0817 $ loadings : num [1:676053, 1:2] 0.00155 -0.000789 NA NA 0.001033 ... $ zscores : num [1:676053, 1:2] 1.569 -0.75 NA NA 0.993 ... $ af : num [1:676053] 0.94 0.756 0.972 1 0.915 ... $ maf : num [1:676053] 0.0596 0.2436 0.0277 0 0.0851 ... $ chi2.stat : num [1:676053] 4.468 0.989 NA NA 1.569 ... $ stat : num [1:676053] 6.06 1.34 NA NA 2.13 ... $ gif : num 1.36 $ pvalues : num [1:676053] 0.107 0.61 NA NA 0.456 ... $ pass : int [1:379265] 1 2 5 8 12 15 16 20 21 22 ... - attr(*, "K")= num 2 - attr(*, "method")= chr "mahalanobis" - attr(*, "min.maf")= num 0.05 - attr(*, "class")= chr "pcadapt"
From what I can see, e.g. variants 3 & 4 have indeed been removed due to the MAF threshold.
I understand, your suggestions are very helpful, thank you very much.
Dear all, I am using pcadapt to identify loci under natural selection in the population, but there are a large number of NA values. I would like to know what might have caused this, whether it is related to the quality of my data, or if there are other aspects that I have overlooked. Best regards