Closed kbcn closed 6 years ago
First guess: have you SNPs with too many missing data?
Moreover, as stated in the documentation, we do not fully support VCF format. We recommend you to use PLINK to do conversion to bed/bim/fam (and basic quality control at the same time) before using package {pcadapt}.
Thanks for your help! I went back and used plink to apply pretty stringent filters and to convert to a ped file. It reads in fine, but I'm still getting the same error message.
Can you tell us more about your data and the commands you tried?
Do you have the latest version of this package (GitHub) and package {RSpectra} (CRAN)?
Could you send us a reproducible example that generates the error message?
Hi,
I have the same error and I have filtered out missing data:
I tried attaching the file sel.bed but github doesn't support that file.
My code:
pca_genotype <- read.pcadapt("Tipulid/sel.bed", type = "bed")
K <- 10
x <- pcadapt(pca_genotype, K = 10)
Error:
Error in svds_real_gen(A, k, nu, nv, opts, mattype = "function", extra_args = list(Atrans = Atrans, : TridiagEigen: eigen decomposition failed
sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X El Capitan 10.11.6
locale: [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
attached base packages: [1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2.2 qvalue_2.6.0 pcadapt_4.0.3 tidyr_0.8.1 poppr_2.8.1
[6] ggplot2_3.0.0 reshape2_1.4.3 dplyr_0.7.6 pegas_0.11 ape_5.1
[11] hierfstat_0.04-22 adegenet_2.1.1 ade4_1.7-13
loaded via a namespace (and not attached):
[1] nlme_3.1-128 robust_0.4-18 fit.models_0.5-14 gmodels_2.18.1 httr_1.3.1
[6] tools_3.3.2 polysat_1.7-3 R6_2.2.2 vegan_2.5-2 spData_0.2.9.3
[11] lazyeval_0.2.1 mgcv_1.8-24 colorspace_1.3-2 permute_0.9-4 withr_2.1.2
[16] sp_1.3-1 tidyselect_0.2.4 phangorn_2.4.0 expm_0.999-2 plotly_4.8.0
[21] labeling_0.3 scales_1.0.0 DEoptimR_1.0-8 mvtnorm_1.0-8 robustbase_0.93-2
[26] quadprog_1.5-5 stringr_1.3.1 digest_0.6.17 rrcov_1.4-4 pkgconfig_2.0.2
[31] htmltools_0.3.6 htmlwidgets_1.2 rlang_0.2.2 rstudioapi_0.7 shiny_1.1.0
[36] bindr_0.1.1 jsonlite_1.5 gtools_3.8.1 spdep_0.7-8 magrittr_1.5
[41] Matrix_1.2-14 Rcpp_0.12.18.1 munsell_0.5.0 stringi_1.2.4 MASS_7.3-50
[46] plyr_1.8.4 pinfsc50_1.1.0 grid_3.3.2 parallel_3.3.2 gdata_2.18.0
[51] promises_1.0.1 crayon_1.3.4 deldir_0.1-15 lattice_0.20-35 splines_3.3.2
[56] pillar_1.3.0 igraph_1.0.1 boot_1.3-20 seqinr_3.4-5 stats4_3.3.2
[61] vcfR_1.8.0 fastmatch_1.1-0 LearnBayes_2.15.1 glue_1.3.0 data.table_1.11.4
[66] httpuv_1.4.5 gtable_0.2.0 purrr_0.2.5 assertthat_0.2.0 mime_0.5
[71] xtable_1.8-3 RSpectra_0.13-1 coda_0.19-1 later_0.7.4 pcaPP_1.9-73
[76] viridisLite_0.3.0 tibble_1.4.2 cluster_2.0.7-1
Can you put your bedfile in a dropbox and link to it please? It would be really hard to help you unless we have the actual data to reproduce your problem.
Hi, here is the link:
https://www.dropbox.com/sh/iucdrx5newksr0h/AABr8Q-Id5CcjG3Fqht5KAjqa?dl=0
Ok ill try that. When I tried filtering using Plink with a more strict filter than --geno 0.4 it tells me that I have no snps left. What is the maximum percentage of missing data that pcadapt accepts?
@lmguzman
The problem is that you have one SNP (N° 1397) with only 1
s or NA
s.
So, basically you have a variable with no variation at all, which is not handle by the algorithm we use.
Usually, there are SNPs with only 0
s or 2
s, and this case is handled because we filter on the MAF.
Yet, here you have a MAF of 0.5.
You could filter this SNP out (e.g. with PLINK) and it would solve the problem I think.
@mblumuga Would it make sense to filter also on large MAF (maybe > 0.45?).
@lmguzman Forget the previous comment that I deleted. I was talking about another package that doesn't handle missing values. 40% of NAs for some SNPs should be okay for pcadapt I think.
@privefl
Ok, but I should still filter out the the SNPs that have only 1
s or NA
s right?
Yes
Hi, I deleted this SNP, but it still gives me the error, and I checked that none of the SNPs have only 1
s or NA
s. Should I do some other filtering?
I even tried filtering so there are only 30% NAs. But Im still getting this error.
I managed to use the functions successfully with another data set, so I don't think it's a problem of the packages/ environment.
Please give me the code you used. I'll try too.
I used this to check that there weren't any SNP's with only 1s
and NAs
:
bed <- "Tipulid/fil_cor_sel.bed"
rds <- snp_readBed(bed)
data <- snp_attach(rds)
G <- data$genotypes
which(colSums(big_counts(G)[c(1,3),]) == 0)
integer(0)
pca_genotype <- read.pcadapt("Tipulid/fil_cor_sel.bed", type = "bed")
K <- 10
x <- pcadapt(pca_genotype, K = 10)
Error in svds_real_gen(A, k, nu, nv, opts, mattype = "function", extra_args = list(Atrans = Atrans, : TridiagEigen: eigen decomposition failed
I added the new filtered file to the dropbox, here is the link again:
https://www.dropbox.com/s/75p0l0qf5padrlb/fil_cor_sel.bed?dl=0
Okay, I found the other problem.
You have one individual with only NA
s.
The code I used:
bigsnpr:::write.table2(read.table("tmp-data/sel.bim")[1397, ], "mysnps.txt")
system(paste(bigsnpr::download_plink(),
"--bfile tmp-data/sel",
"--mind 0.6",
"--exclude mysnps.txt",
"--make-bed"))
bed <- "plink.bed"
bed <- read.pcadapt(bed, type = "bed")
mat <- bed2matrix(bed)
summary(colMeans(is.na(mat)))
summary(rowMeans(is.na(mat)))
summary(apply(mat, 2, sd, na.rm = TRUE))
summary(apply(mat, 1, sd, na.rm = TRUE))
pcadapt(bed, K = 10)
Great, thank you so much!
@privefl It would make sense to remove SNP that are not SNPs, ie genotype values that do not vary (such as SNP with only 1s or NAs)
I can't reproduce the problem with a SNP with only 1s. Seems like it is okay, and the only problem is when one SNP or one individual has only NAs, which should be handle by proper quality control with PLINK.
I'm now returning a more explicit error than the TridiagEigen
error of {RSpectra}.
Hello,
I'm new to this package and am having some trouble. I followed the vignette and read in my vcf file (output from stacks) using the "read.pcadapt" fxn. I get a summary of the number of individuals, loci, lines, and columns detected that is what I would expect, so I'm assuming this worked.
Error in svds_real_gen(A, k, nu, nv, opts, mattype = "function", extra_args = list(Atrans = Atrans, : TridiagEigen: eigen decomposition failed
Can you provide any insight/help?
Thanks, Kim