bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt
39 stars 10 forks source link

Error during pcadapt fxn #15

Closed kbcn closed 6 years ago

kbcn commented 6 years ago

Hello,

I'm new to this package and am having some trouble. I followed the vignette and read in my vcf file (output from stacks) using the "read.pcadapt" fxn. I get a summary of the number of individuals, loci, lines, and columns detected that is what I would expect, so I'm assuming this worked.

When I use the pcadapt fxn on my read-in data file, I get the following message:

Error in svds_real_gen(A, k, nu, nv, opts, mattype = "function", extra_args = list(Atrans = Atrans, : TridiagEigen: eigen decomposition failed

Can you provide any insight/help?

Thanks, Kim

privefl commented 6 years ago

First guess: have you SNPs with too many missing data?

Moreover, as stated in the documentation, we do not fully support VCF format. We recommend you to use PLINK to do conversion to bed/bim/fam (and basic quality control at the same time) before using package {pcadapt}.

kbcn commented 6 years ago

Thanks for your help! I went back and used plink to apply pretty stringent filters and to convert to a ped file. It reads in fine, but I'm still getting the same error message.

privefl commented 6 years ago

Can you tell us more about your data and the commands you tried?

Do you have the latest version of this package (GitHub) and package {RSpectra} (CRAN)?

mblumuga commented 6 years ago

Could you send us a reproducible example that generates the error message?

lmguzman commented 6 years ago

Hi,

I have the same error and I have filtered out missing data:

I tried attaching the file sel.bed but github doesn't support that file.

My code:

pca_genotype <- read.pcadapt("Tipulid/sel.bed", type = "bed")

K <- 10 x <- pcadapt(pca_genotype, K = 10)

Error:

Error in svds_real_gen(A, k, nu, nv, opts, mattype = "function", extra_args = list(Atrans = Atrans, : TridiagEigen: eigen decomposition failed

sessionInfo() R version 3.3.2 (2016-10-31) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X El Capitan 10.11.6

locale: [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] bindrcpp_0.2.2 qvalue_2.6.0 pcadapt_4.0.3 tidyr_0.8.1 poppr_2.8.1
[6] ggplot2_3.0.0 reshape2_1.4.3 dplyr_0.7.6 pegas_0.11 ape_5.1
[11] hierfstat_0.04-22 adegenet_2.1.1 ade4_1.7-13

loaded via a namespace (and not attached): [1] nlme_3.1-128 robust_0.4-18 fit.models_0.5-14 gmodels_2.18.1 httr_1.3.1
[6] tools_3.3.2 polysat_1.7-3 R6_2.2.2 vegan_2.5-2 spData_0.2.9.3
[11] lazyeval_0.2.1 mgcv_1.8-24 colorspace_1.3-2 permute_0.9-4 withr_2.1.2
[16] sp_1.3-1 tidyselect_0.2.4 phangorn_2.4.0 expm_0.999-2 plotly_4.8.0
[21] labeling_0.3 scales_1.0.0 DEoptimR_1.0-8 mvtnorm_1.0-8 robustbase_0.93-2 [26] quadprog_1.5-5 stringr_1.3.1 digest_0.6.17 rrcov_1.4-4 pkgconfig_2.0.2
[31] htmltools_0.3.6 htmlwidgets_1.2 rlang_0.2.2 rstudioapi_0.7 shiny_1.1.0
[36] bindr_0.1.1 jsonlite_1.5 gtools_3.8.1 spdep_0.7-8 magrittr_1.5
[41] Matrix_1.2-14 Rcpp_0.12.18.1 munsell_0.5.0 stringi_1.2.4 MASS_7.3-50
[46] plyr_1.8.4 pinfsc50_1.1.0 grid_3.3.2 parallel_3.3.2 gdata_2.18.0
[51] promises_1.0.1 crayon_1.3.4 deldir_0.1-15 lattice_0.20-35 splines_3.3.2
[56] pillar_1.3.0 igraph_1.0.1 boot_1.3-20 seqinr_3.4-5 stats4_3.3.2
[61] vcfR_1.8.0 fastmatch_1.1-0 LearnBayes_2.15.1 glue_1.3.0 data.table_1.11.4 [66] httpuv_1.4.5 gtable_0.2.0 purrr_0.2.5 assertthat_0.2.0 mime_0.5
[71] xtable_1.8-3 RSpectra_0.13-1 coda_0.19-1 later_0.7.4 pcaPP_1.9-73
[76] viridisLite_0.3.0 tibble_1.4.2 cluster_2.0.7-1

privefl commented 6 years ago

Can you put your bedfile in a dropbox and link to it please? It would be really hard to help you unless we have the actual data to reproduce your problem.

lmguzman commented 6 years ago

Hi, here is the link:

https://www.dropbox.com/sh/iucdrx5newksr0h/AABr8Q-Id5CcjG3Fqht5KAjqa?dl=0

lmguzman commented 6 years ago

Ok ill try that. When I tried filtering using Plink with a more strict filter than --geno 0.4 it tells me that I have no snps left. What is the maximum percentage of missing data that pcadapt accepts?

privefl commented 6 years ago

@lmguzman The problem is that you have one SNP (N° 1397) with only 1s or NAs. So, basically you have a variable with no variation at all, which is not handle by the algorithm we use. Usually, there are SNPs with only 0s or 2s, and this case is handled because we filter on the MAF. Yet, here you have a MAF of 0.5.

You could filter this SNP out (e.g. with PLINK) and it would solve the problem I think.

@mblumuga Would it make sense to filter also on large MAF (maybe > 0.45?).

privefl commented 6 years ago

@lmguzman Forget the previous comment that I deleted. I was talking about another package that doesn't handle missing values. 40% of NAs for some SNPs should be okay for pcadapt I think.

lmguzman commented 6 years ago

@privefl Ok, but I should still filter out the the SNPs that have only 1s or NAs right?

privefl commented 6 years ago

Yes

lmguzman commented 6 years ago

Hi, I deleted this SNP, but it still gives me the error, and I checked that none of the SNPs have only 1s or NAs. Should I do some other filtering? I even tried filtering so there are only 30% NAs. But Im still getting this error. I managed to use the functions successfully with another data set, so I don't think it's a problem of the packages/ environment.

privefl commented 6 years ago

Please give me the code you used. I'll try too.

lmguzman commented 6 years ago

I used this to check that there weren't any SNP's with only 1s and NAs:

bed <- "Tipulid/fil_cor_sel.bed" rds <- snp_readBed(bed) data <- snp_attach(rds) G <- data$genotypes which(colSums(big_counts(G)[c(1,3),]) == 0) integer(0)

pca_genotype <- read.pcadapt("Tipulid/fil_cor_sel.bed", type = "bed") K <- 10 x <- pcadapt(pca_genotype, K = 10)

Error in svds_real_gen(A, k, nu, nv, opts, mattype = "function", extra_args = list(Atrans = Atrans, : TridiagEigen: eigen decomposition failed

I added the new filtered file to the dropbox, here is the link again:

https://www.dropbox.com/s/75p0l0qf5padrlb/fil_cor_sel.bed?dl=0

privefl commented 6 years ago

Okay, I found the other problem. You have one individual with only NAs.

The code I used:

bigsnpr:::write.table2(read.table("tmp-data/sel.bim")[1397, ], "mysnps.txt")
system(paste(bigsnpr::download_plink(), 
             "--bfile tmp-data/sel",
             "--mind 0.6",
             "--exclude mysnps.txt",
             "--make-bed"))

bed <- "plink.bed"
bed <- read.pcadapt(bed, type = "bed")

mat <- bed2matrix(bed)
summary(colMeans(is.na(mat)))
summary(rowMeans(is.na(mat)))
summary(apply(mat, 2, sd, na.rm = TRUE))
summary(apply(mat, 1, sd, na.rm = TRUE))

pcadapt(bed, K = 10)
lmguzman commented 6 years ago

Great, thank you so much!

mblumuga commented 6 years ago

@privefl It would make sense to remove SNP that are not SNPs, ie genotype values that do not vary (such as SNP with only 1s or NAs)

privefl commented 6 years ago

I can't reproduce the problem with a SNP with only 1s. Seems like it is okay, and the only problem is when one SNP or one individual has only NAs, which should be handle by proper quality control with PLINK.

privefl commented 6 years ago

I'm now returning a more explicit error than the TridiagEigen error of {RSpectra}.