problem with missing values

cyadrogarcia commented 2 years ago

When trying to run the pcadapt function while following the tutorial, I receive the following error: "Error: Can't compute SVD. Are there SNPs or individuals with missing values only? You should use PLINK for proper data quality control."

Apparently it is a problem related to my RStudio installation, since it works on my colleagues' computers. Do you know what might be causing the problem?

privefl commented 2 years ago

You are saying that they do not get this error, and get a proper result, on the same data? Do you have the same package version as they do?

privefl commented 2 years ago

Any update on this?

jcaccavo commented 1 year ago

Hi there,

I also received this error when trying to run pcadapt on a particular dataset. When running pcadapt using the same R script with 3 other datasets derived from the same samples, I had no issue. The code producing the error is as follows:

path_to_file <- ("/Users/JMAC/Library/CloudStorage/Dropbox/Research/Humboldt/CCGA_full_sequencing/WG_outlier_analysis/WG_pcadapt/downsampled_10X/10x_TOA_only_filtered_SNPs_all_2.bed")
filename <- read.pcadapt(path_to_file, type = "bed")

x <- pcadapt(input=filename, K=20)

Error: Can't compute SVD.
Are there SNPs or individuals with missing values only?
You should use PLINK for proper data quality control.

I wonder if the issue might be the sample number, as mentioned in issue #66. The dataset in question has an n of 14.

That said, the other 3 datasets that ran successfully with pcadapt have an n of 41, 39 and 24.

I produced the input .bed file using plink2, as I did for the other 3 datasets.

At first I used the following code: plink2 --vcf 10x_TOA_only_filtered_SNPs_all.vcf --make-bed --allow-extra-chr --out 10x_TOA_only_filtered_SNPs_all

Then, based on the feedback in issue #66, I included the --mind 0.5 and --geno 0.5 parameters: plink2 --vcf 10x_TOA_only_filtered_SNPs_all.vcf --make-bed --allow-extra-chr --mind 0.5 --geno 0.5 --out 10x_TOA_only_filtered_SNPs_all_2

Both resulting .bed files produced the same error in pcadapt.

To see if you could potentially reproduce the error, I'm providing the following files:

Dataset n = 14: .vcf file (used as input to plink2), and .bed file (produced in plink2, used as input to pcadapt)
R script for n = 14 dataset
R session info:

─ Session info ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.2 (2022-10-31)
 os       macOS Monterey 12.6
 system   x86_64, darwin17.0
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       Europe/Paris
 date     2023-08-29
 rstudio  2023.03.1+446 Cherry Blossom (desktop)
 pandoc   NA

─ Packages ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package     * version  date (UTC) lib source
 ade4        * 1.7-22   2023-02-06 [1] CRAN (R 4.2.0)
 adegenet    * 2.1.10   2023-01-26 [1] CRAN (R 4.2.2)
 ape           5.7-1    2023-03-13 [1] CRAN (R 4.2.0)
 cachem        1.0.7    2023-02-24 [1] CRAN (R 4.2.0)
 callr         3.7.3    2022-11-02 [1] CRAN (R 4.2.0)
 cli           3.6.1    2023-03-23 [1] CRAN (R 4.2.0)
 cluster       2.1.4    2022-08-22 [1] CRAN (R 4.2.2)
 colorspace    2.1-0    2023-01-23 [1] CRAN (R 4.2.0)
 crayon        1.5.2    2022-09-29 [1] CRAN (R 4.2.0)
 data.table    1.14.8   2023-02-17 [1] CRAN (R 4.2.0)
 devtools      2.4.5    2022-10-11 [1] CRAN (R 4.2.0)
 digest        0.6.31   2022-12-11 [1] CRAN (R 4.2.0)
 dplyr         1.1.1    2023-03-22 [1] CRAN (R 4.2.0)
 ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.2.0)
 fansi         1.0.4    2023-01-22 [1] CRAN (R 4.2.0)
 fastmap       1.1.1    2023-02-24 [1] CRAN (R 4.2.0)
 fs            1.6.1    2023-02-06 [1] CRAN (R 4.2.0)
 generics      0.1.3    2022-07-05 [1] CRAN (R 4.2.0)
 ggplot2       3.4.2    2023-04-03 [1] CRAN (R 4.2.0)
 glue          1.6.2    2022-02-24 [1] CRAN (R 4.2.0)
 gtable        0.3.3    2023-03-21 [1] CRAN (R 4.2.0)
 hms           1.1.3    2023-03-21 [1] CRAN (R 4.2.0)
 htmltools     0.5.5    2023-03-23 [1] CRAN (R 4.2.0)
 htmlwidgets   1.6.2    2023-03-17 [1] CRAN (R 4.2.0)
 httpuv        1.6.9    2023-02-14 [1] CRAN (R 4.2.0)
 igraph        1.4.2    2023-04-07 [1] CRAN (R 4.2.0)
 later         1.3.0    2021-08-18 [1] CRAN (R 4.2.0)
 lattice       0.21-8   2023-04-05 [1] CRAN (R 4.2.0)
 lifecycle     1.0.3    2022-10-07 [1] CRAN (R 4.2.0)
 magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.2.0)
 MASS          7.3-58.3 2023-03-07 [1] CRAN (R 4.2.0)
 Matrix        1.5-4    2023-04-04 [1] CRAN (R 4.2.0)
 memoise       2.0.1    2021-11-26 [1] CRAN (R 4.2.0)
 memuse        4.2-3    2023-01-24 [1] CRAN (R 4.2.2)
 mgcv          1.8-42   2023-03-02 [1] CRAN (R 4.2.0)
 mime          0.12     2021-09-28 [1] CRAN (R 4.2.0)
 miniUI        0.1.1.1  2018-05-18 [1] CRAN (R 4.2.0)
 munsell       0.5.0    2018-06-12 [1] CRAN (R 4.2.0)
 nlme          3.1-162  2023-01-31 [1] CRAN (R 4.2.0)
 OutFLANK    * 0.2      2023-01-20 [1] Github (whitlock/OutFLANK@e502e82)
 pcadapt     * 4.3.3    2020-05-05 [1] CRAN (R 4.2.0)
 permute       0.9-7    2022-01-27 [1] CRAN (R 4.2.0)
 pillar        1.9.0    2023-03-22 [1] CRAN (R 4.2.0)
 pinfsc50      1.2.0    2020-06-03 [1] CRAN (R 4.2.0)
 pkgbuild      1.4.0    2022-11-27 [1] CRAN (R 4.2.0)
 pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
 pkgload       1.3.2    2022-11-16 [1] CRAN (R 4.2.0)
 plyr          1.8.8    2022-11-11 [1] CRAN (R 4.2.0)
 prettyunits   1.1.1    2020-01-24 [1] CRAN (R 4.2.0)
 processx      3.8.1    2023-04-18 [1] CRAN (R 4.2.2)
 profvis       0.3.7    2020-11-02 [1] CRAN (R 4.2.0)
 promises      1.2.0.1  2021-02-11 [1] CRAN (R 4.2.0)
 ps            1.7.5    2023-04-18 [1] CRAN (R 4.2.2)
 purrr         1.0.1    2023-01-10 [1] CRAN (R 4.2.0)
 qvalue      * 2.30.0   2022-11-01 [1] Bioconductor
 R6            2.5.1    2021-08-19 [1] CRAN (R 4.2.0)
 Rcpp          1.0.10   2023-01-22 [1] CRAN (R 4.2.0)
 readr         2.1.4    2023-02-10 [1] CRAN (R 4.2.0)
 remotes       2.4.2    2021-11-30 [1] CRAN (R 4.2.0)
 reshape2      1.4.4    2020-04-09 [1] CRAN (R 4.2.0)
 rlang         1.1.0    2023-03-14 [1] CRAN (R 4.2.0)
 RSpectra      0.16-1   2022-04-24 [1] CRAN (R 4.2.0)
 rstudioapi    0.14     2022-08-22 [1] CRAN (R 4.2.0)
 scales        1.2.1    2022-08-20 [1] CRAN (R 4.2.0)
 seqinr        4.2-30   2023-04-05 [1] CRAN (R 4.2.0)
 sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
 shiny         1.7.4    2022-12-15 [1] CRAN (R 4.2.0)
 stringi       1.7.12   2023-01-11 [1] CRAN (R 4.2.0)
 stringr       1.5.0    2022-12-02 [1] CRAN (R 4.2.0)
 tibble        3.2.1    2023-03-20 [1] CRAN (R 4.2.0)
 tidyselect    1.2.0    2022-10-10 [1] CRAN (R 4.2.0)
 tzdb          0.3.0    2022-03-28 [1] CRAN (R 4.2.0)
 urlchecker    1.0.1    2021-11-30 [1] CRAN (R 4.2.0)
 usethis       2.1.6    2022-05-25 [1] CRAN (R 4.2.0)
 utf8          1.2.3    2023-01-31 [1] CRAN (R 4.2.2)
 vcfR        * 1.14.0   2023-02-10 [1] CRAN (R 4.2.0)
 vctrs         0.6.1    2023-03-22 [1] CRAN (R 4.2.0)
 vegan         2.6-4    2022-10-11 [1] CRAN (R 4.2.0)
 viridisLite   0.4.1    2022-08-22 [1] CRAN (R 4.2.0)
 xtable        1.8-4    2019-04-21 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library

────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

If it might help, for comparison, the other 3 datasets that ran successfully in pcadapt, are also available via the following links. The difference between the dataset in question (n = 14) and these (n = 41, 39, and 24 respectively), is that the dataset in question was downsampled so that all samples have the same coverage, in this case, 10x. The other 3 datasets were either not downsampled (n = 41), downsampled to 2x (n = 39), or downsampled to 5x (n = 24). The variation in the number of samples per dataset is because the coverage among samples ranged from 1x - 27x, so not all samples had a high enough coverage to be downsampled to the appropriate coverage level.

Dataset n = 41 (not downsampled, coverage ranges from 1x - 27x): .vcf file, .bed file, R script for pcadapt
Dataset n = 39 (downsampled to 2x coverage): .vcf file, .bed file, R script for pcadapt
Dataset n = 24 (downsampled to 5x coverage): .vcf file, .bed file, R script for pcadapt

I'm happy to provide further information as needed.

Thanks in advance for your feedback.

Best, Jilda

privefl commented 1 year ago

@jcaccavo Thanks for the detailed description of the issue, and providing data to reproduce it.

I'll look into it.

privefl commented 1 year ago

The issue is that you have K > N. It is not possible to get more PCs than the number of individuals; here K = 20, while the dataset has only 14 individuals.

I've pushed a new version of the package that should provide a more helpful error message in that case.
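
For anyone hitting the same error, the K > N condition can be checked before calling pcadapt. The following is only a minimal sketch, not part of pcadapt or PLINK: `check_k` is a hypothetical helper name, and it assumes the .bed file sits next to its companion .fam file, which lists one individual per line.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcadapt or PLINK): compare a requested K
# with the number of individuals listed in the .fam file next to a .bed file.
# Usage: check_k PREFIX K   (PREFIX is the file path without .bed/.fam)
check_k() {
    prefix=$1
    k=$2
    fam="${prefix}.fam"
    if [ ! -f "$fam" ]; then
        echo "missing $fam" >&2
        return 2
    fi
    # A .fam file has one line per individual; count the non-empty lines.
    n=$(grep -c . "$fam")
    if [ "$k" -gt "$n" ]; then
        echo "K=$k exceeds N=$n individuals; choose a smaller K" >&2
        return 1
    fi
    echo "K=$k is compatible with N=$n individuals"
}
```

With a 14-individual dataset like the one in this thread, a call such as `check_k 10x_TOA_only_filtered_SNPs_all_2 20` would be expected to return a non-zero status, provided the .fam file still lists all 14 samples after filtering.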

jcaccavo commented 1 year ago

Ugh, duh! I'm sorry to have bothered you with this. I was focused on the fact that, after running the diagnostic plots, I was only testing values of k from 1 to 11, which make sense for my dataset. But indeed, for the diagnostic scree and score plots, I was using a standard value of k = 20 to see the impact of the k value on the datasets generally; of course, with my dataset with an n of 14, this was causing issues. Thanks for helping me see that!

bcm-uga / pcadapt

problem with missing values #77