UW-GAC / GENESIS

GENetic EStimation and Inference in Structured samples (GENESIS): Statistical methods for analyzing genetic data from samples with population structure and/or relatedness
https://bioconductor.org/packages/GENESIS
34 stars 13 forks source link

HOW to FIX - PCAir excludes all my SNPs because there is no chromosome information #104

Closed GabryS3 closed 10 months ago

GabryS3 commented 10 months ago

Hi, I am just trying now to run PCAir and PCRelate. However, I ran into a problem that I am not sure how to fix.

When I run pcair(), I get the following error (highlighted in bold):

<Using kinobj and divobj to partition samples into unrelated and related sets Working with 425 samples Identifying relatives for each sample using kinship threshold 0.0441941738241592 Identifying pairs of divergent samples using divergence threshold -0.0441941738241592 Partitioning samples into unrelated and related sets... Unrelated Set: 375 Samples Related Set: 50 Samples Performing PCA on the Unrelated Set... Principal Component Analysis (PCA) on genotypes: Excluding 8,685 SNPs on non-autosomes Error in .InitFile2(cmd = "Principal Component Analysis (PCA) on genotypes:", : There is no SNP!>

The code I used is below:

GDS file obtained from a genlight object (dartR package) with the dartR::gl2gds" function. Then GDS file converted into gdsobj with the code below (from GDWAS tools package):

mydataset_geno <- GdsGenotypeReader(filename = "C:/documents/mydataset_GDS.gds")

class(mydataset_geno) # [1] "GdsGenotypeReader" attr(,"package") [1] "GWASTools"

object without annotation

mydataset_genoData <- GenotypeData(mydataset_geno)

str(mydataset_genoData)

_Formal class 'GenotypeData' [package "GWASTools"] with 3 slots ..@ data :Formal class 'GdsGenotypeReader' [package "GWASTools"] with 15 slots .. .. ..@ snpIDvar : chr "snp.id" .. .. ..@ chromosomeVar: chr "snp.chromosome" .. .. ..@ positionVar : chr "snp.position" .. .. ..@ scanIDvar : chr "sample.id" .. .. ..@ genotypeVar : chr "genotype" .. .. ..@ alleleVar : chr "snp.allele" .. .. ..@ autosomeCode : int [1:22] 1 2 3 4 5 6 7 8 9 10 ... .. .. ..@ XchromCode : int 23 .. .. ..@ YchromCode : int 25 .. .. ..@ XYchromCode : int 24 .. .. ..@ MchromCode : int 26 .. .. ..@ genotypeDim : chr "snp,scan" .. .. ..@ missingValue : int 3 .. .. ..@ filename : chr "C:/documents/mydataset"| truncated .. .. ..@ handler :List of 5 .. .. .. ..$ filename: chr "C:/documents/mydataset_GDS.gdsLocusC"| truncated .. .. .. ..$ id : int 1 .. .. .. ..$ ptr : .. .. .. ..$ root : 'gdsn.class' raw [1:20] 08 00 00 00 ... .. .. .. ..$ readonly: logi TRUE .. .. .. ..- attr(*, "class")= chr "gds.class" ..@ snpAnnot : NULL ..@ scanAnnot: NULL

PCAir

mydataset_pca_PCAir <- pcair( mydataset_genoData, kinobj = mydataset_kingMat, kin.thresh=2^(-9/2), # = 0.04 = 3rd degree kinship threshold, which corresponds to first cousins (actually 1stcousins kinship= 1/16 - Weir 2006 & 0.06>0.04) divobj = mydataset_kingMat, div.thresh=-2^(-9/2))

Using kinobj and divobj to partition samples into unrelated and related sets Working with 425 samples Identifying relatives for each sample using kinship threshold 0.0441941738241592 Identifying pairs of divergent samples using divergence threshold -0.0441941738241592 Partitioning samples into unrelated and related sets... Unrelated Set: 375 Samples Related Set: 50 Samples Performing PCA on the Unrelated Set... Principal Component Analysis (PCA) on genotypes: Excluding 8,685 SNPs on non-autosomes Error in .InitFile2(cmd = "Principal Component Analysis (PCA) on genotypes:", : There is no SNP!

How to make sure that all SNPs are used in PCAir and not only Autosomes (as my SNPs have no chromosome position being a non-model species with no genome)?

Thank you for any help! Gabriella

sessionInfo( ): R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale: [1] LC_COLLATE=English_Australia.utf8 LC_CTYPE=English_Australia.utf8 LC_MONETARY=English_Australia.utf8 [4] LC_NUMERIC=C LC_TIME=English_Australia.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] GENESIS_2.26.0 GWASTools_1.42.1 Biobase_2.56.0 BiocGenerics_0.42.0 InRelate_0.1.0 [6] remotes_2.4.2 fstcore_0.9.12 radiator_1.2.8 OutFLANK_0.2 LEA_3.8.0 [11] vcfR_1.12.0 stockR_1.0.74 qvalue_2.28.0 SNPRelate_1.30.1 dartR_2.0.4 [16] adegenet_2.1.7 ade4_1.7-19 plotrix_3.8-2 forcats_0.5.1 stringr_1.4.1 [21] dplyr_1.0.9 purrr_0.3.4 readr_2.1.2 tidyr_1.2.0 tibble_3.1.7 [26] tidyverse_1.3.1 ggplot2_3.4.0 plyr_1.8.7 SeqArray_1.36.3 gdsfmt_1.32.0

loaded via a namespace (and not attached): [1] utf8_1.2.2 R.utils_2.12.0 tidyselect_1.1.2 RSQLite_2.3.2 [5] BiocParallel_1.30.4 grid_4.2.1 combinat_0.0-8 StAMPP_1.6.3 [9] GWASExactHW_1.01 devtools_2.4.3 munsell_0.5.0 codetools_0.2-18 [13] withr_2.5.0 colorspace_2.0-3 fst_0.9.8 pegas_1.1 [17] knitr_1.39 rstudioapi_0.13 stats4_4.2.1 labeling_0.4.2 [21] RgoogleMaps_1.4.5.3 logistf_1.26.0 GenomeInfoDbData_1.2.8 bit64_4.0.5 [25] farver_2.1.1 gap.datasets_0.0.5 rprojroot_2.0.3 vctrs_0.5.1 [29] generics_0.1.3 xfun_0.31 R6_2.5.1 doParallel_1.0.17 [33] GenomeInfoDb_1.32.2 fields_14.0 bitops_1.0-7 cachem_1.0.6 [37] reshape_0.8.9 assertthat_0.2.1 promises_1.2.0.1 scales_1.2.0 [41] vroom_1.5.7 pinfsc50_1.2.0 gtable_0.3.0 formula.tools_1.7.1 [45] processx_3.7.0 spam_2.9-0 sandwich_3.0-2 MatrixModels_0.5-0 [49] rlang_1.0.6 calibrate_1.7.7 splines_4.2.1 rgdal_1.5-32 [53] hexbin_1.28.2 broom_1.0.0 BiocManager_1.30.18 reshape2_1.4.4 [57] modelr_0.1.8 backports_1.4.1 httpuv_1.6.5 tools_4.2.1 [61] usethis_2.1.6 ellipsis_0.3.2 raster_3.5-21 RColorBrewer_1.1-3 [65] DNAcopy_1.70.0 sessioninfo_1.2.2 Rcpp_1.0.9 zlibbioc_1.42.0 [69] RCurl_1.98-1.7 ps_1.7.1 prettyunits_1.1.1 viridis_0.6.2 [73] zoo_1.8-12 S4Vectors_0.34.0 haven_2.5.0 cluster_2.1.3 [77] fs_1.5.2 magrittr_2.0.3 SeqVarTools_1.34.0 data.table_1.14.2 [81] SparseM_1.81 genetics_1.3.8.1.3 lmtest_0.9-40 reprex_2.0.1 [85] mvtnorm_1.1-3 pkgload_1.3.0 hms_1.1.1 patchwork_1.1.1 [89] mime_0.12 xtable_1.8-4 readxl_1.4.0 IRanges_2.30.0 [93] gridExtra_2.3 compiler_4.2.1 mice_3.14.0 maps_3.4.0 [97] crayon_1.5.1 gdistance_1.3-6 R.oo_1.25.0 htmltools_0.5.2 [101] mgcv_1.8-40 later_1.3.0 tzdb_0.3.0 lubridate_1.8.0 [105] DBI_1.1.3 dbplyr_2.2.1 PopGenReport_3.0.7 MASS_7.3-57 [109] Matrix_1.4-1 permute_0.9-7 cli_3.4.1 R.methodsS3_1.8.2 [113] gdata_2.18.0.1 parallel_4.2.1 dotCall64_1.0-1 igraph_1.3.2 [117] GenomicRanges_1.48.0 pkgconfig_2.0.3 sp_1.5-0 terra_1.5-34 [121] versions_0.3 xml2_1.3.3 foreach_1.5.2 XVector_0.36.0 [125] rvest_1.0.2 quantsmooth_1.62.0 callr_3.7.1 digest_0.6.29 [129] vegan_2.6-2 Biostrings_2.64.0 cellranger_1.1.0 operator.tools_1.6.3 [133] curl_4.3.2 gap_1.2.3-6 quantreg_5.93 shiny_1.7.1 [137] gtools_3.9.3 lifecycle_1.0.3 nlme_3.1-157 dismo_1.3-5 [141] jsonlite_1.8.0 seqinr_4.2-16 viridisLite_0.4.0 fansi_1.0.3 [145] pillar_1.7.0 lattice_0.20-45 GGally_2.1.2 survival_3.3-1 [149] fastmap_1.1.0 httr_1.4.3 pkgbuild_1.3.1 glue_1.6.2 [153] mmod_1.3.3 png_0.1-7 iterators_1.0.14 bit_4.0.4 [157] stringi_1.7.8 blob_1.2.3 memoise_2.0.1 ape_5.6-2

smgogarten commented 10 months ago

This question was answered here: https://support.bioconductor.org/p/9155031/ Set autosome.only=FALSE in your call to pcair.