Closed gpsccs closed 4 years ago
Please provide your sessionInfo(). What is the genotype file you are trying to read?
sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding
locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] PhenotypeSimulator_0.3.3 sim1000G_1.40
[3] readr_1.3.1 stringr_1.4.0
[5] hapsim_0.31 caret_6.0-84
[7] ggplot2_3.2.1 G2P_1.1.0
[9] ROCR_1.0-7 gplots_3.0.1.1
[11] hglm_2.2-1 hglm.data_1.0-1
[13] sp_1.3-1 impute_1.60.0
[15] sommer_4.0.8 crayon_1.3.4
[17] lattice_0.20-38 MASS_7.3-51.4
[19] brnn_0.7 Formula_1.2-3
[21] pls_2.7-2 snowfall_1.84-6.1
[23] snow_0.4-3 rrBLUP_4.6
[25] randomForest_4.6-14 spls_2.2-3
[27] glmnet_2.0-18 foreach_1.4.7
[29] Matrix_1.2-17 e1071_1.7-2
[31] pROC_1.15.3 PRROC_1.3.1
[33] BGLR_1.0.8 data.table_1.12.6
loaded via a namespace (and not attached):
[1] nlme_3.1-139 bitops_1.0-6 lubridate_1.7.4
[4] tools_3.6.0 backports_1.1.5 R6_2.4.0
[7] rpart_4.1-15 KernSmooth_2.23-15 lazyeval_0.2.2
[10] BiocGenerics_0.32.0 colorspace_1.4-1 nnet_7.3-12
[13] withr_2.1.2 tidyselect_0.2.5 compiler_3.6.0
[16] caTools_1.17.1.2 scales_1.0.0 R.utils_2.9.0
[19] pkgconfig_2.0.3 rlang_0.4.1 rstudioapi_0.10
[22] generics_0.0.2 zoo_1.8-6 gtools_3.8.1
[25] dplyr_0.8.3 ModelMetrics_1.2.2 R.oo_1.23.0
[28] magrittr_1.5 Rcpp_1.0.2 munsell_0.5.0
[31] R.methodsS3_1.7.1 stringi_1.4.3 yaml_2.2.0
[34] zlibbioc_1.32.0 plyr_1.8.4 recipes_0.1.7
[37] grid_3.6.0 parallel_3.6.0 gdata_2.18.0
[40] snpStats_1.36.0 cowplot_1.0.0 splines_3.6.0
[43] hms_0.5.2 zeallot_0.1.0 pillar_1.4.2
[46] optparse_1.6.4 reshape2_1.4.3 codetools_0.2-16
[49] stats4_3.6.0 glue_1.3.1 vctrs_0.2.0
[52] getopt_1.20.3 gtable_0.3.0 purrr_0.3.3
[55] AlphaSimR_0.11.0 assertthat_0.2.1 gower_0.2.1
[58] prodlim_2019.10.13 class_7.3-15 survival_2.44-1.1
[61] timeDate_3043.102 truncnorm_1.0-8 tibble_2.1.3
[64] iterators_1.0.12 lava_1.6.6 ipred_0.9-9
I was trying to read a .bed file as the genotype file. It was just converted from VCF with plink, so I think the file format should be right.
Can you run:
genotypes <- snpStats::read.plink(bed = genotypefile)
length(genotypes$fam$member)           # number of samples
length(unique(genotypes$fam$member))   # number of unique sample IDs
length(genotypes$map$snp.name)         # number of SNPs
length(unique(genotypes$map$snp.name)) # number of unique SNP IDs
Nope, the same error occurred:
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘.’
length(genotypes$fam$member)
[1] 0
length(unique(genotypes$fam$member))
[1] 0
length(genotypes$map$snp.name)
[1] 0
length(unique(genotypes$map$snp.name))
[1] 0
Or may I send the .bed, .bim and .fam files to you?
OK, so the problem is not within a function of PhenotypeSimulator, but rather occurs in the snpStats::read.plink function. I suspect the problem lies in your .fam and .bim files, which might not have been created properly during the conversion. What do your sample and SNP IDs look like?
Yes, feel free to send them.
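The question about sample and SNP IDs can also be checked without snpStats. A minimal base-R sketch, assuming standard PLINK text files in which column 2 of the .fam holds the individual ID and column 2 of the .bim holds the variant ID; the helper name `check_plink_ids` and the example file contents are hypothetical. The warning above names '.' as the non-unique value, which is consistent with every SNP ID being '.'.

```r
# Hypothetical helper: report duplicated sample and SNP IDs in PLINK .fam/.bim files.
check_plink_ids <- function(fam_file, bim_file) {
  # Read every column as text so IDs are never converted to numbers or logicals
  fam <- read.table(fam_file, colClasses = "character")
  bim <- read.table(bim_file, colClasses = "character")
  list(
    n_samples   = nrow(fam),
    dup_samples = unique(fam$V2[duplicated(fam$V2)]),  # .fam column 2: individual ID
    n_snps      = nrow(bim),
    dup_snps    = unique(bim$V2[duplicated(bim$V2)])   # .bim column 2: variant ID
  )
}

# Example with made-up files that mimic the plink conversion described in this issue
fam_file <- tempfile(fileext = ".fam")
bim_file <- tempfile(fileext = ".bim")
writeLines(c("F1 SN1 0 0 0 -9", "F2 SN2 0 0 0 -9"), fam_file)
writeLines(c("1 . 0 652 A G", "1 . 0 987 C T"), bim_file)

ids <- check_plink_ids(fam_file, bim_file)
ids$dup_snps     # "." : all SNPs share the same ID
ids$dup_samples  # character(0): sample IDs are unique
```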
Well, I think I know what the problem is. When I used plink to convert the .vcf to .bed, some details were lost: the SNP IDs in the .bim file were missing, with lines like:
1 . 0 652 A G
I switched to vcftools instead, and the function now runs properly. The .bim file now has lines like:
1 chr1:652 0 652 A G
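Switching to vcftools gave each SNP a chr:position ID. Where re-running the conversion is not convenient, a .bim table with missing IDs could instead be patched in R. A minimal sketch, assuming the six-column .bim layout (chromosome, variant ID, genetic distance, position, allele 1, allele 2); `fill_missing_snp_ids` is a hypothetical helper and the output file name is a placeholder. PLINK 1.9 also offers a `--set-missing-var-ids` flag (e.g. with the template `@:#`) that may be simpler.

```r
# Hypothetical helper: replace missing (".") variant IDs in a PLINK .bim table
# with "chr<chromosome>:<position>" IDs, similar to the vcftools output above.
fill_missing_snp_ids <- function(bim) {
  # .bim columns: V1 chromosome, V2 variant ID, V3 genetic distance,
  # V4 base-pair position, V5 allele 1, V6 allele 2
  missing <- bim$V2 == "."
  bim$V2[missing] <- paste0("chr", bim$V1[missing], ":", bim$V4[missing])
  if (anyDuplicated(bim$V2) > 0) {
    warning("variant IDs are still not unique (e.g. several variants at one position)")
  }
  bim
}

# colClasses = "character" keeps IDs and alleles as text (a column holding
# only "T" alleles would otherwise be converted to logical TRUE)
bim <- read.table(text = c("1 . 0 652 A G", "1 . 0 987 C T"),
                  colClasses = "character")
bim <- fill_missing_snp_ids(bim)
bim$V2

# To write the patched table back (file name is a placeholder):
# write.table(bim, "genotypes.bim", quote = FALSE, sep = "\t",
#             row.names = FALSE, col.names = FALSE)
```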
However, another error occurred:
genotypes <- readStandardGenotypes(N = 300, filename = genotypefile, format = "plink", verbose = TRUE, sampleID = "SN")
genotypes_sd <- standardiseGenotypes(genotypes$genotypes)
Error in FUN(X[[i]], ...) :
  SNP vector contains alleles not encoded as 0, 1 or 2
Did I miss something?
Do you have missing values in your genotypes? Can you run:
unique(as.vector(genotypes$genotypes))
Yes, the output is as follows. What should I do?
unique(as.vector(genotypes$genotypes))
[1] 0 2 1 NA
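Before choosing between imputation and filtering, it can help to see where the NA values sit. A minimal base-R sketch on a made-up matrix, assuming that genotypes$genotypes is a samples x SNPs matrix coded 0/1/2; with real data, `geno` would be replaced by genotypes$genotypes.

```r
# Toy genotype matrix (samples in rows, SNPs in columns); NA marks a missing call
geno <- matrix(c(0, 1, 2, NA,
                 2, 2, 1, 0,
                 1, NA, NA, 0),
               nrow = 4,
               dimnames = list(paste0("S", 1:4), paste0("snp", 1:3)))

# Any code other than 0, 1, 2 or NA would indicate a different encoding problem
stopifnot(all(geno %in% c(0, 1, 2, NA)))

missing_per_snp    <- colSums(is.na(geno))   # missing calls per SNP
missing_per_sample <- rowSums(is.na(geno))   # missing calls per sample
missing_per_snp
mean(is.na(geno))                            # overall missing rate
```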
gamason pointed to a similar issue last month: https://github.com/HannahVMeyer/PhenotypeSimulator/issues/17#issuecomment-537724570
I would follow the advice I gave there: install the development version of PhenotypeSimulator and use impute=TRUE in standardiseGenotypes to impute the missing genotypes. Alternatively, you can remove all SNPs with missing genotypes or use another imputation method of your choice before using PhenotypeSimulator, as I outlined here: https://github.com/HannahVMeyer/PhenotypeSimulator/issues/17#issuecomment-537724570
I believe imputation of missing values will be most useful, though.
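Both options (dropping SNPs with missing calls, or imputing them) can be sketched in base R. This is not PhenotypeSimulator's impute=TRUE implementation: it uses simple per-SNP mean imputation, and the standardisation formula (x - 2p) / sqrt(2p(1 - p)), with p the allele frequency, is an assumption here (a common choice for 0/1/2 dosages); the package documentation is the authority on what standardiseGenotypes actually computes. The toy matrix is made up.

```r
# Toy 0/1/2 genotype matrix with missing calls (samples in rows, SNPs in columns)
geno <- matrix(c(0, 1, 2, NA,
                 2, 2, 1, 0,
                 1, NA, NA, 0),
               nrow = 4,
               dimnames = list(paste0("S", 1:4), paste0("snp", 1:3)))

# Option 1: drop every SNP (column) with at least one missing call
geno_complete <- geno[, colSums(is.na(geno)) == 0, drop = FALSE]

# Option 2: per-SNP mean imputation, replacing each NA by the mean of the observed calls
mean_impute <- function(g) {
  apply(g, 2, function(snp) {
    snp[is.na(snp)] <- mean(snp, na.rm = TRUE)
    snp
  })
}
geno_imputed <- mean_impute(geno)

# Standardise each SNP to (x - 2p) / sqrt(2p(1 - p)), where p is the allele
# frequency estimated from the column; monomorphic SNPs (p = 0 or 1) give NaN
# and should be removed first.
standardise <- function(g) {
  p <- colMeans(g) / 2
  centred <- sweep(g, 2, 2 * p)
  sweep(centred, 2, sqrt(2 * p * (1 - p)), "/")
}
geno_std <- standardise(geno_imputed)
round(geno_std, 3)
```

With the development version suggested above, the package-level equivalent would be standardiseGenotypes(genotypes$genotypes, impute = TRUE). Dropping SNPs keeps only observed calls but can discard many markers; mean imputation keeps every SNP but slightly shrinks the variance at imputed positions.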
Thank you very much for your kind reply. I will try it later. Have a good day.
Happy to help, let me know if this solved your problem.
Closing this now, feel free to re-open if issue persists.
Describe the bug
When I try to use the function readStandardGenotypes(), this error occurs. I wonder what the problem is?
To Reproduce
Looking forward to your answers.