HannahVMeyer / PhenotypeSimulator


Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed #20

Closed gpsccs closed 4 years ago

gpsccs commented 4 years ago

Describe the bug

When I call readStandardGenotypes(), the error below occurs. What is the problem?

To Reproduce

genotypes <- readStandardGenotypes(N=1000, filename = genotypefile,format="plink", verbose=TRUE,sampleID = "SN")
Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘.’

Looking forward to your answers

HannahVMeyer commented 4 years ago

Please provide your sessionInfo(). What is the genotype file you are trying to read?

gpsccs commented 4 years ago

sessionInfo()
R version 3.6.0 (2019-04-26)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding

locale:
[1] LC_COLLATE=Chinese (Simplified)_China.936
[2] LC_CTYPE=Chinese (Simplified)_China.936
[3] LC_MONETARY=Chinese (Simplified)_China.936
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Simplified)_China.936

attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base

other attached packages:
[1] PhenotypeSimulator_0.3.3 sim1000G_1.40
[3] readr_1.3.1 stringr_1.4.0
[5] hapsim_0.31 caret_6.0-84
[7] ggplot2_3.2.1 G2P_1.1.0
[9] ROCR_1.0-7 gplots_3.0.1.1
[11] hglm_2.2-1 hglm.data_1.0-1
[13] sp_1.3-1 impute_1.60.0
[15] sommer_4.0.8 crayon_1.3.4
[17] lattice_0.20-38 MASS_7.3-51.4
[19] brnn_0.7 Formula_1.2-3
[21] pls_2.7-2 snowfall_1.84-6.1
[23] snow_0.4-3 rrBLUP_4.6
[25] randomForest_4.6-14 spls_2.2-3
[27] glmnet_2.0-18 foreach_1.4.7
[29] Matrix_1.2-17 e1071_1.7-2
[31] pROC_1.15.3 PRROC_1.3.1
[33] BGLR_1.0.8 data.table_1.12.6

loaded via a namespace (and not attached):
[1] nlme_3.1-139 bitops_1.0-6 lubridate_1.7.4
[4] tools_3.6.0 backports_1.1.5 R6_2.4.0
[7] rpart_4.1-15 KernSmooth_2.23-15 lazyeval_0.2.2
[10] BiocGenerics_0.32.0 colorspace_1.4-1 nnet_7.3-12
[13] withr_2.1.2 tidyselect_0.2.5 compiler_3.6.0
[16] caTools_1.17.1.2 scales_1.0.0 R.utils_2.9.0
[19] pkgconfig_2.0.3 rlang_0.4.1 rstudioapi_0.10
[22] generics_0.0.2 zoo_1.8-6 gtools_3.8.1
[25] dplyr_0.8.3 ModelMetrics_1.2.2 R.oo_1.23.0
[28] magrittr_1.5 Rcpp_1.0.2 munsell_0.5.0
[31] R.methodsS3_1.7.1 stringi_1.4.3 yaml_2.2.0
[34] zlibbioc_1.32.0 plyr_1.8.4 recipes_0.1.7
[37] grid_3.6.0 parallel_3.6.0 gdata_2.18.0
[40] snpStats_1.36.0 cowplot_1.0.0 splines_3.6.0
[43] hms_0.5.2 zeallot_0.1.0 pillar_1.4.2
[46] optparse_1.6.4 reshape2_1.4.3 codetools_0.2-16
[49] stats4_3.6.0 glue_1.3.1 vctrs_0.2.0
[52] getopt_1.20.3 gtable_0.3.0 purrr_0.3.3
[55] AlphaSimR_0.11.0 assertthat_0.2.1 gower_0.2.1
[58] prodlim_2019.10.13 class_7.3-15 survival_2.44-1.1
[61] timeDate_3043.102 truncnorm_1.0-8 tibble_2.1.3
[64] iterators_1.0.12 lava_1.6.6 ipred_0.9-9

I was trying to read a .bed file as the genotype file. It had just been converted from VCF with plink, so I think the file format should be correct.

HannahVMeyer commented 4 years ago

Can you run:

genotypes <- snpStats::read.plink(bed = genotypefile)
length(genotypes$fam$member)
length(unique(genotypes$fam$member))
length(genotypes$map$snp.name)
length(unique(genotypes$map$snp.name))

gpsccs commented 4 years ago

Nope, the same error occurred:

Error in `.rowNamesDF<-`(x, value = value) : duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique value when setting 'row.names': ‘.’

length(genotypes$fam$member)
[1] 0
length(unique(genotypes$fam$member))
[1] 0
length(genotypes$map$snp.name)
[1] 0
length(unique(genotypes$map$snp.name))
[1] 0

gpsccs commented 4 years ago

Or may I send the .bed, .bim and .fam files to you?

HannahVMeyer commented 4 years ago

Ok, so the problem is not within a PhenotypeSimulator function, but occurs in the snpStats::read.plink function. I suspect the problem lies in your .fam and .bim files, which might not have been created properly during the conversion. What do your sample and SNP IDs look like?
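The error text can be reproduced in base R alone, which is consistent with duplicated identifiers being assigned as row names somewhere in the reading step. This is a minimal sketch that assumes nothing about the internals of snpStats::read.plink; the data frame `df` is purely illustrative and not taken from the files in this thread.

```r
# Illustrative data frame; not taken from the genotype files in this thread
df <- data.frame(pos = c(652, 701))

# Assigning duplicated row names fails; tryCatch captures the error message
msg <- tryCatch(
  suppressWarnings({
    rownames(df) <- c(".", ".")
    "no error"
  }),
  error = function(e) conditionMessage(e)
)
print(msg)
```

In an English locale the captured message should match the "duplicate 'row.names' are not allowed" text reported above.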

HannahVMeyer commented 4 years ago

> Or may I send the .bed, .bim and .fam files to you?

Yes, feel free to send them.

gpsccs commented 4 years ago

Well, I think I know what the problem is. When I used plink to convert the .vcf to .bed, some details were lost. The .bim file looked like this:

1 . 0 652 A G

I turned to vcftools instead, and the function now runs properly. The resulting .bim file looks like this:

1 chr1:652 0 652 A G
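A .bim file in which every variant ID is "." gives all SNPs the same name, which fits the duplicate row name error. The sketch below shows one way to detect this in base R and to rebuild IDs in the chr:position style shown above. The file contents, the column names and the variables `bim_file` and `fixed_file` are hypothetical; make.unique() is added as a precaution because two variants at the same position would otherwise still collide.

```r
# Hypothetical example: a tiny .bim file in which every variant ID is "."
bim_file <- tempfile(fileext = ".bim")
writeLines(c("1\t.\t0\t652\tA\tG",
             "1\t.\t0\t701\tC\tT",
             "2\t.\t0\t1200\tG\tA"), bim_file)

# .bim columns: chromosome, variant ID, genetic distance, position, allele 1, allele 2
# colClasses keeps alleles such as "T" from being read as logical TRUE
bim <- read.table(bim_file, header = FALSE, stringsAsFactors = FALSE,
                  colClasses = c("character", "character", "numeric",
                                 "integer", "character", "character"),
                  col.names = c("chr", "snp", "cm", "pos", "a1", "a2"))

# Count variant IDs that repeat an earlier ID
n_dup <- sum(duplicated(bim$snp))

# Rebuild IDs as chr<chromosome>:<position> and force them to be unique
fixed <- bim
fixed$snp <- make.unique(paste0("chr", bim$chr, ":", bim$pos))

# Write the corrected table in the same tab-separated, header-less layout
fixed_file <- tempfile(fileext = ".bim")
write.table(fixed, fixed_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)
```

Only the ID column changes, so the .bed and .fam files can stay as they are, provided the corrected .bim keeps the same variant order.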

However, another error occurred:

genotypes <- readStandardGenotypes(N=300, filename = genotypefile,format="plink", verbose=TRUE,sampleID = "SN")
genotypes_sd <- standardiseGenotypes(genotypes$genotypes)
Error in FUN(X[[i]], ...) : SNP vector contains alleles not encoded as 0, 1 or 2

Did I miss something?

HannahVMeyer commented 4 years ago

Do you have missing values in your genotypes? Can you run

unique(as.vector(genotypes$genotypes))

gpsccs commented 4 years ago

Yes, the output is as follows. What should I do?

unique(as.vector(genotypes$genotypes))
[1] 0 2 1 NA

HannahVMeyer commented 4 years ago

gamason pointed to a similar issue last month: https://github.com/HannahVMeyer/PhenotypeSimulator/issues/17#issuecomment-537724570

I would follow the advice I gave there: install the development version of PhenotypeSimulator and use impute=TRUE in standardiseGenotypes to impute the missing genotypes. Alternatively, you can remove all SNPs with missing genotypes, or use another imputation method of your choice before using PhenotypeSimulator. As I outlined in that comment, I believe imputing the missing values will be most useful, though.
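The two alternatives mentioned above (dropping SNPs with missing genotypes, or imputing them beforehand) can be sketched in base R on a toy matrix. This assumes samples in rows and SNPs in columns. The per-SNP mean imputation is a simple stand-in chosen for illustration and is not necessarily what impute=TRUE does in PhenotypeSimulator; the matrix `geno` and its labels are made up.

```r
# Toy genotype matrix (made-up values): 4 samples in rows, 3 SNPs in columns
geno <- matrix(c(0, 1, 2, NA,
                 2, 2, 1, 0,
                 1, NA, 0, 0),
               nrow = 4,
               dimnames = list(paste0("ID", 1:4), paste0("snp", 1:3)))

# Option 1: keep only SNPs without any missing genotype
has_missing <- colSums(is.na(geno)) > 0
geno_complete <- geno[, !has_missing, drop = FALSE]

# Option 2: replace each missing genotype with the mean of that SNP
# (a simple stand-in, not PhenotypeSimulator's own imputation)
geno_imputed <- apply(geno, 2, function(snp) {
  snp[is.na(snp)] <- mean(snp, na.rm = TRUE)
  snp
})
```

Mean-imputed values are generally not 0, 1 or 2, so a function that checks for that encoding may still reject them. Dropping SNPs avoids this at the cost of losing markers.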

gpsccs commented 4 years ago

Thank you very much for your kind reply. I will try it later. Have a good day.

HannahVMeyer commented 4 years ago

Happy to help. Let me know if this solves your problem.

HannahVMeyer commented 4 years ago

Closing this now. Feel free to re-open if the issue persists.