bcm-uga / pcadapt

Performing highly efficient genome scans for local adaptation with R package pcadapt v4
https://bcm-uga.github.io/pcadapt
39 stars 10 forks source link

updated file conversion? #17

Closed devonorourke closed 6 years ago

devonorourke commented 6 years ago

Hi guys, In following your first tutorial it appears that newer versions of pcadapt require vcf --> bed format conversion for large files. I'm wondering where I'm going wrong. The vcf --> bed file conversion was straightforward in PLINK, and the read.pcadapt function appears to accept my modified file:

SEpre <- read.pcadapt(SEpreBedPath, type = "bed")

> class(SEpre)
[1] "pcadapt_bed"

> str(SEpre)
Class 'pcadapt_bed'  atomic [1:1] /mnt/lustre/macmaneslab/devon/exome/vcfs/pcadapt/SEpre.ped
  ..- attr(*, "n")= int 31
  ..- attr(*, "p")= int 31

However the pcadapt function appears to fail any time I load the object (SEpre) resulting form the read.pcadapt function:

> x.SEpre.K2 <- pcadapt(input = SEpre, K = 2)
Error in bedXPtr(input, n, p) : File is not a binary PED file.

Any idea where I'm going wrong?

Perhaps you can also update your documentation regarding these file conversions and note that for large files you are now required to convert the vcf to a PLINK-formatted .bed file? Your program provides error messages that are helpful at discerning this is the case, but these weren't posted in your tutorial/manual. In addition it would be great if you can add to the documentation that you must specify the new file as type = "bed" when running the read.pcadapt function.

I've posted the R sessionInfo() below in case that's helpful. Thanks for your help

R version 3.4.2 (2017-09-28)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Scientific Linux 7.3 (Nitrogen)

Matrix products: default
BLAS: /mnt/lustre/software/linuxbrew/colsa-20171116/Cellar/r/3.4.2/lib/R/lib/libRblas.so
LAPACK: /mnt/lustre/software/linuxbrew/colsa-20171116/Cellar/r/3.4.2/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] plyr_1.8.4    pcadapt_4.0.3

loaded via a namespace (and not attached):
 [1] pinfsc50_1.1.0    Rcpp_0.12.17      pillar_1.2.1      compiler_3.4.2   
 [5] vcfR_1.8.0        bindr_0.1.1       tools_3.4.2       digest_0.6.15    
 [9] jsonlite_1.5      tibble_1.4.2      gtable_0.2.0      nlme_3.1-131     
[13] viridisLite_0.3.0 lattice_0.20-35   mgcv_1.8-23       pkgconfig_2.0.1  
[17] rlang_0.2.1       Matrix_1.2-12     parallel_3.4.2    bindrcpp_0.2.2   
[21] cluster_2.0.6     dplyr_0.7.6       httr_1.3.1        htmlwidgets_0.9  
[25] grid_3.4.2        tidyselect_0.2.4  glue_1.2.0        data.table_1.11.4
[29] R6_2.2.2          plotly_4.7.1      ggplot2_3.0.0     purrr_0.2.5      
[33] tidyr_0.7.2       magrittr_1.5      scales_0.5.0      htmltools_0.3.6  
[37] MASS_7.3-48       assertthat_0.2.0  permute_0.9-4     colorspace_1.3-2 
[41] ape_5.1           lazyeval_0.2.1    munsell_0.4.3     vegan_2.5-2  
privefl commented 6 years ago

From the extension, it seems that you have a .ped (plain text file) rather than a .bed (binary file).

devonorourke commented 6 years ago

Sorry for the delayed reply. It does indeed seem odd that the what I've uploaded is a plain text file instead of a bed, but I can assure you it's not a .ped file. That's what's confusing me! :)

I take a .vcf file (which was too big to load) and run this PLINK (PLINK v1.90b5.2) command:

plink 
  --allow-extra-chr
  --out SE
  --vcf SEcommon.vcf

This produces a trio of .fam, .bed, and .bim files, including that specious binary file...

To confirm this is indeed a binary, I ran another PLINK test:

plink --adjust --allow-extra-chr --allow-no-sex --assoc \
--bfile SE

This works great. However, if I swap out the last line to --file SE instead of --bfile SE the command fails. The file I'm using is a binary file, as best as I can tell.

I've attached this fairly small file to this email in case that helps your troubleshooting.

Thanks again for your help

SE.bed.zip

privefl commented 6 years ago
readBin("~/Téléchargements/SE.bed", what = 1L, n = 3, size = 1, signed = FALSE)
[1] 108  27   1

Seems OK. Do you have the bim and fam files too?

devonorourke commented 6 years ago

Sure thing (see below). Note I end up manually editing the .fam file because the info gets transposed incorrectly in that file.

Archive.zip

privefl commented 6 years ago

I'm able to run pcadapt() succesfully with the example you provided.

devonorourke commented 6 years ago

can you please send me the full script so I can replicate on my local machine?

On Thu, Aug 30, 2018 at 12:26 PM Florian Privé notifications@github.com wrote:

I'm able to run pcadapt() succesfully with the example you provided.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/bcm-uga/pcadapt/issues/17#issuecomment-417381063, or mute the thread https://github.com/notifications/unsubscribe-auth/AKqgXMkwAqn2KCL--usftAV-ryB9hHGLks5uWBI3gaJpZM4WOUyE .

-- Devon O'Rourke Graduate student in Molecular and Evolutionary Systems Biology University of New Hampshire

privefl commented 6 years ago
bed <- "~/Téléchargements/SE.bed"
readBin(bed, what = 1L, n = 3, size = 1, signed = FALSE)

# devtools::install_github("bcm-uga/pcadapt")
library(pcadapt)
SEpre <- read.pcadapt(bed, type = "bed")
obj <- pcadapt(input = SEpre, K = 2)
plot(obj)
devonorourke commented 6 years ago

thanks very much! I'm starting to think there might have been a version history issue? I've been testing this on our compute cluster (as that's where all the data lives) but when transferring this snippet of data to my laptop, which has the most recent CRAN version of pcadapt loaded, things worked without issue. so strange. thanks for the support!

privefl commented 6 years ago

Don't worry, thanks for reporting. Don't hesitate if you have any further question.