andrewparkermorgan / argyle

An R package for import, QC and analysis of Illumina Infinium genotyping arrays

cannot merge genotype objects #7

Open danfulop opened 6 years ago

danfulop commented 6 years ago

I am getting an odd error when I try to merge genotype objects:

> gt.all <- merge(gt.sub, gt2.sub, check.alleles = F)
Set A has 143256 markers x 73 samples.
Set B has 143256 markers x 5 samples.
Error in base::cbind(unclass(x)[new.o, ], unclass(y)[new.o, ]) : 
  object 'new.o' not found

Is there another way I can merge these 2 objects?

Here's my sessionInfo():

> session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.5.0 (2018-04-23)
 system   x86_64, darwin15.6.0        
 ui       RStudio (1.1.453)           
 language (EN)                        
 collate  en_US.UTF-8                 
 tz       America/Los_Angeles         
 date     2018-05-23                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------------------------
 package       * version date       source                                    
 argyle        * 0.2.2   2018-05-24 Github (andrewparkermorgan/argyle@f6b846c)
 assertthat      0.2.0   2017-04-11 CRAN (R 3.5.0)                            
 base          * 3.5.0   2018-04-24 local                                     
 bindr           0.1.1   2018-03-13 CRAN (R 3.5.0)                            
 bindrcpp        0.2.2   2018-03-29 CRAN (R 3.5.0)                            
 BiocInstaller * 1.30.0  2018-05-01 Bioconductor                              
 colorspace      1.3-2   2016-12-14 CRAN (R 3.5.0)                            
 compiler        3.5.0   2018-04-24 local                                     
 curl            3.2     2018-03-28 CRAN (R 3.5.0)                            
 datasets      * 3.5.0   2018-04-24 local                                     
 devtools      * 1.13.5  2018-02-18 CRAN (R 3.5.0)                            
 digest          0.6.15  2018-01-28 CRAN (R 3.5.0)                            
 dplyr         * 0.7.5   2018-05-19 CRAN (R 3.5.0)                            
 ggplot2       * 2.2.1   2016-12-30 CRAN (R 3.5.0)                            
 git2r           0.21.0  2018-01-04 CRAN (R 3.5.0)                            
 glue            1.2.0   2017-10-29 CRAN (R 3.5.0)                            
 graphics      * 3.5.0   2018-04-24 local                                     
 grDevices     * 3.5.0   2018-04-24 local                                     
 grid            3.5.0   2018-04-24 local                                     
 gtable          0.2.0   2016-02-26 CRAN (R 3.5.0)                            
 httr            1.3.1   2017-08-20 CRAN (R 3.5.0)                            
 lazyeval        0.2.1   2017-10-29 CRAN (R 3.5.0)                            
 magrittr      * 1.5     2014-11-22 CRAN (R 3.5.0)                            
 memoise         1.1.0   2017-04-21 CRAN (R 3.5.0)                            
 methods       * 3.5.0   2018-04-24 local                                     
 munsell         0.4.3   2016-02-13 CRAN (R 3.5.0)                            
 pillar          1.2.2   2018-04-26 CRAN (R 3.5.0)                            
 pkgconfig       2.0.1   2017-03-21 CRAN (R 3.5.0)                            
 plyr            1.8.4   2016-06-08 CRAN (R 3.5.0)                            
 purrr           0.2.4   2017-10-18 CRAN (R 3.5.0)                            
 R6              2.2.2   2017-06-17 CRAN (R 3.5.0)                            
 Rcpp            0.12.17 2018-05-18 CRAN (R 3.5.0)                            
 rlang           0.2.0   2018-02-20 CRAN (R 3.5.0)                            
 scales          0.5.0   2017-08-24 CRAN (R 3.5.0)                            
 stats         * 3.5.0   2018-04-24 local                                     
 tibble          1.4.2   2018-01-22 CRAN (R 3.5.0)                            
 tidyr         * 0.8.1   2018-05-18 CRAN (R 3.5.0)                            
 tidyselect      0.2.4   2018-02-26 CRAN (R 3.5.0)                            
 tools           3.5.0   2018-04-24 local                                     
 utils         * 3.5.0   2018-04-24 local                                     
 withr           2.1.2   2018-03-15 CRAN (R 3.5.0)

danfulop commented 6 years ago

cbind(gt.sub, gt2.sub) worked, FWIW.

andrewparkermorgan commented 6 years ago

I will look into this error in more detail when I get a chance, but I suspect it comes from a (poorly-tested) branch of the merge() function that handles the case when check.alleles == FALSE.

My intended use for the merge() function was for genotype matrices that share some but not all markers in common -- i.e., from two different SNP array platforms. In that case I recommend always setting check.alleles = TRUE to catch strand swaps and other inconsistencies between platforms.
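
To illustrate the kind of inconsistency an allele check is meant to catch, here is a minimal base-R sketch. It is not argyle's implementation: the marker maps `map_a` and `map_b`, their columns `marker`, `A1`, `A2`, and the helper `classify_alleles()` are all hypothetical toy constructs.

```r
# Toy marker maps for two platforms (made-up data, not from argyle).
map_a <- data.frame(marker = c("m1", "m2", "m3", "m4"),
                    A1 = c("A", "C", "A", "G"),
                    A2 = c("G", "T", "C", "T"),
                    stringsAsFactors = FALSE)
map_b <- data.frame(marker = c("m2", "m3", "m4", "m5"),
                    A1 = c("C", "T", "T", "A"),
                    A2 = c("T", "G", "G", "C"),
                    stringsAsFactors = FALSE)

# Base-wise complement, e.g. "A" -> "T".
complement <- function(x) chartr("ACGT", "TGCA", x)

# Hypothetical helper: compare alleles at the markers shared by both maps.
classify_alleles <- function(a, b) {
  shared <- merge(a, b, by = "marker", suffixes = c(".a", ".b"))
  same    <- shared$A1.a == shared$A1.b & shared$A2.a == shared$A2.b
  swapped <- shared$A1.a == shared$A2.b & shared$A2.a == shared$A1.b
  strand  <- shared$A1.a == complement(shared$A1.b) &
             shared$A2.a == complement(shared$A2.b)
  # Note: for A/T and C/G markers a swap and a strand flip look identical;
  # this sketch simply reports the first category that applies.
  shared$status <- ifelse(same, "match",
                   ifelse(swapped, "allele swap",
                   ifelse(strand, "strand flip", "mismatch")))
  shared[, c("marker", "status")]
}

result <- classify_alleles(map_a, map_b)
print(result)
```

Only markers present in both maps are compared, which mirrors the "some but not all markers in common" situation described above.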

However, if you have two genotype matrices with the same markers (and the same alleles at those markers) but non-overlapping samples -- i.e., two batches run on the same SNP array platform, or two splits of the same dataset -- then cbind() is safe and much faster.
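
The preconditions for a safe cbind() can be checked explicitly before combining. The sketch below uses plain matrices and data frames so that it runs without argyle. The objects `geno_a`, `geno_b`, `map_a`, `map_b` and the wrapper `safe_cbind()` are hypothetical, and extracting the marker map from a real genotypes object is not shown.

```r
# Toy genotype matrices: markers in rows, samples in columns (made-up data).
geno_a <- matrix(c(0, 1, 2, 0, 1, 1), nrow = 3,
                 dimnames = list(c("m1", "m2", "m3"), c("s1", "s2")))
geno_b <- matrix(c(2, 1, 0), nrow = 3,
                 dimnames = list(c("m1", "m2", "m3"), "s3"))

# Toy marker maps with one row per marker (hypothetical column names).
map_a <- data.frame(marker = c("m1", "m2", "m3"),
                    A1 = c("A", "C", "G"),
                    A2 = c("G", "T", "T"),
                    stringsAsFactors = FALSE)
map_b <- map_a

# Hypothetical wrapper: refuse to combine unless markers, marker order,
# and alleles agree, and no sample name appears in both inputs.
safe_cbind <- function(x, y, map_x, map_y) {
  stopifnot(
    identical(rownames(x), rownames(y)),
    identical(map_x$marker, map_y$marker),
    identical(map_x$A1, map_y$A1),
    identical(map_x$A2, map_y$A2),
    length(intersect(colnames(x), colnames(y))) == 0
  )
  cbind(x, y)
}

combined <- safe_cbind(geno_a, geno_b, map_a, map_b)
print(dim(combined))
```

If any check fails, stopifnot() raises an error naming the failed condition instead of silently producing a misaligned matrix.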

danfulop commented 6 years ago

Yeah, my case is the latter, which is why I thought of cbind(). I should double-check that the alleles at all markers are the same.

williamgibbons commented 6 years ago

I've also been having the same issue for a while now. When I use the merge function I get the same "object 'new.o' not found" error. I just tried setting check.alleles=TRUE and got a different error: "In order to perform allele check efficiently, both input datasets should be in the '01' numeric encoding." The two files that I'm trying to merge are also the same type (GigaMUGA data). The interesting thing is that I have argyle on two different but equivalent PCs. It works on one, but the second gives these errors. I installed argyle on the second PC later than on the first. Did something change in the meantime, either in argyle or in one of the other packages that has to be loaded into R? It is also possible that I loaded things differently the second time, but with others getting the same error I'm not so sure that is the cause.
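
One way to narrow down a difference between two machines is to snapshot the installed package versions on each and compare them. The sketch below uses only base R. The helpers `snapshot_versions()` and `diff_versions()`, the file name in the comment, and the toy snapshots `pc1` and `pc2` (with placeholder package names) are all hypothetical.

```r
# Run on each machine to record installed package versions.
snapshot_versions <- function() {
  ip <- installed.packages()
  data.frame(package = unname(ip[, "Package"]),
             version = unname(ip[, "Version"]),
             stringsAsFactors = FALSE)
}
# Example (file name is a placeholder):
# write.csv(snapshot_versions(), "versions_pc1.csv", row.names = FALSE)

# Report packages whose version differs or that exist on only one machine.
diff_versions <- function(a, b) {
  both <- merge(a, b, by = "package", all = TRUE,
                suffixes = c(".pc1", ".pc2"))
  differs <- is.na(both$version.pc1) | is.na(both$version.pc2) |
             both$version.pc1 != both$version.pc2
  both[differs, ]
}

# Toy snapshots with placeholder package names and versions.
pc1 <- data.frame(package = c("pkgA", "pkgB", "pkgC"),
                  version = c("1.0.0", "2.3.1", "0.9.0"),
                  stringsAsFactors = FALSE)
pc2 <- data.frame(package = c("pkgA", "pkgB", "pkgD"),
                  version = c("1.1.0", "2.3.1", "0.4.2"),
                  stringsAsFactors = FALSE)

changed <- diff_versions(pc1, pc2)
print(changed)
```

For a package installed from GitHub, the version string alone may not distinguish two commits, so the commit hash reported by session_info() (as in the output earlier in this thread) is also worth comparing.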