RGLab / flowWorkspace

save_gs and load_gs do not give exactly the same numerical results #258

Closed mlamarin closed 6 years ago

mlamarin commented 6 years ago

Dear RGLab team,

Thanks for your great work; it makes my daily work so much easier.
I recently ran into something that is possibly minor and is likely caused by the serialization routine.

I hope the reproducible example below works the same for you. It shows that stored and restored objects are not numerically identical.

Thanks, Marc

library(flowWorkspace)
library(flowWorkspaceData)
library(devtools)

data_dir <- system.file("extdata", package = "flowWorkspaceData")

# locate the FlowJo workspace shipped with flowWorkspaceData
wsfile <- list.files(data_dir, pattern = "manual.xml", full.names = TRUE)
ws <- flowWorkspace::openWorkspace(wsfile)
gs <- flowWorkspace::parseWorkspace(ws, path = data_dir, name = 4,
  subset = c("CytoTrol_CytoTrol_1.fcs", "CytoTrol_CytoTrol_2.fcs"))

# save the gated object to disk and load it back
flowWorkspace::save_gs(gs, 'tmp2')
gs_loaded <- flowWorkspace::load_gs('tmp2')

# gate from the original object
ga_1_cd38mDRm <- flowWorkspace::getGate(gs[[1]], "/not debris/singlets/CD3+/CD8/38- DR-" )

# same gate from the reloaded object
ga_loaded_1_cd38mDRm <- flowWorkspace::getGate(gs_loaded[[1]], "/not debris/singlets/CD3+/CD8/38- DR-" )

# the boundary coordinates are not exactly equal
ga_loaded_1_cd38mDRm@boundaries == ga_1_cd38mDRm@boundaries
#      <R660-A> <V545-A>
# [1,]    FALSE    FALSE
# [2,]    FALSE    FALSE
# [3,]    FALSE    FALSE
# [4,]    FALSE    FALSE

# small differences between the original object and the reloaded one
ga_loaded_1_cd38mDRm@boundaries[, 1] - ga_1_cd38mDRm@boundaries[, 1]
# [1] -4.835697e-03 -4.835697e-03 -4.388509e-06 -4.388509e-06

ga_loaded_1_cd38mDRm@boundaries[, 2] - ga_1_cd38mDRm@boundaries[, 2]
# [1] -8.111688e-06 -1.261400e-02 -1.261400e-02 -8.111688e-06

session_info()
# Session info -------------------------------------------------------------------
#  setting  value                       
#  version  R version 3.4.3 (2017-11-30)
#  system   x86_64, linux-gnu           
#  ui       X11                         
#  language (EN)                        
#  collate  en_US.UTF-8                 
#  tz       Europe/Paris                
#  date     2018-09-03                  
# 
# Packages -----------------------------------------------------------------------
#  package           * version     date       source        
#  assertthat          0.2.0       2017-04-11 CRAN (R 3.4.3)
#  BH                * 1.65.0-1    2017-08-24 CRAN (R 3.4.3)
#  bindr               0.1         2016-11-13 CRAN (R 3.4.3)
#  bindrcpp          * 0.2         2017-06-17 CRAN (R 3.4.3)
#  Biobase           * 2.38.0      2018-01-30 Bioconductor  
#  BiocGenerics      * 0.24.0      2018-01-30 Bioconductor  
#  bit                 1.1-12      2014-04-09 CRAN (R 3.4.3)
#  bit64               0.9-7       2017-05-08 CRAN (R 3.4.3)
#  blob                1.1.0       2017-06-17 CRAN (R 3.4.3)
#  clue                0.3-54      2017-08-09 CRAN (R 3.4.3)
#  cluster             2.0.6       2017-03-16 CRAN (R 3.4.3)
#  coda                0.19-1      2016-12-08 CRAN (R 3.4.3)
#  colorspace          1.3-2       2016-12-14 CRAN (R 3.4.3)
#  commonmark          1.4         2017-09-01 CRAN (R 3.4.3)
#  corpcor             1.6.9       2017-04-01 CRAN (R 3.4.3)
#  crayon              1.3.4       2017-09-16 CRAN (R 3.4.3)
#  data.table          1.10.4-3    2017-10-27 CRAN (R 3.4.3)
#  DBI                 0.7         2017-06-18 CRAN (R 3.4.3)
#  debugme             1.1.0       2017-10-22 CRAN (R 3.4.3)
#  DEoptimR            1.0-8       2016-11-19 CRAN (R 3.4.3)
#  devtools          * 1.10.0      2016-01-23 url           
#  digest              0.6.15      2018-01-28 CRAN (R 3.4.3)
#  dplyr               0.7.4       2017-09-28 CRAN (R 3.4.3)
#  fda                 2.4.7       2017-08-14 CRAN (R 3.4.3)
#  flowClust           3.16.0      2018-07-12 Bioconductor  
#  flowCore          * 1.44.2      2018-07-12 Bioconductor  
#  flowStats           3.36.0      2018-07-12 Bioconductor  
#  flowViz             1.42.0      2018-07-12 Bioconductor  
#  flowWorkspace     * 3.26.9      2018-07-12 Bioconductor  
#  flowWorkspaceData * 2.14.0      2018-07-12 Bioconductor  
#  glue                1.2.0       2017-10-29 CRAN (R 3.4.3)
#  graph               1.56.0      2018-01-30 Bioconductor  
#  gridExtra           2.3         2017-09-09 CRAN (R 3.4.3)
#  gtable              0.2.0       2016-02-26 CRAN (R 3.4.3)
#  gtools              3.5.0       2015-05-29 CRAN (R 3.4.3)
#  hexbin              1.27.2      2018-01-15 CRAN (R 3.4.3)
#  IDPmisc             1.1.17      2012-11-02 CRAN (R 3.4.3)
#  igraph              1.1.2       2017-07-21 CRAN (R 3.4.3)
#  KernSmooth          2.23-15     2015-06-29 CRAN (R 3.4.3)
#  ks                  1.11.0      2018-01-16 CRAN (R 3.4.3)
#  lattice             0.20-35     2017-03-25 CRAN (R 3.4.3)
#  latticeExtra        0.6-28      2016-02-09 CRAN (R 3.4.3)
#  magrittr            1.5         2014-11-22 CRAN (R 3.4.3)
#  MASS                7.3-48      2017-12-25 CRAN (R 3.4.3)
#  Matrix              1.2-12      2017-11-16 CRAN (R 3.4.3)
#  MatrixModels        0.4-1       2015-08-22 CRAN (R 3.4.3)
#  matrixStats         0.53.0      2018-01-24 CRAN (R 3.4.3)
#  mclust              5.4         2017-11-22 CRAN (R 3.4.3)
#  mcmc                0.9-5       2017-04-16 CRAN (R 3.4.3)
#  MCMCpack            1.4-1       2017-12-05 CRAN (R 3.4.3)
#  memoise             1.1.0       2017-04-21 CRAN (R 3.4.3)
#  mnormt              1.5-5       2016-10-15 CRAN (R 3.4.3)
#  munsell             0.4.3       2016-02-13 CRAN (R 3.4.3)
#  mvtnorm             1.0-6       2017-03-02 CRAN (R 3.4.3)
#  ncdfFlow          * 2.24.0      2018-07-12 Bioconductor  
#  openCyto          * 1.16.1      2018-07-12 Bioconductor  
#  pbapply             1.3-4       2018-01-10 CRAN (R 3.4.3)
#  pcaPP               1.9-73      2018-01-14 CRAN (R 3.4.3)
#  pillar              1.1.0       2018-01-14 CRAN (R 3.4.3)
#  pkgconfig           2.0.1       2017-03-21 CRAN (R 3.4.3)
#  plyr                1.8.4       2016-06-08 CRAN (R 3.4.3)
#  qbdb                0.1         <NA>       local         
#  qbdev             * 0.2         2018-04-25 local         
#  qbfcs             * 0.1         <NA>       local         
#  qbutils             0.3         <NA>       local         
#  quantreg            5.34        2017-10-25 CRAN (R 3.4.3)
#  R.methodsS3         1.7.1       2016-02-16 CRAN (R 3.4.3)
#  R.oo                1.21.0      2016-11-01 CRAN (R 3.4.3)
#  R.utils             2.6.0       2017-11-05 CRAN (R 3.4.3)
#  R6                  2.2.2       2017-06-17 CRAN (R 3.4.3)
#  RBGL                1.54.0      2018-01-30 Bioconductor  
#  RColorBrewer        1.1-2       2014-12-07 CRAN (R 3.4.3)
#  Rcpp                0.12.15     2018-01-20 CRAN (R 3.4.3)
#  RcppArmadillo     * 0.8.300.1.0 2017-12-06 CRAN (R 3.4.3)
#  Rgraphviz           2.22.0      2018-01-30 Bioconductor  
#  rj                * 2.0.5-2     2018-01-30 local         
#  rj.gd               2.0.0-1     2018-01-30 local         
#  rlang               0.1.6       2017-12-21 CRAN (R 3.4.3)
#  robustbase          0.92-8      2017-11-01 CRAN (R 3.4.3)
#  roxygen2            6.0.1       2017-02-06 CRAN (R 3.4.3)
#  rrcov               1.4-3       2016-09-06 CRAN (R 3.4.3)
#  RSQLite             2.0         2017-06-19 CRAN (R 3.4.3)
#  scales              0.5.0       2017-08-24 CRAN (R 3.4.3)
#  SparseM             1.77        2017-04-23 CRAN (R 3.4.3)
#  stringi             1.1.6       2017-11-17 CRAN (R 3.4.3)
#  stringr             1.2.0       2017-02-18 CRAN (R 3.4.3)
#  testthat          * 2.0.0       2017-12-13 CRAN (R 3.4.3)
#  tibble              1.4.2       2018-01-22 CRAN (R 3.4.3)
#  withr               2.1.1       2017-12-19 CRAN (R 3.4.3)
#  XML                 3.98-1.9    2017-06-19 CRAN (R 3.4.3)
#  xml2                1.1.1       2017-01-24 CRAN (R 3.4.3)
#  zlibbioc            1.24.0      2018-01-30 Bioconductor  

mikejiang commented 6 years ago

Thanks for the good question. The difference comes from a loss of numeric precision (64-bit to 32-bit) during serialization. We decided to use 32-bit floats to save space, and I don't think such a small numeric error has any impact on gating.

> all.equal(ga_loaded_1_cd38mDRm@boundaries, ga_1_cd38mDRm@boundaries, tol = 3e-8)
[1] TRUE
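
For readers who want to see the effect in isolation, here is a minimal base-R sketch of a 64-bit to 32-bit round trip. It does not use the flowWorkspace serialization code; `to_float32` is a hypothetical helper and the input values are made up for illustration. Note that the `tolerance` argument of `all.equal` is applied to the mean relative difference, which is why absolute differences around 1e-2 on large coordinates can still pass.

```r
# Hypothetical helper: push doubles through 4-byte floats and back,
# using only base R (writeBin/readBin with size = 4).
to_float32 <- function(x) {
  raw_bytes <- writeBin(as.numeric(x), raw(), size = 4)
  readBin(raw_bytes, what = "numeric", n = length(x), size = 4)
}

# Made-up values that cannot be represented exactly in 32 bits
x <- c(123456.789, 0.1, 3.14159265358979)
y <- to_float32(x)

# Exact comparison fails for each of these values
y == x

# Absolute error grows with the magnitude of the value ...
abs(y - x)

# ... but the relative error stays below 2^-24 (about 6e-8)
abs(y - x) / abs(x)

# all.equal() compares the mean relative difference with `tolerance`
isTRUE(all.equal(y, x, tolerance = 1e-7))
```

Running the values through `to_float32` a second time changes nothing, since they are already representable as 32-bit floats.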

mlamarin commented 6 years ago

Hi Mike, thanks for the answer. I agree, that precision is absolutely sufficient. I will add "tol = 3e-8" to my tests.

Best, Marc
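
As a possible alternative for test suites, the check can be made element by element instead of on the mean relative difference. The sketch below is only an illustration: `within_float32` is a hypothetical helper, the default `rel_tol` of 2^-23 (one 32-bit float epsilon) is an assumption, and the matrices are made-up stand-ins for the gate boundaries, not values from the issue.

```r
# Hypothetical helper: TRUE when every element of `restored` is within
# a relative tolerance of the matching element of `original`.
within_float32 <- function(original, restored, rel_tol = 2^-23) {
  scale <- pmax(abs(original), abs(restored))
  all(abs(original - restored) <= rel_tol * scale)
}

# Made-up gate coordinates standing in for ga_1_cd38mDRm@boundaries
original <- matrix(c(150000.25, 150000.25, 90.1234, 90.1234,
                     180.5678, 230000.75, 230000.75, 180.5678),
                   ncol = 2,
                   dimnames = list(NULL, c("<R660-A>", "<V545-A>")))

# Simulated reload: a relative perturbation smaller than rel_tol
restored <- original * (1 + 3e-8)

# Passes for float32-sized error, fails for a larger relative error
stopifnot(within_float32(original, restored))
stopifnot(!within_float32(original, original * (1 + 1e-6)))
```

A per-element check of this kind is stricter than a single `all.equal` call, because one badly restored coordinate cannot be averaged away by the others.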