RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

flowjo_to_gatingset memory error #111

Open SansMorel opened 4 years ago

SansMorel commented 4 years ago

I am trying to read gates from a FlowJo workspace, but I encounter a memory error when calling flowjo_to_gatingset() on my FlowJo wsp file.

library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
wsfile <- list.files(pattern = "wsp", full.names = TRUE)
library(CytoML)
ws <- open_flowjo_xml(wsfile)
fj_ws_get_samples(ws)
#>    sampleID                     name   count pop.counts
#> 1        16 20191217_C57KC003_NC.fcs 1541997          9
#> 2        17 20191217_C57KC003_NC.fcs  290848         11
#> 3        18 20191217_C57KC003_NC.fcs  353657         11
#> 4        19 20191217_C57KC003_NC.fcs  142301         11
#> 5        20 20191217_C57KC003_NC.fcs  556701         11
#> 6        21 20191217_C57KC003_NC.fcs  225741         11
#> 7         1 20191217_C57KC003_NC.fcs  501074         11
#> 8         2 20191217_C57KC003_NC.fcs  453926          8
#> 9         3 20191217_C57KC003_NC.fcs   29503          8
#> 10        4 20191217_C57KC003_NC.fcs  352020          8
#> 11        5 20191217_C57KC003_NC.fcs  223846         11
#> 12        6 20191217_C57KC003_NC.fcs  426527         11
#> 13        7 20191217_C57KC003_NC.fcs  471965         11
#> 14        8 20191217_C57KC003_NC.fcs  387698         11
#> 15        9 20191217_C57KC003_NC.fcs  108852         11
#> 16       10 20191217_C57KC003_NC.fcs  445379         11
#> 17       11 20191217_C57KC003_NC.fcs  693633         11
#> 18       12 20191217_C57KC003_NC.fcs  452876         11
#> 19       13 20191217_C57KC003_NC.fcs  565591         11
#> 20       14 20191217_C57KC003_NC.fcs  752113         11
#> 21       15 20191217_C57KC003_NC.fcs  583126         11
fj_ws_get_sample_groups(ws)
#>      groupName groupID sampleID
#> 1  All Samples       0       16
#> 2  All Samples       0       17
#> 3  All Samples       0       18
#> 4  All Samples       0       19
#> 5  All Samples       0       20
#> 6  All Samples       0       21
#> 7  All Samples       0        1
#> 8  All Samples       0        2
#> 9  All Samples       0        3
#> 10 All Samples       0        4
#> 11 All Samples       0        5
#> 12 All Samples       0        6
#> 13 All Samples       0        7
#> 14 All Samples       0        8
#> 15 All Samples       0        9
#> 16 All Samples       0       10
#> 17 All Samples       0       11
#> 18 All Samples       0       12
#> 19 All Samples       0       13
#> 20 All Samples       0       14
#> 21 All Samples       0       15
gs <- flowjo_to_gatingset(ws, name = 1)
#> Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.1252 
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.1252   
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.1252
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian Bokmål_Norway.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CytoML_2.0.5        flowWorkspace_4.0.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0    xfun_0.16           purrr_0.3.4        
#>  [4] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.2        
#>  [7] generics_0.0.2      htmltools_0.5.0     stats4_4.0.2       
#> [10] ncdfFlow_2.34.0     yaml_2.2.1          base64enc_0.1-3    
#> [13] flowCore_2.0.1      XML_3.99-0.5        RBGL_1.64.0        
#> [16] rlang_0.4.7         hexbin_1.28.1       pillar_1.4.6       
#> [19] glue_1.4.1          Rgraphviz_2.32.0    BiocGenerics_0.34.0
#> [22] RColorBrewer_1.1-2  plyr_1.8.6          matrixStats_0.56.0 
#> [25] jpeg_0.1-8.1        lifecycle_0.2.0     stringr_1.4.0      
#> [28] zlibbioc_1.34.0     RProtoBufLib_2.0.0  munsell_0.5.0      
#> [31] gtable_0.3.0        cytolib_2.0.3       evaluate_0.14      
#> [34] latticeExtra_0.6-29 Biobase_2.48.0      knitr_1.29         
#> [37] parallel_4.0.2      highr_0.8           Rcpp_1.0.5         
#> [40] scales_1.1.1        jsonlite_1.7.0      RcppParallel_5.0.2 
#> [43] graph_1.66.0        gridExtra_2.3       ggplot2_3.3.2      
#> [46] png_0.1-7           digest_0.6.25       stringi_1.4.6      
#> [49] dplyr_1.0.1         grid_4.0.2          tools_4.0.2        
#> [52] magrittr_1.5        tibble_3.0.3        crayon_1.3.4       
#> [55] pkgconfig_2.0.3     ellipsis_0.3.1      xml2_1.3.2         
#> [58] data.table_1.13.0   rmarkdown_2.3       R6_2.4.1           
#> [61] ggcyto_1.16.0       compiler_4.0.2

When run in reprex (above), the error message differs from the one shown in the console. In the console the error is:

error: arma::memory::acquire(): out of memory
Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc

The dataset I am trying to read consists of 21 FCS files that take up 1.99 GB of storage in total. My computer has 128 GB of RAM. I have tested two other datasets, one of 192 MB and one of 6.98 MB. Both are read successfully.

mikejiang commented 4 years ago

How about loading a subset of this dataset to see whether it goes through, e.g.

gs <- flowjo_to_gatingset(ws, name = 1, subset = 1:2)

Also try turning on detailed logging (i.e. set_log_level("Gate")) and paste the messages that appear immediately before the error.

SansMorel commented 4 years ago

Hi, @mikejiang

I tried subsetting to each file individually, but every attempt results in the same error. Also, am I using the log function correctly? I just get the same error as without it.

library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
wsfile <- list.files(pattern = "wsp", full.names = TRUE)
library(CytoML)
ws <- open_flowjo_xml(wsfile)
fj_ws_get_samples(ws)
#>    sampleID                     name   count pop.counts
#> 1        16 20191217_C57KC003_NC.fcs 1541997          9
#> 2        17 20191217_C57KC003_NC.fcs  290848         11
#> 3        18 20191217_C57KC003_NC.fcs  353657         11
#> 4        19 20191217_C57KC003_NC.fcs  142301         11
#> 5        20 20191217_C57KC003_NC.fcs  556701         11
#> 6        21 20191217_C57KC003_NC.fcs  225741         11
#> 7         1 20191217_C57KC003_NC.fcs  501074         11
#> 8         2 20191217_C57KC003_NC.fcs  453926          8
#> 9         3 20191217_C57KC003_NC.fcs   29503          8
#> 10        4 20191217_C57KC003_NC.fcs  352020          8
#> 11        5 20191217_C57KC003_NC.fcs  223846         11
#> 12        6 20191217_C57KC003_NC.fcs  426527         11
#> 13        7 20191217_C57KC003_NC.fcs  471965         11
#> 14        8 20191217_C57KC003_NC.fcs  387698         11
#> 15        9 20191217_C57KC003_NC.fcs  108852         11
#> 16       10 20191217_C57KC003_NC.fcs  445379         11
#> 17       11 20191217_C57KC003_NC.fcs  693633         11
#> 18       12 20191217_C57KC003_NC.fcs  452876         11
#> 19       13 20191217_C57KC003_NC.fcs  565591         11
#> 20       14 20191217_C57KC003_NC.fcs  752113         11
#> 21       15 20191217_C57KC003_NC.fcs  583126         11
fj_ws_get_sample_groups(ws)
#>      groupName groupID sampleID
#> 1  All Samples       0       16
#> 2  All Samples       0       17
#> 3  All Samples       0       18
#> 4  All Samples       0       19
#> 5  All Samples       0       20
#> 6  All Samples       0       21
#> 7  All Samples       0        1
#> 8  All Samples       0        2
#> 9  All Samples       0        3
#> 10 All Samples       0        4
#> 11 All Samples       0        5
#> 12 All Samples       0        6
#> 13 All Samples       0        7
#> 14 All Samples       0        8
#> 15 All Samples       0        9
#> 16 All Samples       0       10
#> 17 All Samples       0       11
#> 18 All Samples       0       12
#> 19 All Samples       0       13
#> 20 All Samples       0       14
#> 21 All Samples       0       15
set_log_level("Gate")
#> [1] "Gate"
gs <- flowjo_to_gatingset(ws, name = 1, subset = 1)
#> Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc
get_log_level()
#> [1] "Gate"
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.1252 
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.1252   
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.1252
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian Bokmål_Norway.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CytoML_2.0.5        flowWorkspace_4.0.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0    xfun_0.16           purrr_0.3.4        
#>  [4] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.2        
#>  [7] generics_0.0.2      htmltools_0.5.0     stats4_4.0.2       
#> [10] ncdfFlow_2.34.0     yaml_2.2.1          base64enc_0.1-3    
#> [13] flowCore_2.0.1      XML_3.99-0.5        RBGL_1.64.0        
#> [16] rlang_0.4.7         hexbin_1.28.1       pillar_1.4.6       
#> [19] glue_1.4.1          Rgraphviz_2.32.0    BiocGenerics_0.34.0
#> [22] RColorBrewer_1.1-2  plyr_1.8.6          matrixStats_0.56.0 
#> [25] jpeg_0.1-8.1        lifecycle_0.2.0     stringr_1.4.0      
#> [28] zlibbioc_1.34.0     RProtoBufLib_2.0.0  munsell_0.5.0      
#> [31] gtable_0.3.0        cytolib_2.0.3       evaluate_0.14      
#> [34] latticeExtra_0.6-29 Biobase_2.48.0      knitr_1.29         
#> [37] parallel_4.0.2      highr_0.8           Rcpp_1.0.5         
#> [40] scales_1.1.1        jsonlite_1.7.0      RcppParallel_5.0.2 
#> [43] graph_1.66.0        gridExtra_2.3       ggplot2_3.3.2      
#> [46] png_0.1-7           digest_0.6.25       stringi_1.4.6      
#> [49] dplyr_1.0.1         grid_4.0.2          tools_4.0.2        
#> [52] magrittr_1.5        tibble_3.0.3        crayon_1.3.4       
#> [55] pkgconfig_2.0.3     ellipsis_0.3.1      xml2_1.3.2         
#> [58] data.table_1.13.0   rmarkdown_2.3       R6_2.4.1           
#> [61] ggcyto_1.16.0       compiler_4.0.2

mikejiang commented 4 years ago

It is strange that your log is not displayed. I wonder whether it even reaches the logic of the C parser. Can you paste the traceback() result immediately after the error? Since it fails for a single file, would you be able to share the example wsp and FCS file for troubleshooting? (wjiang2@fredhutch.org)

SansMorel commented 4 years ago

@mikejiang I tried making a new wsp to see if the previous one was corrupted. During this process I discovered that one specific file is causing the issue. This file has 1541997 rows and 56 columns. I'll send you a link to the file via email so you can take a look.

SansMorel commented 4 years ago

Sorry, I forgot to paste the traceback. Here it is:

4: stop(structure(list(message = "std::bad_alloc", call = (function (ws, 
       group_id, subset, execute, path, cytoset, h5_dir, includeGates, 
       additional_keys, additional_sampleID, keywords, is_pheno_data_from_FCS, 
       keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
       leaf_bool, include_empty_tree, skip_faulty_gate, comps, transform, 
       fcs_file_extension, greedy_match, fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, h5_dir, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x0000018385d680f0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x0000018392b45270>, 
       h5_dir = "C:\\Users\\Sturla\\AppData\\Local\\Temp\\Rtmp6ldHfD", 
       includeGates = TRUE, additional_keys = "$TOT", additional_sampleID = FALSE, 
       keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1), cppstack = NULL), class = c("std::bad_alloc", 
   "C++Error", "error", "condition")))
3: (function (ws, group_id, subset, execute, path, cytoset, h5_dir, 
       includeGates, additional_keys, additional_sampleID, keywords, 
       is_pheno_data_from_FCS, keyword_ignore_case, extend_val, 
       extend_to, channel_ignore_case, leaf_bool, include_empty_tree, 
       skip_faulty_gate, comps, transform, fcs_file_extension, greedy_match, 
       fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, h5_dir, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x0000018385d680f0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x0000018392b45270>, 
       h5_dir = "C:\\Users\\Sturla\\AppData\\Local\\Temp\\Rtmp6ldHfD", 
       includeGates = TRUE, additional_keys = "$TOT", additional_sampleID = FALSE, 
       keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1)
2: do.call(parse_workspace, args)
1: flowjo_to_gatingset(ws, name = 1, execute = T)

mikejiang commented 4 years ago

I can't reproduce your error. It seems to parse OK for me (on both the Bioconductor release and devel branches).

library(CytoML)
wsfile <- "~/Downloads/fcs and wsp/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
library(flowWorkspace)
gh_pop_compare_stats(gs[[1]]) 
   openCyto.freq   xml.freq openCyto.count xml.count                                                node
1:    1.00000000 1.00000000        1541997   1541997                                                root
2:    0.32648767 0.32648767         503443    503443                          /Time, Event_length subset
3:    0.08735448 0.08735448          43978     43978 Time, Event_length subset/Time, Event_length subset

But I am on Linux. I will give it another try on Windows.

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flowWorkspace_4.0.6 CytoML_2.0.5        BiocManager_1.30.10

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6        plyr_1.8.6          pillar_1.4.4        compiler_4.0.0      cytolib_2.0.3       RColorBrewer_1.1-2 
 [7] base64enc_0.1-3     tools_4.0.0         zlibbioc_1.34.0     digest_0.6.25       jsonlite_1.6.1      gtable_0.3.0       
[13] lifecycle_0.2.0     tibble_3.0.1        lattice_0.20-41     png_0.1-7           pkgconfig_2.0.3     rlang_0.4.6        
[19] graph_1.66.0        rstudioapi_0.11     Rgraphviz_2.32.0    yaml_2.2.1          parallel_4.0.0      hexbin_1.28.1      
[25] gridExtra_2.3       xml2_1.3.2          stringr_1.4.0       dplyr_1.0.0         generics_0.0.2      vctrs_0.3.1        
[31] stats4_4.0.0        grid_4.0.0          tidyselect_1.1.0    glue_1.4.1          data.table_1.12.8   Biobase_2.48.0     
[37] R6_2.4.1            jpeg_0.1-8.1        XML_3.99-0.3        RBGL_1.64.0         latticeExtra_0.6-29 ggplot2_3.3.1      
[43] RProtoBufLib_2.0.0  purrr_0.3.4         magrittr_1.5        scales_1.1.1        ellipsis_0.3.1      matrixStats_0.56.0 
[49] BiocGenerics_0.34.0 colorspace_1.4-1    flowCore_2.0.1      ncdfFlow_2.34.0     stringi_1.4.6       munsell_0.5.0      
[55] RcppParallel_5.0.1  crayon_1.3.4        ggcyto_1.16.0  

mikejiang commented 4 years ago

I've verified that it works fine on Windows as well. So I'd recommend reinstalling the cytolib, flowWorkspace and CytoML packages to see whether that resolves the issue.

SansMorel commented 4 years ago

Hi,

I've reinstalled R, and installed cytolib, flowCore, flowWorkspace and CytoML from GitHub, and I still get the same error.

library(CytoML)
wsfile <- "C:/Users/sturl/Downloads/fcs and wsp/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
#> Error in (function (ws, group_id, subset, execute, path, cytoset, backend_dir, : std::bad_alloc
> traceback()
4: stop(structure(list(message = "std::bad_alloc", call = (function (ws, 
       group_id, subset, execute, path, cytoset, backend_dir, backend, 
       includeGates, additional_keys, additional_sampleID, keywords, 
       is_pheno_data_from_FCS, keyword_ignore_case, extend_val, 
       extend_to, channel_ignore_case, leaf_bool, include_empty_tree, 
       skip_faulty_gate, comps, transform, fcs_file_extension, greedy_match, 
       fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, backend_dir, backend, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x00000259e5c54ba0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x00000259fa278540>, 
       backend_dir = "C:\\Users\\sturl\\AppData\\Local\\Temp\\RtmpCgHZEj", 
       backend = "h5", includeGates = TRUE, additional_keys = "$TOT", 
       additional_sampleID = FALSE, keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1), cppstack = NULL), class = c("std::bad_alloc", 
   "C++Error", "error", "condition")))
3: (function (ws, group_id, subset, execute, path, cytoset, backend_dir, 
       backend, includeGates, additional_keys, additional_sampleID, 
       keywords, is_pheno_data_from_FCS, keyword_ignore_case, extend_val, 
       extend_to, channel_ignore_case, leaf_bool, include_empty_tree, 
       skip_faulty_gate, comps, transform, fcs_file_extension, greedy_match, 
       fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, backend_dir, backend, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x00000259e5c54ba0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x00000259fa278540>, 
       backend_dir = "C:\\Users\\sturl\\AppData\\Local\\Temp\\RtmpCgHZEj", 
       backend = "h5", includeGates = TRUE, additional_keys = "$TOT", 
       additional_sampleID = FALSE, keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1)
2: do.call(parse_workspace, args)
1: flowjo_to_gatingset(ws)
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CytoML_2.1.11
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0    xfun_0.16           purrr_0.3.4        
#>  [4] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.2        
#>  [7] generics_0.0.2      htmltools_0.5.0     stats4_4.0.2       
#> [10] ncdfFlow_2.34.0     yaml_2.2.1          base64enc_0.1-3    
#> [13] flowCore_2.1.2      RBGL_1.64.0         XML_3.99-0.5       
#> [16] rlang_0.4.7         hexbin_1.28.1       pillar_1.4.6       
#> [19] glue_1.4.1          aws.s3_0.3.21       Rgraphviz_2.32.0   
#> [22] BiocGenerics_0.34.0 RColorBrewer_1.1-2  plyr_1.8.6         
#> [25] matrixStats_0.56.0  jpeg_0.1-8.1        lifecycle_0.2.0    
#> [28] stringr_1.4.0       zlibbioc_1.34.0     RProtoBufLib_2.0.0 
#> [31] gtable_0.3.0        munsell_0.5.0       cytolib_2.1.17     
#> [34] evaluate_0.14       latticeExtra_0.6-29 Biobase_2.48.0     
#> [37] knitr_1.29          parallel_4.0.2      curl_4.3           
#> [40] flowWorkspace_4.1.8 highr_0.8           Rcpp_1.0.5         
#> [43] scales_1.1.1        S4Vectors_0.26.1    jsonlite_1.7.0     
#> [46] RcppParallel_5.0.2  graph_1.66.0        gridExtra_2.3      
#> [49] ggplot2_3.3.2       png_0.1-7           digest_0.6.25      
#> [52] stringi_1.4.6       dplyr_1.0.2         grid_4.0.2         
#> [55] tools_4.0.2         magrittr_1.5        tibble_3.0.3       
#> [58] crayon_1.3.4        aws.signature_0.6.0 pkgconfig_2.0.3    
#> [61] ellipsis_0.3.1      data.table_1.13.0   xml2_1.3.2         
#> [64] rmarkdown_2.3       httr_1.4.2          R6_2.4.1           
#> [67] ggcyto_1.16.0       compiler_4.0.2

edit: wrong sessionInfo text

SansMorel commented 4 years ago

If I read the file with flowCore, subset it, and write it out as a new FCS file, and then open the wsp in a text editor and change <Keyword name="$TOT" value="1541997" /> to <Keyword name="$TOT" value="10000" />, it works fine.

library(flowCore)
fcs <- read.FCS("test.fcs", truncate_max_range = FALSE)
fcs <- fcs[1:1e4,]
write.FCS(fcs, "test_copy.fcs")

SansMorel commented 4 years ago

I don't know if it helps, but by doing what I described above (writing FCS files of different sizes) I have been able to figure out the exact number of events leading to the error: 1198372 rows works fine, but 1198373 rows causes the error. This was in a 56-parameter dataset.

If I reduce the number of columns from 56 to 55, the number of rows needed to trigger the error increases to 1220162.

Looking at the number of elements in the matrix:

1198373 * 56 = 67108888 does not work
1198372 * 56 = 67108832 works

1220162 * 55 = 67108910 does not work
1220161 * 55 = 67108855 works

Seems the threshold is somewhere between 67108855 and 67108888 elements.

Some more tests:

1342177 * 50 = 67108850 works
1266205 * 53 = 67108865 does not work
1491308 * 45 = 67108860 works

My guess is that you might be able to reproduce this error if you make some large matrices.
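
The row and column counts reported above can be tabulated to narrow the threshold. This is only a sketch in base R using the numbers from this comment; the variable names are made up for illustration, and the remark about 2^26 is an observation on the arithmetic, not a confirmed cause.

```r
# Test cases reported above: rows, columns, and whether parsing succeeded.
tests <- data.frame(
  rows = c(1198373, 1198372, 1220162, 1220161, 1342177, 1266205, 1491308),
  cols = c(56, 56, 55, 55, 50, 53, 45),
  ok   = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)
tests$elements <- tests$rows * tests$cols

# Tightest bounds on the threshold implied by these tests.
largest_ok    <- max(tests$elements[tests$ok])
smallest_fail <- min(tests$elements[!tests$ok])
print(c(largest_ok = largest_ok, smallest_fail = smallest_fail))

# Observation (not a confirmed cause): 2^26 = 67108864 falls inside that gap.
in_gap <- 2^26 > largest_ok && 2^26 < smallest_fail
print(in_gap)
```

Taken together, the tests place the threshold between 67108860 and 67108865 elements.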

SansMorel commented 4 years ago

Looking in the temp folder where the h5 data is stored, I see that all the successful tests yield a file of at most 256 MB. (image)

Is there a limit to the h5 file size?
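
As a rough cross-check of the 256 MB figure, the element counts can be converted to bytes. The sketch below assumes each value is stored as a 4-byte single-precision float and ignores any file metadata; both are assumptions, not something stated in this thread. It uses 67108864 (2^26) elements as the boundary, since 67108860 elements works and 67108865 does not.

```r
bytes_per_value    <- 4      # assumption: single-precision storage
threshold_elements <- 2^26   # assumption: boundary suggested by the tests above

# Raw data size at the boundary, in MiB.
threshold_mib <- threshold_elements * bytes_per_value / 2^20
print(threshold_mib)

# Raw data size of the full problem file (1541997 rows, 56 columns), in MiB.
full_mib <- 1541997 * 56 * bytes_per_value / 2^20
print(round(full_mib, 1))
```

Under those assumptions the boundary corresponds to exactly 256 MiB of raw data, which would explain why no successful test produced a larger file without implying that h5 itself imposes the limit.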

mikejiang commented 4 years ago

I don't think the error is from h5.

library(CytoML)
wsfile <- "../Downloads/tt/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
h5 <- cf_get_h5_file_path(get_cytoframe_from_cs(gs_cyto_data(gs),1))
utils:::format.object_size(file.size(h5), "auto")
[1] "329.5 Mb"

mikejiang commented 4 years ago

OK. Somehow I was using 32-bit R on Windows. After switching to 64-bit, I am able to reproduce your error now. I will try to get to the bottom of it.

mikejiang commented 4 years ago

It turned out to be an integer overflow issue. On Linux, long is 64 bits wide, but MSVC (and the ABI used by Windows) defines long to be 32 bits wide, which overflows on this particularly big dataset. I've switched to int64_t to ensure it is 64 bits wide across platforms. It should work now. You will need to reinstall cytolib, flowWorkspace and CytoML.
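
For readers unfamiliar with the 32-bit limit, it can be illustrated in R, whose integer type is also 32 bits wide. This is only an analogy and not the cytolib code: R returns NA with a warning on integer overflow, whereas signed overflow in C++ is undefined behaviour. The variable names are made up for illustration.

```r
# Largest value a signed 32-bit integer can hold: 2^31 - 1.
int_max <- .Machine$integer.max

# Adding 1 overflows the 32-bit integer type; R signals this with NA.
overflowed <- suppressWarnings(int_max + 1L)

# A wider type (here a double) holds the same result exactly.
widened <- as.numeric(int_max) + 1

print(int_max)
print(overflowed)
print(widened)
```

Switching to a type that is guaranteed to be 64 bits wide, as described in the comment above, removes the dependence on how wide the platform makes long.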

DomenicoSkyWalker89 commented 4 years ago

Hi Mike, thanks for the help. The error continues, as you can see below. I followed your instructions and reinstalled cytolib, flowWorkspace and CytoML. (image)

The problem persists only for groups 1 and 3, while group 2 loads correctly. (image)

(image)

Best, Domenico

mikejiang commented 4 years ago

The fix is in the Bioconductor development branch (it will probably appear tomorrow). Alternatively, you can install it from source from the GitHub repo (if you know how to build the package from source on Windows). Either way, you will be looking for cytolib 2.1.18.
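
A quick way to tell whether an installation already contains the fix is to compare version numbers. The helper below is hypothetical (needs_update is not part of any package) and uses only base R. The commented lines show one possible way to check and update a real installation, assuming the remotes package and a working build toolchain are available.

```r
# Hypothetical helper: TRUE when the installed version predates the fix.
needs_update <- function(installed, required = "2.1.18") {
  package_version(installed) < package_version(required)
}

print(needs_update("2.0.3"))    # cytolib version in the first sessionInfo() above
print(needs_update("2.1.17"))   # cytolib version after the earlier reinstall
print(needs_update("2.1.18"))   # version named in the comment above

# Checking a real installation (requires cytolib to be installed):
# needs_update(as.character(packageVersion("cytolib")))
# One possible source install (assumes the remotes package is installed):
# remotes::install_github("RGLab/cytolib")
```

Both cytolib versions that appear in the sessionInfo() outputs earlier in this thread (2.0.3 and 2.1.17) are older than 2.1.18.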

DomenicoSkyWalker89 commented 4 years ago

Hi Mike, thanks a lot again.