RGLab / flowWorkspace

GNU Affero General Public License v3.0

gs_get_singlecell_expression_by_gate with inverse.transform = T writes a new temporary file occupying disk space #361

Closed: Close-your-eyes closed this issue 2 years ago

Close-your-eyes commented 3 years ago

When a GatingSet (gs) is created from a FlowJo workspace with CytoML::flowjo_to_gatingset(), the FCS files are written to disk as .h5 files in h5.dir, which defaults to tempdir().

When I call gs_get_singlecell_expression_by_gate(gs, inverse.transform = F), these same h5 files appear to be used, because the command runs quite fast. When I call it with inverse.transform = T, however, another h5 file is written to disk (possibly a duplicate of the files written by CytoML::flowjo_to_gatingset(), but in a different folder inside tempdir()). Writing this file takes time, so the command runs noticeably slower. I wonder why the original file is not used.

This duplicate is created again every time gs_get_singlecell_expression_by_gate(gs, inverse.transform = T) is called, so the function does not even reuse ("remember") the h5 file it wrote itself on a previous call.

The h5 file written by CytoML::flowjo_to_gatingset() can be removed with flowWorkspace::gs_cleanup_temp(gs), but the one written by gs_get_singlecell_expression_by_gate cannot (or at least I do not know how to remove it).

As a result, I run out of disk space when importing many large FCS files.

This behaviour is unexpected to me and seems inefficient. Can you help?

These are the exact commands I am calling, including the additional arguments:

    gs <- CytoML::flowjo_to_gatingset(wsp, name = wsp.groups[x], path = FCS.file.folder, truncate_max_range = F, subset = i)
    expr <- flowWorkspace::gs_get_singlecell_expression_by_gate(gs, nodes = pop, threshold = F, inverse.transform = T)
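A minimal sketch for checking the accumulation described above, using only base R. The helper h5_temp_usage() is hypothetical (it is not part of flowWorkspace or CytoML), and it assumes the backing files carry a .h5 extension and live somewhere under tempdir(), as this report describes. Comparing its output before and after a call shows which files that call added.

```r
# Hypothetical helper (not part of flowWorkspace/CytoML): list every .h5 file
# found recursively under `dir` (default: this R session's tempdir()) together
# with its size in MB. Assumes the backing files use the .h5 extension.
h5_temp_usage <- function(dir = tempdir()) {
  files <- list.files(dir, pattern = "\\.h5$", recursive = TRUE,
                      full.names = TRUE)
  data.frame(file = files,
             size_mb = file.size(files) / 1024^2,
             stringsAsFactors = FALSE)
}

# Usage sketch with the objects from the commands above (gs, pop):
# before <- h5_temp_usage()
# expr <- flowWorkspace::gs_get_singlecell_expression_by_gate(
#   gs, nodes = pop, threshold = FALSE, inverse.transform = TRUE)
# after <- h5_temp_usage()
# setdiff(after$file, before$file)   # .h5 files written by this call
# sum(after$size_mb)                 # total .h5 usage under tempdir(), in MB
```

Running the extraction repeatedly between the two snapshots would show whether each call adds a new file.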

SessionInfo:

R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale: [1] de_DE.UTF-8/de_DE.UTF-8/de_DE.UTF-8/C/de_DE.UTF-8/de_DE.UTF-8

attached base packages: [1] stats graphics grDevices utils datasets methods base

other attached packages: [1] CytoML_2.4.0 ggcyto_1.20.0 flowWorkspace_4.4.0 ncdfFlow_2.38.0
[5] BH_1.75.0-0 RcppArmadillo_0.10.5.0.0 flowCore_2.4.0 forcats_0.5.1
[9] stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4 readr_1.4.0
[13] tidyr_1.1.3 tibble_3.1.2 tidyverse_1.3.1 openxlsx_4.2.4
[17] writexl_1.4.0 lubridate_1.7.10 ggrepel_0.9.1 ggplot2_3.3.5
[21] ggnewscale_0.4.5 pbapply_1.4-3 RColorBrewer_1.1-2 scales_1.1.1
[25] ggridges_0.5.3 cowplot_1.1.1

mikejiang commented 3 years ago

@Close-your-eyes, addressed by PR #362.