RGLab / flowWorkspace

GatingSets cannot have non-character columns in phenoData #375

Open mstone-modulus opened 2 years ago

mstone-modulus commented 2 years ago

Describe the bug

When creating a GatingSet from a flowSet, or when attempting to modify an existing GatingSet, logical and numeric columns in phenoData are converted to character.

To Reproduce

## Make example data
library(cytoverse)
exprs <- matrix(seq(1, 6), ncol=2)
colnames(exprs) <- c("ch1", "ch2")
fs <- flowSet(flowFrame(exprs), flowFrame(exprs))
pData(fs)$is_test <- c(TRUE, FALSE)
pData(fs)$nums <- c(1, 2)

## Columns are logical/numeric when accessing via the flowSet
is(pData(fs)$is_test)
#>  [1] "logical"          "vector"           "atomic"           "index"            "replValue"       
#>  [6] "numLike"          "atomicVector"     "vector_OR_Vector" "vector_OR_factor" "Uvector"         
#> [11] "replValueSp"     
is(pData(fs)$nums)
#>  [1] "numeric"            "vector"             "atomic"             "characterOrNumeric" "Cnumeric"          
#>  [6] "Unumeric"           "index"              "replValue"          "numLike"            "number"            
#> [11] "atomicVector"       "EnumerationValue"   "vector_OR_Vector"   "vector_OR_factor"   "Uvector"           
#> [16] "replValueSp"  

## But not after creating GatingSet
gs <- GatingSet(fs)
is(pData(gs)$is_test)
#>  [1] "character"                 "vector"                    "data.frameRowLabels"      
#>  [4] "SuperClassMethod"          "character_OR_connection"   "characterORMIAME"         
#>  [7] "character_OR_NULL"         "atomic"                    "characterOrTransformation"
#> [10] "characterOrParameters"     "characterOrNumeric"        "Cnumeric"                 
#> [13] "Ufunction"                 "index"                     "atomicVector"             
#> [16] "EnumerationValue"          "vector_OR_Vector"          "vector_OR_factor"         
#> [19] "Uvector"
is(pData(gs)$nums)
#>  [1] "character"                 "vector"                    "data.frameRowLabels"      
#>  [4] "SuperClassMethod"          "character_OR_connection"   "characterORMIAME"         
#>  [7] "character_OR_NULL"         "atomic"                    "characterOrTransformation"
#> [10] "characterOrParameters"     "characterOrNumeric"        "Cnumeric"                 
#> [13] "Ufunction"                 "index"                     "atomicVector"             
#> [16] "EnumerationValue"          "vector_OR_Vector"          "vector_OR_factor"         
#> [19] "Uvector"    

## Explicitly casting the column to the expected type results in the column being 
## immediately converted back to character
pData(gs)$is_test <- as.logical(pData(gs)$is_test)
is(pData(gs)$is_test)
#>  [1] "character"                 "vector"                    "data.frameRowLabels"      
#>  [4] "SuperClassMethod"          "character_OR_connection"   "characterORMIAME"         
#>  [7] "character_OR_NULL"         "atomic"                    "characterOrTransformation"
#> [10] "characterOrParameters"     "characterOrNumeric"        "Cnumeric"                 
#> [13] "Ufunction"                 "index"                     "atomicVector"             
#> [16] "EnumerationValue"          "vector_OR_Vector"          "vector_OR_factor"         
#> [19] "Uvector"     

Expected behavior

I expected columns to maintain their original type when creating a GatingSet, and to be able to create non-character columns in an existing GatingSet.

SessionInfo:

R version 4.1.2 (2021-11-01)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.3

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_2.0.1             CytoML_2.6.0             openCyto_2.6.0           ggcyto_1.22.0           
 [5] ncdfFlow_2.40.0          BH_1.78.0-0              RcppArmadillo_0.10.8.1.0 ggplot2_3.3.5           
 [9] flowWorkspace_4.6.0      flowCore_2.6.0           cytoverse_0.0.0.9000    

loaded via a namespace (and not attached):
  [1] fs_1.5.2            bitops_1.0-7        matrixStats_0.61.0  RColorBrewer_1.1-2  httr_1.4.2         
  [6] R.cache_0.15.0      Rgraphviz_2.38.0    tools_4.1.2         utf8_1.2.2          R6_2.5.1           
 [11] KernSmooth_2.23-20  DBI_1.1.2           BiocGenerics_0.40.0 colorspace_2.0-2    withr_2.4.3        
 [16] tidyselect_1.1.1    gridExtra_2.3       mnormt_2.0.2        curl_4.3.2          compiler_4.1.2     
 [21] graph_1.72.0        cli_3.1.1           Biobase_2.54.0      flowClust_3.32.0    xml2_1.3.3         
 [26] flowStats_4.6.0     scales_1.1.1        DEoptimR_1.0-10     hexbin_1.28.2       mvtnorm_1.1-3      
 [31] robustbase_0.93-9   RBGL_1.70.0         digest_0.6.29       rainbow_3.6         R.utils_2.11.0     
 [36] rrcov_1.6-2         base64enc_0.1-3     jpeg_0.1-9          pkgconfig_2.0.3     styler_1.7.0       
 [41] rlang_1.0.1         rstudioapi_0.13     generics_0.1.2      jsonlite_1.7.3      gtools_3.9.2       
 [46] mclust_5.4.9        dplyr_1.0.7         R.oo_1.24.0         RCurl_1.98-1.5      magrittr_2.0.2     
 [51] RProtoBufLib_2.6.0  Matrix_1.3-4        Rcpp_1.0.8          munsell_0.5.0       S4Vectors_0.32.3   
 [56] fansi_1.0.2         clipr_0.7.1         lifecycle_1.0.1     R.methodsS3_1.8.1   yaml_2.2.2         
 [61] MASS_7.3-54         zlibbioc_1.40.0     plyr_1.8.6          grid_4.1.2          parallel_4.1.2     
 [66] crayon_1.4.2        lattice_0.20-45     splines_4.1.2       tmvnsim_1.0-2       knitr_1.37         
 [71] pillar_1.7.0        fda_5.5.1           corpcor_1.6.10      stats4_4.1.2        XML_3.99-0.8       
 [76] glue_1.6.1          latticeExtra_0.6-29 data.table_1.14.2   RcppParallel_5.1.5  deSolve_1.30       
 [81] png_0.1-7           vctrs_0.3.8         gtable_0.3.0        aws.s3_0.3.21       purrr_0.3.4        
 [86] clue_0.3-60         assertthat_0.2.1    ks_1.13.4           xfun_0.29           fds_1.8            
 [91] pracma_2.3.6        IDPmisc_1.1.20      pcaPP_1.9-74        tibble_3.1.6        cytolib_2.6.2      
 [96] aws.signature_0.6.0 flowViz_1.58.0      ellipse_0.4.2       cluster_2.1.2       ellipsis_0.3.2     
[101] hdrcde_3.4   

Additional context

Thanks for taking a look at this!

mstone-modulus commented 2 years ago

Note: this also appears to be true for cytoset

mikejiang commented 2 years ago

This is a known issue for which we haven't come up with a solution yet. Can you tell me your use case? Maybe there is a workaround.

mstone-modulus commented 2 years ago

Thanks Mike. We were trying to subset GatingSets/flowSets/cytosets by columns in phenoData. It's pretty straightforward to work around with, e.g., subset(gs, bool_col == "TRUE"). I just wanted to flag the unexpected behavior.
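For anyone else hitting this, another possible workaround is to re-infer the column types on a copy of the metadata and filter on that copy. The sketch below is base R only and uses a hand-built data.frame as a stand-in for what pData(gs) returns. The sample names and the `pd`, `pd_typed`, and `keep` variables are made up for illustration. It does not change what the GatingSet stores internally.

```r
# Stand-in for pData(gs): every metadata column has come back as character.
# (Hypothetical data mirroring the reproduction above; no GatingSet is built here.)
pd <- data.frame(
  name    = c("sample_1", "sample_2"),
  is_test = c("TRUE", "FALSE"),
  nums    = c("1", "2"),
  stringsAsFactors = FALSE
)

# type.convert() re-infers logical/integer/double types from character columns;
# as.is = TRUE keeps genuine strings as character instead of making factors.
pd_typed <- type.convert(pd, as.is = TRUE)

# Inspect the recovered column classes.
sapply(pd_typed, class)

# Filter on the typed copy and collect the names of the matching samples.
# Assumption: these names match the sample names of the real object, so they
# could then be used to index it (not run here).
keep <- pd_typed$name[pd_typed$is_test]
keep
```

Note that type.convert() guesses types from the strings, so a column of "1"/"2" comes back as integer rather than double. If the exact type matters, an explicit as.numeric() or as.logical() on the copy is probably safer.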