RGLab / flowWorkspace

gs_pop_get_count_fast() freq extract not matching FlowJo Pop Freq Table Export #369

Open miosisoniii opened 2 years ago

miosisoniii commented 2 years ago

I am trying to replicate the population frequency statistics calculated by FlowJo using the population proportions that flowWorkspace extracts from the workspace XML. However, simply converting each proportion to a percentage and rounding to significant figures does not reproduce the values in the FlowJo table export.

To export the population frequencies manually from FlowJo, I go to Table Editor and select Create Table. The resulting table (presumably the default) reports the population frequency as a percentage, rounded to 3 significant figures (my FlowJo Decimal Precision is set to 2 and my Significant Figures is set to 3). Below is the table (the first column contains sensitive sample IDs). [image]

Using flowWorkspace/openCyto, I export the frequency table with statistic = "freq" and format = "wide", then transpose it to match the layout of the table exported from FlowJo:

library(openCyto)
library(flowWorkspace)
library(CytoML)
#> Warning: package 'CytoML' was built under R version 3.6.3
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(tibble)
library(reprex)

# Not run in this reprex; shown only to document how cyto_ex.csv was produced.
# Open the workspace and convert it to a GatingSet
# wsp <- CytoML::open_flowjo_xml(path)
# gs <- wsp %>% flowjo_to_gatingset(name = "All Samples")
# freq_gs <- flowWorkspace::gs_pop_get_count_fast(gs, statistic = "freq", format = "wide")
# Transpose so the layout is similar to the FlowJo export, then drop the root column
# freq_tbl <- freq_gs %>% t() %>% as.data.frame() %>% dplyr::select(-root)
# Move the FCS file names into a column to match the FlowJo export
# freq_tbl <- freq_tbl %>% tibble::rownames_to_column("V1")
# Save the table to CSV
# write.csv(freq_tbl, "~/projects/opencyto/cyto_ex.csv", row.names = FALSE)

Created on 2022-02-04 by the reprex package (v2.0.1)

Session info ``` r sessioninfo::session_info() #> - Session info --------------------------------------------------------------- #> setting value #> version R version 3.6.2 (2019-12-12) #> os Windows 10 x64 (build 18363) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United States.1252 #> ctype English_United States.1252 #> tz America/New_York #> date 2022-02-04 #> pandoc 2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown) #> #> - Packages ------------------------------------------------------------------- #> ! package * version date (UTC) lib source #> 1 assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.3) #> 1 backports 1.2.1 2020-12-09 [1] CRAN (R 3.6.3) #> 1 base64enc 0.1-3 2015-07-28 [1] CRAN (R 3.6.0) #> 1 Biobase 2.46.0 2019-10-29 [1] Bioconductor #> 1 BiocGenerics 0.32.0 2019-10-29 [1] Bioconductor #> 1 bitops 1.0-7 2021-04-24 [1] CRAN (R 3.6.3) #> 1 cli 3.1.0 2021-10-27 [1] CRAN (R 3.6.2) #> 1 clue 0.3-59 2021-04-16 [1] CRAN (R 3.6.3) #> 1 cluster 2.1.0 2019-06-19 [2] CRAN (R 3.6.2) #> 1 colorspace 2.0-1 2021-05-04 [1] CRAN (R 3.6.3) #> 1 corpcor 1.6.10 2021-09-16 [1] CRAN (R 3.6.2) #> 1 crayon 1.4.2 2021-10-29 [1] CRAN (R 3.6.2) #> 1 CytoML * 1.12.1 2020-03-26 [1] Bioconductor #> 1 data.table 1.14.0 2021-02-21 [1] CRAN (R 3.6.3) #> 1 DBI 1.1.2 2021-12-20 [1] CRAN (R 3.6.2) #> 1 DEoptimR 1.0-10 2022-01-03 [1] CRAN (R 3.6.2) #> 1 deSolve 1.28 2020-03-08 [1] CRAN (R 3.6.3) #> 1 digest 0.6.27 2020-10-24 [1] CRAN (R 3.6.3) #> 1 dplyr * 1.0.7 2021-06-18 [1] CRAN (R 3.6.2) #> 1 ellipse 0.4.2 2020-05-27 [1] CRAN (R 3.6.3) #> 1 ellipsis 0.3.2 2021-04-29 [1] CRAN (R 3.6.3) #> 1 evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.3) #> 1 fansi 0.4.2 2021-01-15 [1] CRAN (R 3.6.3) #> 1 fastmap 1.1.0 2021-01-25 [1] CRAN (R 3.6.3) #> 1 fda 5.5.1 2021-11-17 [1] CRAN (R 3.6.2) #> 1 fds 1.8 2018-10-31 [1] CRAN (R 3.6.3) #> 1 flowClust 3.24.0 2019-10-29 [1] Bioconductor #> 1 flowCore 1.52.1 2019-12-04 [1] Bioconductor #> 1 flowStats 3.44.0 2019-10-29 
[1] Bioconductor #> 1 flowViz 1.50.0 2019-10-29 [1] Bioconductor #> 1 flowWorkspace * 3.34.1 2020-01-02 [1] Bioconductor #> 1 fs 1.5.0 2020-07-31 [1] CRAN (R 3.6.3) #> 1 generics 0.1.1 2021-10-25 [1] CRAN (R 3.6.2) #> 1 ggcyto 1.14.1 2020-03-07 [1] Bioconductor #> 1 ggplot2 3.3.5 2021-06-25 [1] CRAN (R 3.6.2) #> 1 glue 1.4.2 2020-08-27 [1] CRAN (R 3.6.3) #> 1 graph 1.64.0 2019-10-29 [1] Bioconductor #> 1 gridExtra 2.3 2017-09-09 [1] CRAN (R 3.6.3) #> 1 gtable 0.3.0 2019-03-25 [1] CRAN (R 3.6.3) #> 1 gtools 3.8.2 2020-03-31 [1] CRAN (R 3.6.3) #> 1 hdrcde 3.4 2021-01-18 [1] CRAN (R 3.6.3) #> 1 hexbin 1.28.2 2021-01-08 [1] CRAN (R 3.6.3) #> 1 highr 0.9 2021-04-16 [1] CRAN (R 3.6.3) #> 1 htmltools 0.5.2 2021-08-25 [1] CRAN (R 3.6.2) #> 1 IDPmisc 1.1.20 2020-01-21 [1] CRAN (R 3.6.3) #> 1 jpeg 0.1-8.1 2019-10-24 [1] CRAN (R 3.6.1) #> 1 jsonlite 1.7.2 2020-12-09 [1] CRAN (R 3.6.3) #> 1 KernSmooth 2.23-16 2019-10-15 [2] CRAN (R 3.6.2) #> 1 knitr 1.37 2021-12-16 [1] CRAN (R 3.6.2) #> 1 ks 1.12.0 2021-02-07 [1] CRAN (R 3.6.3) #> 1 lattice 0.20-38 2018-11-04 [2] CRAN (R 3.6.2) #> 1 latticeExtra 0.6-29 2019-12-19 [1] CRAN (R 3.6.3) #> 1 lifecycle 1.0.1 2021-09-24 [1] CRAN (R 3.6.2) #> 1 magrittr 2.0.1 2020-11-17 [1] CRAN (R 3.6.3) #> 1 MASS 7.3-51.4 2019-03-31 [2] CRAN (R 3.6.2) #> 1 Matrix 1.2-18 2019-11-27 [2] CRAN (R 3.6.2) #> 1 matrixStats 0.58.0 2021-01-29 [1] CRAN (R 3.6.3) #> 1 mclust 5.4.7 2020-11-20 [1] CRAN (R 3.6.3) #> 1 mnormt 2.0.2 2020-09-01 [1] CRAN (R 3.6.3) #> 1 munsell 0.5.0 2018-06-12 [1] CRAN (R 3.6.3) #> 1 mvtnorm 1.1-3 2021-10-08 [1] CRAN (R 3.6.2) #> 1 ncdfFlow 2.32.0 2019-10-29 [1] Bioconductor #> 1 openCyto * 1.24.0 2019-10-29 [1] Bioconductor #> 1 pcaPP 1.9-74 2021-04-23 [1] CRAN (R 3.6.3) #> 1 pillar 1.6.4 2021-10-18 [1] CRAN (R 3.6.2) #> 1 pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.3) #> 1 plyr 1.8.6 2020-03-03 [1] CRAN (R 3.6.3) #> 1 png 0.1-7 2013-12-03 [1] CRAN (R 3.6.0) #> 1 purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3) #> 1 R.cache 0.15.0 
2021-04-30 [1] CRAN (R 3.6.3) #> 1 R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 3.6.3) #> 1 R.oo 1.24.0 2020-08-26 [1] CRAN (R 3.6.3) #> 1 R.utils 2.11.0 2021-09-26 [1] CRAN (R 3.6.2) #> 1 R6 2.5.1 2021-08-19 [1] CRAN (R 3.6.2) #> 1 rainbow 3.6 2019-01-29 [1] CRAN (R 3.6.3) #> 1 RBGL 1.62.1 2019-10-30 [1] Bioconductor #> 1 RColorBrewer 1.1-2 2014-12-07 [1] CRAN (R 3.6.0) #> 1 Rcpp 1.0.7 2021-07-07 [1] CRAN (R 3.6.2) #> 2 RcppParallel 5.1.4 2021-05-04 [1] CRAN (R 3.6.3) #> 1 RCurl 1.98-1.3 2021-03-16 [1] CRAN (R 3.6.3) #> 1 reprex * 2.0.1 2021-08-05 [1] CRAN (R 3.6.2) #> 1 Rgraphviz 2.30.0 2019-10-29 [1] Bioconductor #> 1 rlang 0.4.11 2021-04-30 [1] CRAN (R 3.6.3) #> 1 rmarkdown 2.11 2021-09-14 [1] CRAN (R 3.6.2) #> 1 robustbase 0.93-6 2020-03-23 [1] CRAN (R 3.6.3) #> 1 rrcov 1.5-5 2020-08-03 [1] CRAN (R 3.6.3) #> 1 rstudioapi 0.13 2020-11-12 [1] CRAN (R 3.6.3) #> 1 scales 1.1.1 2020-05-11 [1] CRAN (R 3.6.3) #> 1 sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 3.6.2) #> 1 stringi 1.6.1 2021-05-10 [1] CRAN (R 3.6.3) #> 1 stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.3) #> 1 styler 1.6.2 2021-09-23 [1] CRAN (R 3.6.2) #> 1 tibble * 3.1.6 2021-11-07 [1] CRAN (R 3.6.2) #> 1 tidyselect 1.1.1 2021-04-30 [1] CRAN (R 3.6.3) #> 1 tmvnsim 1.0-2 2016-12-15 [1] CRAN (R 3.6.0) #> 1 utf8 1.2.1 2021-03-12 [1] CRAN (R 3.6.3) #> 1 vctrs 0.3.8 2021-04-29 [1] CRAN (R 3.6.3) #> 1 withr 2.4.3 2021-11-30 [1] CRAN (R 3.6.2) #> 1 xfun 0.29 2021-12-14 [1] CRAN (R 3.6.2) #> 1 XML 3.99-0.3 2020-01-20 [1] CRAN (R 3.6.3) #> 1 yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.3) #> 1 zlibbioc 1.32.0 2019-10-29 [1] Bioconductor #> #> [1] C:/Users/Artemio.Sison/Documents/R/win-library/3.6 #> [2] C:/Program Files/R/R-3.6.2/library #> #> D -- DLL MD5 mismatch, broken installation. #> #> ------------------------------------------------------------------------------ ```

I then convert the proportions from flowWorkspace/openCyto (which are very small decimals) to percentages. Rounding to my significant-figures preference (3) with the freq.signif() function gives values that differ noticeably from FlowJo's, so I also tried a second function, freq.close(). It matches most columns, but CD8.param4 is still off because it ends up rounded to 2 significant figures (for example 1.5 instead of 1.46):

Example files here (trimmed to look at only the values that are not matching, which happen to be fractions of a percent): manual_ex.csv cyto_ex.csv

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
man_ex <- read.csv("~/projects/opencyto/manual_ex.csv")
cyto_ex <- read.csv("~/projects/opencyto/cyto_ex.csv")

freq.signif <- function(x){signif(x*100, digits = 3)}
freq.close <- function(x){signif(signif(x*100, digits = 2), digits = 3)}

# manual table export from flowjo that i want to match
man_ex
#>     ID  STIM  TIME CD4.param1 CD4.param2 CD4.param3 CD8.param3 CD8.param4
#> 1 sub1 stim1 time1      0.350      0.460       0.28       0.21       1.46
#> 2 sub1 stim2 time1      0.380      0.450       0.36       0.24       1.54
#> 3 sub2 stim1 time1      0.160      0.260       0.45       0.17       2.23
#> 4 sub2 stim2 time1      0.077      0.073       0.66       0.34       2.37
# raw export from flowworkspace
cyto_ex
#>     ID  STIM  TIME   CD4.param1   CD4.param2  CD4.param3  CD8.param3 CD8.param4
#> 1 sub1 stim1 time1 0.0034525865 0.0045836062 0.002797786 0.002065787 0.01461942
#> 2 sub1 stim2 time1 0.0038029193 0.0044740227 0.003635143 0.002427531 0.01542196
#> 3 sub2 stim1 time1 0.0015723645 0.0025729601 0.004502680 0.001694915 0.02227603
#> 4 sub2 stim2 time1 0.0007735448 0.0007251982 0.006647650 0.003414634 0.02365854

# attempt to replicate using flowjo sigfigs (=3)
dplyr::mutate_if(cyto_ex, is.numeric, freq.signif)
#>     ID  STIM  TIME CD4.param1 CD4.param2 CD4.param3 CD8.param3 CD8.param4
#> 1 sub1 stim1 time1     0.3450     0.4580      0.280      0.207       1.46
#> 2 sub1 stim2 time1     0.3800     0.4470      0.364      0.243       1.54
#> 3 sub2 stim1 time1     0.1570     0.2570      0.450      0.169       2.23
#> 4 sub2 stim2 time1     0.0774     0.0725      0.665      0.341       2.37
# applying signif function thats close, CD8.param4 is off by 1/1000th
dplyr::mutate_if(cyto_ex, is.numeric, freq.close)
#>     ID  STIM  TIME CD4.param1 CD4.param2 CD4.param3 CD8.param3 CD8.param4
#> 1 sub1 stim1 time1      0.350      0.460       0.28       0.21        1.5
#> 2 sub1 stim2 time1      0.380      0.450       0.36       0.24        1.5
#> 3 sub2 stim1 time1      0.160      0.260       0.45       0.17        2.2
#> 4 sub2 stim2 time1      0.077      0.073       0.66       0.34        2.4

Created on 2022-02-04 by the reprex package (v2.0.1)
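
As a rough sketch of what could be tried next, the helper below rounds percentages below 1 to 2 significant figures and everything else to 3. The function name freq_piecewise and the rule itself are assumptions made up for illustration, fitted only to the 20 values shown above; this is not FlowJo's documented rounding behaviour. The two matrices are copied from the cyto_ex and man_ex output in the reprex.

```r
# Proportions copied from cyto_ex above (flowWorkspace output), one row per sample.
cyto <- matrix(c(
  0.0034525865, 0.0045836062, 0.002797786, 0.002065787, 0.01461942,
  0.0038029193, 0.0044740227, 0.003635143, 0.002427531, 0.01542196,
  0.0015723645, 0.0025729601, 0.004502680, 0.001694915, 0.02227603,
  0.0007735448, 0.0007251982, 0.006647650, 0.003414634, 0.02365854
), nrow = 4, byrow = TRUE)

# Percentages copied from man_ex above (FlowJo table export).
man <- matrix(c(
  0.350, 0.460, 0.28, 0.21, 1.46,
  0.380, 0.450, 0.36, 0.24, 1.54,
  0.160, 0.260, 0.45, 0.17, 2.23,
  0.077, 0.073, 0.66, 0.34, 2.37
), nrow = 4, byrow = TRUE)

# Hypothetical rule (an empirical guess, not taken from FlowJo documentation):
# 2 significant figures when the percentage is below 1, otherwise 3.
freq_piecewise <- function(x) {
  pct <- x * 100
  ifelse(pct < 1, signif(pct, 2), signif(pct, 3))
}

# Compare the rounded flowWorkspace values with the FlowJo export.
all.equal(freq_piecewise(cyto), man)
```

If the comparison holds on a larger export as well, the mismatch would be mostly a display-rounding question; if it fails there, differences in the underlying counts are the more likely cause.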

Session info ``` r sessioninfo::session_info() #> - Session info --------------------------------------------------------------- #> setting value #> version R version 3.6.2 (2019-12-12) #> os Windows 10 x64 (build 18363) #> system x86_64, mingw32 #> ui RTerm #> language (EN) #> collate English_United States.1252 #> ctype English_United States.1252 #> tz America/New_York #> date 2022-02-04 #> pandoc 2.14.0.3 @ C:/Program Files/RStudio/bin/pandoc/ (via rmarkdown) #> #> - Packages ------------------------------------------------------------------- #> package * version date (UTC) lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.3) #> backports 1.2.1 2020-12-09 [1] CRAN (R 3.6.3) #> cli 3.1.0 2021-10-27 [1] CRAN (R 3.6.2) #> crayon 1.4.2 2021-10-29 [1] CRAN (R 3.6.2) #> DBI 1.1.2 2021-12-20 [1] CRAN (R 3.6.2) #> digest 0.6.27 2020-10-24 [1] CRAN (R 3.6.3) #> dplyr * 1.0.7 2021-06-18 [1] CRAN (R 3.6.2) #> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 3.6.3) #> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.3) #> fansi 0.4.2 2021-01-15 [1] CRAN (R 3.6.3) #> fastmap 1.1.0 2021-01-25 [1] CRAN (R 3.6.3) #> fs 1.5.0 2020-07-31 [1] CRAN (R 3.6.3) #> generics 0.1.1 2021-10-25 [1] CRAN (R 3.6.2) #> glue 1.4.2 2020-08-27 [1] CRAN (R 3.6.3) #> highr 0.9 2021-04-16 [1] CRAN (R 3.6.3) #> htmltools 0.5.2 2021-08-25 [1] CRAN (R 3.6.2) #> knitr 1.37 2021-12-16 [1] CRAN (R 3.6.2) #> lifecycle 1.0.1 2021-09-24 [1] CRAN (R 3.6.2) #> magrittr 2.0.1 2020-11-17 [1] CRAN (R 3.6.3) #> pillar 1.6.4 2021-10-18 [1] CRAN (R 3.6.2) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 3.6.3) #> purrr 0.3.4 2020-04-17 [1] CRAN (R 3.6.3) #> R.cache 0.15.0 2021-04-30 [1] CRAN (R 3.6.3) #> R.methodsS3 1.8.1 2020-08-26 [1] CRAN (R 3.6.3) #> R.oo 1.24.0 2020-08-26 [1] CRAN (R 3.6.3) #> R.utils 2.11.0 2021-09-26 [1] CRAN (R 3.6.2) #> R6 2.5.1 2021-08-19 [1] CRAN (R 3.6.2) #> reprex 2.0.1 2021-08-05 [1] CRAN (R 3.6.2) #> rlang 0.4.11 2021-04-30 [1] CRAN (R 3.6.3) #> rmarkdown 2.11 2021-09-14 [1] CRAN (R 3.6.2) #> 
rstudioapi 0.13 2020-11-12 [1] CRAN (R 3.6.3) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 3.6.2) #> stringi 1.6.1 2021-05-10 [1] CRAN (R 3.6.3) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.3) #> styler 1.6.2 2021-09-23 [1] CRAN (R 3.6.2) #> tibble 3.1.6 2021-11-07 [1] CRAN (R 3.6.2) #> tidyselect 1.1.1 2021-04-30 [1] CRAN (R 3.6.3) #> utf8 1.2.1 2021-03-12 [1] CRAN (R 3.6.3) #> vctrs 0.3.8 2021-04-29 [1] CRAN (R 3.6.3) #> withr 2.4.3 2021-11-30 [1] CRAN (R 3.6.2) #> xfun 0.29 2021-12-14 [1] CRAN (R 3.6.2) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.3) #> #> [1] C:/Users/Artemio.Sison/Documents/R/win-library/3.6 #> [2] C:/Program Files/R/R-3.6.2/library #> #> ------------------------------------------------------------------------------ ```

After reaching out to FlowJo directly, I received this response: [image]

Is this an issue of including (or not including) the last values of these really small decimals? Or could it be an artifact stemming from the parsing of the wsp/XML counts?

Unfortunately, my organization will not be updating to R 4.0 for a while, so I apologize for not having the cleanest reprex. I want to believe that the inability to calculate exactly the same values as FlowJo is not caused by the R version. You should be able to recreate this easily by exporting a frequency table from FlowJo, comparing it to the frequency table exported by flowWorkspace, and converting the proportions to percentages.

FlowJo Version: 10.5.3
FlowJo Engine: v4.00770
OS: Windows 10
Java Version: 1.8.0_161-b12
Build Number: 10.5.3

mikejiang commented 2 years ago

In my opinion, this amount of difference is expected: flowWorkspace performs its own gating, independently of FlowJo, based on the gates parsed from the XML. You can verify the difference in cell counts with:

gh_pop_compare_stats(gs[[1]])

If you want exactly the same stats as FlowJo, simply request the FlowJo stats through the same API:

gs_pop_get_count_fast(gs, statistic = "freq", format = "wide", xml = TRUE)
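
To illustrate why re-gated statistics can drift from the ones FlowJo stored, here is a small arithmetic sketch. All three counts are made up for illustration and do not come from this issue, and it assumes the frequency is the population count divided by the parent count.

```r
# Hypothetical counts for one population in one sample (illustration only).
parent_count  <- 200000  # events in the parent gate
xml_count     <- 691     # count as stored by FlowJo in the workspace XML
regated_count <- 688     # count after the parsed gate is re-applied independently

# Frequency as a percentage of the parent population (assumed definition).
xml_pct     <- 100 * xml_count / parent_count
regated_pct <- 100 * regated_count / parent_count

# A difference of a few events is enough to shift the percentage slightly.
c(xml = xml_pct, regated = regated_pct, diff = xml_pct - regated_pct)
```

With fractions of a percent, a shift of this size can be enough to change the last displayed digit, which is why reading the stored stats with xml = TRUE is the safer route when an exact match with FlowJo is required.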