RGLab / openCyto

A package that provides data analysis pipeline for flow cytometry.
GNU Affero General Public License v3.0
75 stars 29 forks source link

Inverted reference gate in gt_gating function #238

Closed htanaka1234 closed 2 years ago

htanaka1234 commented 2 years ago

I have discovered a different behavior than expected when using openCyto's gt_gating function in my newly built R environment. The code below completes without error, but the refGate method, which refers to a polygon gate or an ellipse gate, gets the population outside the gate. As a result, the middle and lower parts of the plot are expected to have the same result, but they do not.

library(dplyr)
library(data.table)
library(patchwork)
library(flowStats)
library(openCyto)
library(flowWorkspace)
BiocManager::valid()

getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details

replacement repositories: CRAN: http://cran.ism.ac.jp/

  • sessionInfo()

R version 4.2.1 (2022-06-23 ucrt) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale: [1] LC_COLLATE=Japanese_Japan.utf8 LC_CTYPE=Japanese_Japan.utf8
[3] LC_MONETARY=Japanese_Japan.utf8 LC_NUMERIC=C
[5] LC_TIME=Japanese_Japan.utf8

attached base packages: [1] stats graphics grDevices utils datasets methods
[7] base

other attached packages: [1] flowWorkspace_4.8.0 openCyto_2.8.0 flowStats_4.8.0
[4] patchwork_1.1.1 data.table_1.14.2 dplyr_1.0.9

loaded via a namespace (and not attached): [1] Biobase_2.56.0 httr_1.4.3 splines_4.2.1
[4] R.utils_2.12.0 gtools_3.9.2.2 rainbow_3.6
[7] RcppParallel_5.1.5 hdrcde_3.4 BiocManager_1.30.18 [10] stats4_4.2.1 latticeExtra_0.6-29 RBGL_1.72.0
[13] robustbase_0.95-0 pillar_1.7.0 lattice_0.20-45
[16] glue_1.6.2 digest_0.6.29 RColorBrewer_1.1-3 [19] colorspace_2.0-3 plyr_1.8.7 R.oo_1.25.0
[22] Matrix_1.4-1 pcaPP_2.0-1 XML_3.99-0.10
[25] pkgconfig_2.0.3 fda_6.0.4 zlibbioc_1.42.0
[28] purrr_0.3.4 flowCore_2.8.0 corpcor_1.6.10
[31] mvtnorm_1.1-3 scales_1.2.0 jpeg_0.1-9
[34] pracma_2.3.8 tibble_3.1.7 aws.s3_0.3.21
[37] generics_0.1.2 ggplot2_3.3.6 ellipsis_0.3.2
[40] flowViz_1.60.0 BiocGenerics_0.42.0 hexbin_1.28.2
[43] cli_3.3.0 mnormt_2.1.0 magrittr_2.0.3
[46] crayon_1.5.1 IDPmisc_1.1.20 mclust_5.4.10
[49] fds_1.8 R.methodsS3_1.8.2 ks_1.13.5
[52] fansi_1.0.3 MASS_7.3-57 xml2_1.3.3
[55] graph_1.74.0 tools_4.2.1 ncdfFlow_2.42.0
[58] flowClust_3.34.0 lifecycle_1.0.1 matrixStats_0.62.0 [61] S4Vectors_0.34.0 munsell_0.5.0 cluster_2.1.3
[64] compiler_4.2.1 rlang_1.0.2 grid_4.2.1
[67] RCurl_1.98-1.7 aws.signature_0.6.0 bitops_1.0-7
[70] base64enc_0.1-3 cytolib_2.8.0 gtable_0.3.0
[73] deSolve_1.32 curl_4.3.2 rrcov_1.7-0
[76] R6_2.5.1 RProtoBufLib_2.8.0 utf8_1.2.2
[79] clue_0.3-61 KernSmooth_2.23-20 Rgraphviz_2.40.0
[82] parallel_4.2.1 Rcpp_1.0.8.3 vctrs_0.4.1
[85] png_0.1-7 DEoptimR_1.0-11 tidyselect_1.1.2

Bioconductor version '3.15'

  • 3 packages out-of-date
  • 0 packages too new

create a valid installation with

BiocManager::install(c( "broom", "pkgload", "rlang" ), update = TRUE, ask = FALSE)

more details: BiocManager::valid()$too_new, BiocManager::valid()$out_of_date

Warning message: 3 packages out-of-date; 0 packages too new

dataDir <- system.file("extdata",package="flowWorkspaceData")
files <- list.files(dataDir, pattern = ".fcs$",full = TRUE)
cs <- load_cytoset_from_fcs(files[1], num_threads = 4)
fs <- cytoset_to_flowSet(cs)
gs <- GatingSet(fs)

gt.csv <- data.table(alias = c("boundary","singlet","singletRefGate","Lymph", "singletRefGate_LymphRefGate"),
    pop = "+",
    parent = c("root","boundary","boundary","singlet", "singletRefGate"),
    dims = c("FSC-A,SSC-A", "FSC-A,FSC-H", "FSC-A,FSC-H", "FSC-A,SSC-A", "FSC-A,SSC-A"),
    gating_method = c("boundary", "singletGate", "refGate", "flowClust", "refGate"),
    gating_args = c("min=c(0,0),max=c(2.5e5,2.5e5)", "", "singlet", "K=3,quantile=0.95,target=c(5e4,4e4)", "Lymph"),
    collapseDataForGating = "",
    groupBy = "",
    preprocessing_method = c("", "", "", "prior_flowClust", ""),
    preprocessing_args = c("", "", "", "K=3", ""))
gt.csv.path <- "testGatingTemplate.csv"
gt.csv %>% fwrite(gt.csv.path)
gt <- gatingTemplate(gt.csv.path)

gt_gating(gt, gs)

g1 <- as.ggplot(ggcyto(gs[[1]], subset = "boundary", aes(x = "FSC-A", y = "FSC-H")) + geom_hex(bins = 100) + geom_gate() + geom_stats())
g2 <- as.ggplot(ggcyto(gs[[1]], subset = "singlet", aes(x = "FSC-A", y = "FSC-H")) + geom_hex(bins = 100) + geom_gate() + geom_stats())
g3 <- as.ggplot(ggcyto(gs[[1]], subset = "singlet", aes(x = "FSC-A", y = "SSC-A")) + geom_hex(bins = 100) + geom_gate() + geom_stats())
g4 <- as.ggplot(ggcyto(gs[[1]], subset = "singletRefGate", aes(x = "FSC-A", y = "FSC-H")) + geom_hex(bins = 100))
g5 <- as.ggplot(ggcyto(gs[[1]], subset = "singletRefGate", aes(x = "FSC-A", y = "SSC-A")) + geom_hex(bins = 100) + geom_gate() + geom_stats())

g1 + plot_spacer() + g2 + g3 + g4 + g5 + plot_layout(ncol = 2)

Plot of reproduced result

This problem also occurs with bioconductor 3.15 with r-base 4.12(apt) and bioconductor 3.13 with r-base 4.10(source compile) on docker ubuntu:22.04 environment However, this does not happen with my old R environment.

sessionInfo() R version 4.1.1 (2021-08-10) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 22000)

Matrix products: default

locale: [1] LC_COLLATE=Japanese_Japan.932 LC_CTYPE=Japanese_Japan.932
[3] LC_MONETARY=Japanese_Japan.932 LC_NUMERIC=C
[5] LC_TIME=Japanese_Japan.932

attached base packages: [1] stats graphics grDevices utils datasets methods
[7] base

loaded via a namespace (and not attached): [1] compiler_4.1.1 tools_4.1.1

tools:::.BioC_version_associated_with_R_version() [1] ‘3.13’

packageVersion("openCyto") [1] ‘2.2.0’

packageVersion("flowWorkspace") [1] ‘4.2.0’`

Extracting all the nodes and gates from the gating-set and reconstructing them yields the originally expected result.

gs_reconstruct <- function(gs, start = "root", end = NULL) {
    nodes.dt <- data.table(fullpath = flowWorkspace::gs_get_pop_paths(gs))
    nodes.dt[, ind := 1:.N - 1]
    nodes.dt <- nodes.dt[fullpath != "root"]
    nodes.dt[grep("^/", fullpath), fullpath := paste0("root", fullpath)]
    nodes.dt[, depth := fullpath %>% strsplit("/") %>% sapply(length)]
    nodes.dt[depth != min(depth), fullpath := fullpath %>% {gsub("^root", "", .)}]
    nodes.dt[, node := fullpath %>% strsplit("/") %>% sapply(last)] 
    nodes.dt[, parent := fullpath %>% strsplit("/") %>% sapply(function(x) x[-length(x)] %>% unlist %>% paste(collapse = "/"))]
    nodes.dt[, fullpath := fullpath %>% {gsub("^root", "", .)}]
    suppressWarnings (flist <- sapply(nodes.dt[, fullpath], function(x) flowWorkspace::gh_pop_get_gate(gs, x)) %>% filterList)

    startInd <- nodes.dt[parent == start, ind %>% min]
    if(is.null(end)) {
        endInd <- nodes.dt[, .N]
    } else {
        endInd <- nodes.dt[fullpath == end, ind]
    }

    nodesToRemove <- nodes.dt[startInd:.N, fullpath]
    while(length(nodesToRemove) > 0) {
        flowWorkspace::gs_pop_remove(gs, node = nodesToRemove[1])
        nodesToRemove <- nodesToRemove[nodesToRemove %chin% flowWorkspace::gs_get_pop_paths(gs)]
    }

    for(n in nodes.dt[startInd:endInd][order(ind), fullpath]) {
        flowWorkspace::gs_pop_add(gs, flist[[n]], parent = nodes.dt[fullpath == n, parent], name = nodes.dt[fullpath == n, node])
    }
    flowWorkspace::recompute(gs)
}
gs_reconstruct(gs)
g1 <- as.ggplot(ggcyto(gs[[1]], subset = "boundary", aes(x = "FSC-A", y = "FSC-H")) + geom_hex(bins = 100) + geom_gate() + geom_stats())
g2 <- as.ggplot(ggcyto(gs[[1]], subset = "singlet", aes(x = "FSC-A", y = "FSC-H")) + geom_hex(bins = 100) + geom_gate() + geom_stats())
g3 <- as.ggplot(ggcyto(gs[[1]], subset = "singlet", aes(x = "FSC-A", y = "SSC-A")) + geom_hex(bins = 100) + geom_gate() + geom_stats())
g4 <- as.ggplot(ggcyto(gs[[1]], subset = "singletRefGate", aes(x = "FSC-A", y = "FSC-H")) + geom_hex(bins = 100))
g5 <- as.ggplot(ggcyto(gs[[1]], subset = "singletRefGate", aes(x = "FSC-A", y = "SSC-A")) + geom_hex(bins = 100) + geom_gate() + geom_stats())

g1 + plot_spacer() + g2 + g3 + g4 + g5 + plot_layout(ncol = 2)

Plot of result after using gs_reconstruct user function

mikejiang commented 2 years ago

It is a bug introduced by https://github.com/RGLab/openCyto/issues/169#issuecomment-835661320 I have pushed the fix, let me know if it works for you

htanaka1234 commented 2 years ago

Thank you for your prompt response. I have updated to openCyto 2.8.1, but unfortunately there was no change in the results of running the above code.

mikejiang commented 2 years ago

the latest patch is in 2.8.2(release branch) and 2.9.2(development branch), bioconductor typically rebuilds development branch in one or two days, (takes longer for release branch). So if you want to get the updated package immediately, you can always install it from GitHub master branch

htanaka1234 commented 2 years ago

I have updated to openCyto 2.8.3 and confirmed that the problem has been resolved. Thank you very mutch.