RGLab / flowCore

Core flow cytometry infrastructure
43 stars 25 forks source link

How are events that fall right on the edges of a gate treated? #189

Open grafXzahl opened 4 years ago

grafXzahl commented 4 years ago

Hi all,

first of all I am new to the analysis of flow cytometry data, but after playing around with some data I recognised that some events which fall right on the edges of a gate are considered within the gate and others are not. So far, I was unable to figure out how the decision process of the flowCore gating algorithm works. In my naive understanding I would have figured that either all events on the gate edges or none of the events would be considered within oder without the gate but there seems to be a distinction.

I would be really grateful if somebody could explain this issue to me or point out the error in my approach.

I constructed the following two simple examples, in the hope that this will clarify the matter (the red dots are events which are considered to be within the borders of the gate and the black ones are not).


require("ggplot2")
require("flowCore")

plot_stuff <- function(testDat, splitDat, xy) {

  colrs <- rep("black", dim(testDat)[1])
  noOfInGateEvents <- dim(exprs(splitDat[[1]]))[1]
  if(noOfInGateEvents > 0) {
    pos <- NULL
    for(f in 1:noOfInGateEvents) {
      pos <- c(pos, 
               which(
                 exprs(testDat)[, 1] == exprs(splitDat[[1]])[f, 1] & 
                 exprs(testDat)[, 2] == exprs(splitDat[[1]])[f, 2]
               )
      )
    }
    colrs[pos] <- "red"
  }

  if(is.vector(xy)) {
    geom_gate <- annotate(geom = "rect", 
                          xmin = xy[1], xmax = xy[2], 
                          ymin = xy[3], ymax = xy[4], 
                          linetype = 1, colour = "red", fill = "red", alpha = 0.1)
  } else {
    geom_gate <- geom_polygon(data = data.frame(xy), linetype = 1, colour = "red", fill = "red", alpha = 0.1)
  }

  ggplot(data.frame(exprs(testDat)), aes_string(x = "FSC.H", y = "FSC.A")) +
    geom_point(size = 1, colour = colrs) +
    geom_gate +
    theme(legend.position = "none") +
    labs(x = "FSC.H", y = "FSC.A", title = names(splitDat)[1])
}

expValues <- matrix(c(rep(1:10, 10), unlist(lapply(1:10, rep, 10))), ncol = 2)
colnames(expValues) <- c("FSC.A", "FSC.H")

testDat <- new("flowFrame", exprs = expValues)

# example 1
x <- c(3, 7)
y <- c(4, 8)
gate <- rectangleGate("FSC.H" = x, "FSC.A" = y, filterId = "test1")
splitDat <- split(testDat, gate)
plot_stuff(testDat, splitDat, xy = c(x, y))


# example 2
xy <- matrix(c(3, 4, 7, 8, 3, 8), ncol = 2, byrow = TRUE)
colnames(xy) <- c("FSC.A", "FSC.H")
gate <- polygonGate(.gate = xy, filterId = "test2")
splitDat <- split(testDat, gate)
plot_stuff(testDat, splitDat, xy)

sessionInfo():

sessionInfo()
#> R version 3.6.2 (2019-12-12)
#> Platform: x86_64-apple-darwin15.6.0 (64-bit)
#> Running under: macOS Mojave 10.14.6
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] compiler_3.6.2  magrittr_1.5    tools_3.6.2     htmltools_0.4.0
#>  [5] yaml_2.2.1      Rcpp_1.0.4      stringi_1.4.6   rmarkdown_2.1  
#>  [9] highr_0.8       knitr_1.28      stringr_1.4.0   xfun_0.13      
#> [13] digest_0.6.25   rlang_0.4.6     evaluate_0.14

Thanks!

jacobpwagner commented 4 years ago

Thanks for the excellent reproducible example @grafXzahl. I'm looking in to it and just adding some notes here.

The behavior is slightly different for for the gates when applied to a GatingSet:

library(flowCore)
library(flowWorkspace)
library(ggcyto)

expValues <- matrix(c(rep(1:10, 10), unlist(lapply(1:10, rep, 10))), ncol = 2)
colnames(expValues) <- c("FSC.A", "FSC.H")
testDat <- new("flowFrame", exprs = expValues)
testDat <- flowFrame_to_cytoframe(testDat)

# example 1
x <- c(3, 7)
y <- c(4, 8)
gate1 <- rectangleGate("FSC.H" = x, "FSC.A" = y, filterId = "test1")

# example 2
xy <- matrix(c(3, 4, 7, 8, 3, 8), ncol = 2, byrow = TRUE)
colnames(xy) <- c("FSC.A", "FSC.H")
gate2 <- polygonGate(.gate = xy, filterId = "test2")

gs <-  GatingSet(cytoset(list(test_sample=testDat)))
gs_pop_add(gs, gate1, parent = "root")
gs_pop_add(gs, gate2, parent = "root")
recompute(gs)

autoplot(gs, "test1", bins = 0)
autoplot(gs, "test2", bins = 0)
gs_pop_get_stats(gs)

The output of gs_pop_get_stats:

        sample    pop count
1: test_sample   root   100
2: test_sample /test1    25
3: test_sample /test2    11

So you can see that the behavior for test1 here disagrees between flowCore split/filter and flowWorkspace gating for the rectangleGate because the former is excluding the upper bound on each channel while the latter is including it. I'll work on making those consistent as well as looking in to the polygonGate boundary inclusion issue.

jacobpwagner commented 4 years ago

This is also closely related to https://github.com/RGLab/flowCore/issues/86

jacobpwagner commented 4 years ago

https://github.com/RGLab/flowCore/commit/6739f46cdf976192325133e4191fd7f0d6aa2dad should solve the rectangleGate case (your test1) by making it match what was already the behavior for gates incorporated in a GatingSet. Still looking in to the polygonGate case.

jacobpwagner commented 4 years ago

Getting all of the points on the polygonGate boundary may come down to adding more tolerance here: https://github.com/RGLab/cytolib/blob/ee6bb4bfe10e90796367002d165ab9711fdf2463/src/in_polygon.cpp#L44 due to precision that is potentially lost here: https://github.com/RGLab/cytolib/blob/ee6bb4bfe10e90796367002d165ab9711fdf2463/src/in_polygon.cpp#L41, however 1) I still need to test that out 2) We would likely need to dynamically determine the tolerance magnitude due to the range of scales for each of the channels