RGLab / openCyto

A package that provides data analysis pipeline for flow cytometry.
GNU Affero General Public License v3.0
77 stars 29 forks source link

tail_gate function side argument alters the start, but not end, position of the gate #216

Closed Tariq-K closed 4 years ago

Tariq-K commented 4 years ago

Bug

Expected behaviour

Reproducible example

library(ggcyto)
library(flowCore)
library(openCyto)

file.name <- system.file("extdata","0877408774.B08",package="flowCore")
f <- read.FCS(file.name, transformation=FALSE)

tail_gate_position <- function(f, x, gate_func){

  g <- gate_func(f,
               channel='FSC-H',
               adjust=1.5,
               tol=0.001,
               auto_tol=FALSE,
               side=x)

  plus_and_minus <- split(f, f=g, drop='ANY')
  minus <- plus_and_minus$'-'

  p <- ggcyto(f, aes(x=`FSC-H`, y=`SSC-H`)) +
        geom_hex(bins = 128) +
        annotate('rect', xmin=g@min, xmax=g@max, ymin=-Inf, ymax=Inf,
                 colour='red', fill='red', alpha=0.2) +
        labs(title=paste('side = ', x)) +
        theme(strip.background = element_blank(),
              strip.text.x = element_blank() )

  return(p)
}

tail_gate_position(f, 'right', gate_tail)
tail_gate_position(f, 'left', gate_tail)

image image

Suggested fix

modified_gate_tail <- function(fr,
                           channel,
                           filterId = "",
                           num_peaks = 1,
                           ref_peak = 1,
                           strict = TRUE,
                           tol = 1e-2,
                           side = "left",
                           min = NULL,
                           max = NULL,
                           bias = 0, ...) {

    # this function is taken from the source code: https://rdrr.io/bioc/openCyto/src/R/gating-functions.R
    # and modified to gate correctly on negative populations

    if (!(is.null(min) && is.null(max))) {
        fr <- .truncate_flowframe(fr, channels = channel, min = min,
                                  max = max)
    }
    # cutpoint is calculated using the first derivative of the kernel density
    # estimate.
    x <- as.vector(exprs(fr)[, channel])
    cutpoint <- openCyto:::.cytokine_cutpoint(x = x, num_peaks = num_peaks,
                                   ref_peak = ref_peak, tol = tol, side = side, strict = strict, ...)

    cutpoint <- cutpoint + bias

    ### here is the fix ###
    # openCyto has the gate_end hardcoded as Inf
    if (side=='right'){
        gate_end <- Inf
    } else if (side=='left'){ # we can switch this to -Inf for left sided gates
        gate_end <- -Inf
    }
    gate_coordinates <- list(c(cutpoint, gate_end))
    ### end of fix ###

    names(gate_coordinates) <- channel
    rectangleGate(gate_coordinates, filterId = filterId)
}

tail_gate_position(f, 'left', modified_gate_tail)

image

Session info

R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.6

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] reprex_0.3.0              openCyto_2.0.0            ggcyto_1.16.0             flowWorkspace_4.0.6      
[5] ncdfFlow_2.34.0           BH_1.72.0-3               RcppArmadillo_0.9.900.2.0 flowCore_2.0.1           
[9] ggplot2_3.3.2            

loaded via a namespace (and not attached):
 [1] matrixStats_0.56.0  fs_1.4.2            RColorBrewer_1.1-2  Rgraphviz_2.32.0    tools_4.0.2        
 [6] R6_2.4.1            KernSmooth_2.23-17  BiocGenerics_0.34.0 colorspace_1.4-1    withr_2.2.0        
[11] tidyselect_1.1.0    gridExtra_2.3       mnormt_2.0.1        processx_3.4.3      compiler_4.0.2     
[16] graph_1.66.0        Biobase_2.48.0      flowClust_3.26.0    labeling_0.3        flowStats_4.0.0    
[21] scales_1.1.1        DEoptimR_1.0-8      hexbin_1.28.1       mvtnorm_1.1-1       robustbase_0.93-6  
[26] callr_3.4.3         RBGL_1.64.0         stringr_1.4.0       digest_0.6.25       rmarkdown_2.3      
[31] R.utils_2.9.2       rrcov_1.5-2         jpeg_0.1-8.1        pkgconfig_2.0.3     htmltools_0.5.0    
[36] rlang_0.4.7         rstudioapi_0.11     generics_0.0.2      farver_2.0.3        mclust_5.4.6       
[41] gtools_3.8.2        dplyr_1.0.1         R.oo_1.23.0         magrittr_1.5        RProtoBufLib_2.0.0 
[46] Matrix_1.2-18       Rcpp_1.0.5          munsell_0.5.0       clipr_0.7.0         lifecycle_0.2.0    
[51] R.methodsS3_1.8.0   stringi_1.4.6       whisker_0.4         yaml_2.2.1          MASS_7.3-51.6      
[56] zlibbioc_1.34.0     plyr_1.8.6          grid_4.0.2          parallel_4.0.2      crayon_1.3.4       
[61] lattice_0.20-41     splines_4.0.2       tmvnsim_1.0-2       knitr_1.29          ps_1.3.4           
[66] pillar_1.4.6        fda_5.1.4           corpcor_1.6.9       stats4_4.0.2        XML_3.99-0.4       
[71] glue_1.4.1          evaluate_0.14       latticeExtra_0.6-29 data.table_1.13.0   RcppParallel_5.0.2 
[76] png_0.1-7           vctrs_0.3.2         gtable_0.3.0        purrr_0.3.4         clue_0.3-57        
[81] ks_1.11.7           xfun_0.15           IDPmisc_1.1.20      pcaPP_1.9-73        tibble_3.0.3       
[86] cytolib_2.0.3       flowViz_1.52.0      ellipse_0.4.2       cluster_2.1.0       ellipsis_0.3.1  

Created on 2020-08-19 by the reprex package (v0.3.0)

mikejiang commented 4 years ago

The side argument is not intended to be interpreted that way (I do agree that it is a little confusing) Basically gate_tail always return a gate that capture the higher end of the 1d signal. The side argument is used to determine where the cutpoint is located

Here is the doc

Setting the side argument to "left" modifies the procedure to put the cutpoint on the left side of the reference peak to isolate a left tail.

par(mfrow = c(1,2))
g_right <- gate_tail(f, "FSC-H", adjust = 1.5, tol = 0.001, auto_tol = FALSE, side = "right", plot = T)
abline(v= g_right@min)
g_left <- gate_tail(f, "FSC-H", adjust = 1.5, tol = 0.001, auto_tol = FALSE, side = "left", plot = T)
abline(v= g_left@min)

image

mikejiang commented 4 years ago

This is generally the case for all other gating functions from openCyto , which has a different mechanism to retain the cells from the lower end, that is by setting pop = '-' argument in gs_add_gating_method call

f <- read.FCS(file.name, transformation=FALSE)

gs <- GatingSet(as(f, "flowSet"))
gs_add_gating_method(gs, "high", pop = "+", parent = "root", dims = "FSC-H", gating_method = "gate_tail")
autoplot(gs, "high", y = "SSC-H")

image

gs_add_gating_method(gs, "low", pop = "-", parent = "root", dims = "FSC-H", gating_method = "gate_tail")
autoplot(gs, "low", y = "SSC-H")

image

Even that case, the gate coordinates are still right hand side

g <- gs_pop_get_gate(gs, "low")[[1]]
g
Rectangular gate 'low' with dimensions:
  FSC-H: (475.423776291218,Inf)

It is through the negate flag to achieve the lower-end of cell population, i.e.

> gh_pop_is_negated(gs[[1]], "low")
[1] TRUE

So to conclude, you will need to create your own wrapper around gate_tail (which is very straightforward) in order to do what you described.

jacobpwagner commented 4 years ago

The weird truncation of the side arg's brief doc is now fixed. The behavior under side="left" seems awkward to me. Based on the intent of the method, I'd expect the positive/+ population under that definition to be (-Inf, cutpoint] (where cutpoint is to the left of the peak). I'll have to take a look at any downstream assumptions that might break, though.

Alternatively, if we need to keep the gate_tail definition as it is for the openCyto machinery, we could export an easy wrapper of .cytokine_cutpoint to do the heavy lifting of determining the start of the tail and let users define a rectangleGate putting the other bound wherever they want (it need not be infinite).

gfinak commented 4 years ago

I agree, @jacobpwagner . I don't see why it would have been otherwise. I don't see the use case for the current behaviour.

Tariq-K commented 4 years ago

Thanks all for your comments, and for updating the documentation. I'd certainly appreciate it if the functionality I described could be incorporated into gate_tail or .cytokine_cutpoint somehow, it's a use case that I often require and I'm certain that other users of openCyto would benefit!

jacobpwagner commented 4 years ago

@Tariq-K , this should be fixed by https://github.com/RGLab/openCyto/commit/54513238a21c58578c513ebb62a927dd33546f1e. I just had to make sure openCyto would handle it appropriately through openCyto::gs_add_gating_method, but it seems to be handled as expected by both openCyto:gs_add_gating_method and through directly adding the resulting gate via flowWorkspace::gs_pop_add.

@mikejiang , we still need to work through the refGate logic of handling a gate added with pop=="+" but bounds (-Inf, a]. https://github.com/RGLab/openCyto/blob/1e728cd31d71768c6a02ed4c32ec748110fa8db4/R/gating-methods.R#L592-L613

Using the current state of that logic, I think that will get flipped back to [a, Inf). I've gone ahead with the changes in https://github.com/RGLab/openCyto/commit/54513238a21c58578c513ebb62a927dd33546f1e because I doubt users were intentionally using the old incorrect behavior of gate_tail with side == "left".

As discussed offline, I believe the 1-D Inf-flipping logic comes from the requirements for the 1-D components of 2-D quad gates, but it does seem a bit problematic for natively 1-D gates.

jacobpwagner commented 4 years ago

Just adding a test script here for demonstration that it works to add the gate using either approach.

library(ggcyto)
library(flowCore)
library(openCyto)

# Right tail

gs1 <- load_gs(system.file("extdata", "gs_manual", package = "flowWorkspaceData"))
gs2 <- gs_clone(gs1)

# flowWorkspace::gs_pop_add approach
right_tail_gate <- gate_tail(gh_pop_get_data(gs1[[1]], "CD4"), channel = "<G560-A>", side = "right")
right_tail_gate@filterId <- "CCR7_right_tail"
right_tail_gate
gs_pop_add(gs1, right_tail_gate, parent = "CD4", name = "CCR7_right_tail_pos")
gs_pop_add(gs1, right_tail_gate, parent = "CD4", name = "CCR7_right_tail_neg", negated = TRUE)
recompute(gs1)
autoplot(gs1[[1]], "CCR7_right_tail_pos")
autoplot(gs1[[1]], "CCR7_right_tail_neg")
ggcyto(gh_pop_get_data(gs1, "CD4"), aes("CCR7")) +
  geom_density() +
  geom_gate(right_tail_gate)

# openCyto::gs_add_gating_method approach
openCyto::gs_add_gating_method(gs2, alias = "CCR7_right_tail_pos", pop = "+", parent = "CD4",
                               dims = "<G560-A>", gating_method = "gate_tail", 
                               gating_args = "side='right'")
openCyto::gs_add_gating_method(gs2, alias = "CCR7_right_tail_neg", pop = "-", parent = "CD4",
                               dims = "<G560-A>", gating_method = "gate_tail", 
                               gating_args = "side='right'")

autoplot(gs2[[1]], "CCR7_right_tail_pos")
autoplot(gs2[[1]], "CCR7_right_tail_neg")
ggcyto(gh_pop_get_data(gs2[[1]], "CD4"), aes("CCR7")) +
  geom_density() +
  geom_gate(gh_pop_get_gate(gs2[[1]], "CCR7_right_tail_pos"))
ggcyto(gh_pop_get_data(gs2[[1]], "CD4"), aes("CCR7")) +
  geom_density() +
  geom_gate(gh_pop_get_gate(gs2[[1]], "CCR7_right_tail_neg"))

# Verifying both approaches agree
gh_pop_get_stats(gs1[[1]], "CCR7_right_tail_pos")
gh_pop_get_stats(gs2[[1]], "CCR7_right_tail_pos")

gh_pop_get_stats(gs1[[1]], "CCR7_right_tail_neg")
gh_pop_get_stats(gs2[[1]], "CCR7_right_tail_neg")

# Left tail

gs1 <- load_gs(system.file("extdata", "gs_manual", package = "flowWorkspaceData"))
gs2 <- gs_clone(gs1)

# flowWorkspace::gs_pop_add approach
left_tail_gate <- gate_tail(gh_pop_get_data(gs1[[1]], "CD4"), channel = "<G560-A>", side = "left")
left_tail_gate@filterId <- "CCR7_left_tail"
left_tail_gate
gs_pop_add(gs1, left_tail_gate, parent = "CD4", name = "CCR7_left_tail_pos")
gs_pop_add(gs1, left_tail_gate, parent = "CD4", name = "CCR7_left_tail_neg", negated = TRUE)
recompute(gs1)
autoplot(gs1[[1]], "CCR7_left_tail_pos")
autoplot(gs1[[1]], "CCR7_left_tail_neg")
ggcyto(gh_pop_get_data(gs1, "CD4"), aes("CCR7")) +
  geom_density() +
  geom_gate(left_tail_gate)

# openCyto::gs_add_gating_method approach
openCyto::gs_add_gating_method(gs2, alias = "CCR7_left_tail_pos", pop = "+", parent = "CD4",
                               dims = "<G560-A>", gating_method = "gate_tail", 
                               gating_args = "side='left'")
openCyto::gs_add_gating_method(gs2, alias = "CCR7_left_tail_neg", pop = "-", parent = "CD4",
                               dims = "<G560-A>", gating_method = "gate_tail", 
                               gating_args = "side='left'")

autoplot(gs2[[1]], "CCR7_left_tail_pos")
autoplot(gs2[[1]], "CCR7_left_tail_neg")
ggcyto(gh_pop_get_data(gs2[[1]], "CD4"), aes("CCR7")) +
  geom_density() +
  geom_gate(gh_pop_get_gate(gs2[[1]], "CCR7_left_tail_pos"))
ggcyto(gh_pop_get_data(gs2[[1]], "CD4"), aes("CCR7")) +
  geom_density() +
  geom_gate(gh_pop_get_gate(gs2[[1]], "CCR7_left_tail_neg"))

# Verifying both approaches agree
gh_pop_get_stats(gs1[[1]], "CCR7_left_tail_pos")
gh_pop_get_stats(gs2[[1]], "CCR7_left_tail_pos")

gh_pop_get_stats(gs1[[1]], "CCR7_left_tail_neg")
gh_pop_get_stats(gs2[[1]], "CCR7_left_tail_neg")
Tariq-K commented 4 years ago

Brilliant, thanks very much @jacobpwagner

gfinak commented 4 years ago

@jacobpwagner I'm reopening this. I'm responsible for the confusion. I misread this issue and suggested a "fix" that introduced a bug. My apologies. Indeed, @mikejiang is correct in his assertion that gate_tail with arg side="left" should capture the positive population by default. Otherwise is breaks the openCyto templates as in #221. The question is, since gate_tail is being used in this way, should we consider passing in the "+" or "-" argument and have the directionality of the gate handled here rather than in the caller. Or, perhaps we introduce a helper function that can just flip the gate coordinates for those users that call gate_tail directly. I think we'll have to revert this change and update the docs.

jacobpwagner commented 4 years ago

Personally I think the better solution is making the 1-D inversion logic more robust as discussed in https://github.com/RGLab/openCyto/issues/221#issuecomment-715631937. That lets gate_tail have the expected behavior as a standalone function and makes the openCyto::refGate logic handle these issues better moving forward. But I'm happy to fix it either way.

alexheubeck commented 4 years ago

If it's any consolation, I think the original logic made sense in my head. The side argument places the gate on the appropriate side of the ref peak, while the pop argument decides if you want the population on the right or left side of that line included in the gate.

jacobpwagner commented 4 years ago

Yeah, we discussed it and that's effectively what we will be going back to, but with the option to add a positive = FALSE argument that lets you choose the (-Inf, cutpoint] option outside of the gatingTemplate context.

alexheubeck commented 4 years ago

That makes sense, thanks!

jacobpwagner commented 4 years ago

After https://github.com/RGLab/openCyto/commit/47687e6d42900bc0af1b964cb7e5b63d7e25ba81, the default behavior under side="left" is back to taking the positive orientation [cutpoint, Inf). However, there is now the option to add positive=FALSE to flip that to (-Inf, cutpoint]. That's most relevant if you truly need the gate orientation flipped if you plan on using the gate object for other purposes. If that's not necessary, you can simply use pop="-" in a gatingTemplate (or in gs_add_gating_method which goes through gatingTemplate) or negated = TRUE in flowWorkspace::gs_pop_add.

jacobpwagner commented 4 years ago

Re-closing after https://github.com/RGLab/openCyto/commit/47687e6d42900bc0af1b964cb7e5b63d7e25ba81