RGLab / flowWorkspace

flowWorkspace
GNU Affero General Public License v3.0
44 stars 21 forks source link

Issue with reproducibility using recompute() after gs_pop_add() #374

Open joliephan opened 2 years ago

joliephan commented 2 years ago

I am getting different cell counts every time I run the same script where I use recompute() after adding new gates using gs_pop_add() that are boolean combinations of preexisting gates. This has affected downstream analyses where I look for significant differences between subpopulation frequencies. How do I ensure that the gating is reproducible and consistent?

mikejiang commented 2 years ago

That behavior you described has not been observed in the past, can you provide an example code that can reproduce the issue?

joliephan commented 2 years ago

I think I figured out the source of this issue. The script is reproducible if I recompute with default settings using recompute(gs). However, it isn't reproducible if I specify another node name/path other than the default "root" (e.g., recompute(gs, "/Time/CD3+/CD14-CD19-/Lymphocytes/Singlet/Live/CD4+")).

Could you explain the difference between recomputing beginning at the "root" versus another node, assuming the newly added gates are descendants of the specified node? Thank you!

This is an example code that gives irreproducible counts under the new gates:

# Boolean combinations of existing gates
bool_gates <- c("CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&CD4+/IL2+&!CD4+/IL4_5_13+&CD4+/TNFa+",
               "!CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&CD4+/IL2+&CD4+/IL4_5_13+&CD4+/TNFa+",
               "CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&CD4+/TNFa+",
               "!CD4+/CD107a+&CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&!CD4+/TNFa+",
               "!CD4+/CD107a+&!CD4+/CD154+&!CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&CD4+/TNFa+",
               "!CD4+/CD107a+&!CD4+/CD154+&CD4+/IFNg+&!CD4+/IL17a+&!CD4+/IL2+&!CD4+/IL4_5_13+&!CD4+/TNFa+")

names(bool_gates) <- paste0("CD4_", gsub("CD4\\+\\/", "", gsub("\\&", "_AND_", gsub("\\!", "NOT_", bool_gates))))

# Add new gates
for(booleanSubsetName in names(bool_gates)) {
  call <- substitute(flowWorkspace::booleanFilter(v), list(v = as.symbol(bool_gates[[booleanSubsetName]])))
  g <- eval(call)
  suppressWarnings(flowWorkspace::gs_pop_add(gs, g, parent = "CD4+", name=booleanSubsetName))
}

# Add a union gate of all the new gates
gs_pop_add(gs, eval(substitute(flowWorkspace::booleanFilter(v),
                               list(v = as.symbol(paste(names(bool_gates), collapse="|"))))),
           parent = "CD4+", name = "Boolean_Union")

# Recompute the gating set with the new gates
flowWorkspace::recompute(gs, "/Time/CD3+/CD14-CD19-/Lymphocytes/Singlet/Live/CD4+")
mikejiang commented 2 years ago

I tried to mimic your use case but cannot reproduce the issue you reported here is my sample

library(flowCore)
library(flowWorkspace)
dataDir <- system.file("extdata",package="flowWorkspaceData")
suppressMessages(gs <- load_gs(list.files(dataDir, pattern = "gs_manual",full = TRUE)))
getNodes(gs)
# Boolean combinations of existing gates
bool_gates <- c("CD4/38+ DR+&CD4/CCR7+ 45RA+",
                "!CD4/38+ DR+&CD4/CCR7+ 45RA+",
                "CD4/38+ DR+&!CD4/CCR7+ 45RA+")

names(bool_gates) <- paste0("CD4_", gsub("CD4\\/", "", gsub("\\&", "_AND_", gsub("\\!", "NOT_", bool_gates))))

# Add new gates
for(booleanSubsetName in names(bool_gates)) {
  call <- substitute(flowWorkspace::booleanFilter(v), list(v = as.symbol(bool_gates[[booleanSubsetName]])))
  g <- eval(call)
  suppressWarnings(flowWorkspace::gs_pop_add(gs, g, parent = "CD4", name=booleanSubsetName))
}
getNodes(gs)
# Add a union gate of all the new gates
gs_pop_add(gs, eval(substitute(flowWorkspace::booleanFilter(v),
                               list(v = as.symbol(paste(names(bool_gates), collapse="|"))))),
           parent = "CD4", name = "Boolean_Union")

# Recompute the gating set with the new gates
flowWorkspace::recompute(gs, "CD4")
gs_pop_get_stats(gs)[25:28,]
               sample                                                           pop count
1: CytoTrol_CytoTrol_1.fcs     /not debris/singlets/CD3+/CD4/CD4_38+ DR+_AND_CCR7+ 45RA+   445
2: CytoTrol_CytoTrol_1.fcs /not debris/singlets/CD3+/CD4/CD4_NOT_38+ DR+_AND_CCR7+ 45RA+ 11671
3: CytoTrol_CytoTrol_1.fcs /not debris/singlets/CD3+/CD4/CD4_38+ DR+_AND_NOT_CCR7+ 45RA+   599
4: CytoTrol_CytoTrol_1.fcs                   /not debris/singlets/CD3+/CD4/Boolean_Union 12715

I get the same counts when I run the script multiple times, can you give it a try? If it's possible, provide a similar minimal reproducible example for troubleshooting