geom_stats() not working as expected?

sraorao commented 8 years ago

Experiment_001.zip Hi, I have asked a question at BioStars regarding this (https://www.biostars.org/p/213491/), so apologies for crossposting. I will update both posts if I get an answer.

I am using ggcyto package in R to plot flow cytometry data, and to display stats on the plot, I am using the geom_stats() function as shown here: https://www.bioconductor.org/packages/release/bioc/vignettes/ggcyto/inst/doc/ggcyto.GatingSet.html

But while the plot and gates look fine, geom_stats is not working and I get the following warning messages when I run ggcyto()...geom_stats()

Warning messages: 1: Removed 27 rows containing missing values (geom_hex). 2: Removed 2 rows containing missing values (geom_label). 3: Removed 2 rows containing missing values (geom_label). 4: Removed 2 rows containing missing values (geom_label). 5: Removed 2 rows containing missing values (geom_label).

This is the code, but it's not a working example as the data files (FCS files exported from BD FACSdiva) are not uploaded. EDIT: I added a few of the FCS files now as a .zip attachment at the top.

# libraries ----
library(flowCore)
library(ggplot2)
library(cowplot)
library(ggcyto)
library(openCyto)
library(flowWorkspace)

# load FCS files ----
files <- list.files(path = "Folder_001/Experiment_001/", pattern = "Specimen", full.names = TRUE)
expt1 <- read.flowSet(files = files, name.keyword = "$FIL", alter.names = TRUE)

# log-transform PI and Annexin V data ----
logTrans <- logTransform(transformationId="log10-transformation", logbase=10, r=1, d=1)
trans <- transformList(c("PI.A", "BV421.A"), logTrans)
expt1.trans <- transform(expt1, trans)

# clean up NA/NaN/Inf ----
for(n in 1:length(expt1.trans)){
  exprs(expt1.trans[[n]]) <- exprs(expt1.trans[[n]])[!rowSums(!is.finite(exprs(expt1.trans[[n]]))), ]
}

# Add gates (flowWorkspace) ----
gate <- rectangleGate(filterId="myRectGate", "FSC.A"=c(0, 255000), "SSC.A"=c(0, 75000))
qg <- quadGate(filterId="myQuadGate1", "PI.A"=log10(10000), "BV421.A"=log10(3000))
gated.gs <- GatingSet(expt1.trans)
add(gated.gs, gate, parent = "root", name = "rect", recompute = TRUE)
add(gated.gs, qg, parent = "rect", name = c("Early apoptotic", "Dead", "PI only", "Live"))
getNodes(gated.gs)
recompute(gated.gs)
gated.selected <- gated.gs[c(8, 6)]

# plot ----
p <- ggcyto(gated.selected, aes(x = PI.A, y = BV421.A)) +
  geom_hex(bins = 200) +
  geom_gate(gates) + 
  labs(x = "PI", y = "Annexin V") + 
  geom_stats()

Looking at the ggplot object created by ggcyto, it looks like the problem might be with the x and y columns in the labels data frames, which have NA's, but manually modifying these columns does not seem to have any effect.

> p.build$data[[7]]
   x  y label PANEL group colour  fill size angle hjust vjust alpha family fontface lineheight
1 NA NA 9.16%     1    -1  black white 3.88     0   0.5   0.5    NA               1        1.2
2 NA NA 8.88%     2    -1  black white 3.88     0   0.5   0.5    NA               1        1.2

Hope that is all the relevant information, but please let me know if I missed anything. Thanks.

mikejiang commented 8 years ago

Can you save and post the subset of your GatingSet for trouble shooting? i.e.

save_gs(gated.gs[8], path = "mypath")

sraorao commented 8 years ago

subset.gs.zip

Attached, thanks.

mikejiang commented 8 years ago

Your range slot in the flowSet was screwed up by logTransform (even though you've manually cleaned up these NaN values in exprs data)

flowFrame object 'H2O2-ann'
with 9778 cells and 5 observables:
       name desc  range minRange     maxRange
$P1   FSC.A <NA> 262144        0 2.621430e+05
$P2   SSC.A <NA> 262144        0 2.621430e+05
$P3    PI.A <NA> 262144      NaN 5.418538e+00
$P4 BV421.A <NA> 262144      NaN 5.418538e+00
$P5    Time <NA> 262144        0 2.621430e+05

So you should really use logicleTransformation instead of log.

fs <- read.ncdfFlowSet(files = files)
trans <- estimateLogicle(fs[[1]], c("PI.A", "BV421.A"))

Note that

estimateLogicle automatically computes the parameter for logicle transformation and it already returns a transformList so you don't need to manually construct one.
you should use ncdfFlowSet instead of flowset because the latter is less efficient in both memory and cpu

sraorao commented 8 years ago

Ah I see, I need to use logicle transformation because I have negative values. It's working now, many thanks for taking the time to troubleshoot!

RGLab / ggcyto

geom_stats() not working as expected? #13