RGLab / ggcyto

Visualize Cytometry data with ggplot2
MIT License
56 stars 13 forks source link

visualizing a polygonGate using geom_gate() and geom_polygon() is not consistent #40

Closed malisas closed 6 years ago

malisas commented 6 years ago

Hi, as shown below, visualizing a polygonGate using geom_gate() and geom_polygon() is not consistent. The geom_polygon() shape accurately reflects the shape I originally drew and intended, but geom_gate() does not.

While this is being investigated, my question is:
Do you think it is safe to trust the results of getPopStats(gs, subpopulations = "new_gate")?
Basically, I want to know if this is just a visualization error or if the gate membership is also calculated incorrectly. If it's just the visualization, I will go ahead and use getPopStats() and geom_polygon() for my analysis.

Thanks.

(I recently experienced another visualization issue, but I don't know if it's related)

> my_polygon_gate
Polygonal gate 'My_Diagonal' with 6 vertices in dimensions <R660-A> and <G610-A>
> my_polygon_gate@boundaries
     <R660-A> <G610-A>
[1,] 1788.514 2205.129
[2,] 3222.986 3554.983
[3,] 3407.841 3241.064
[4,] 1917.912 1958.478
[5,] 1773.726 1868.787
[6,] 1784.817 2205.129
> add(gs, my_polygon_gate, parent = "my_parent", name = "new_gate")
> recompute(gs, "my_parent")
# Some output not shown
> ggcyto(gs[1],
+        subset = "my_parent",
+        aes(x = "<R660-A>", y = "<G610-A>")) +
+   geom_hex(bins = 120) + geom_gate("my_parent/new_gate") +
+   geom_stats() +
+   facet_grid(TimePoint_v2 ~ Individual) +
+   xlim(c(1400,3800)) + ylim(c(1400, 4000)) +
+   geom_polygon(as.data.frame(my_polygon_gate@boundaries),
+                mapping = aes(x = `<R660-A>`, y = `<G610-A>`),
+                colour = "black", fill = NA)

rplot

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.3 LTS

Matrix products: default
BLAS: /usr/lib/libblas/libblas.so.3.6.0
LAPACK: /usr/lib/lapack/liblapack.so.3.6.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] hexbin_1.27.2             ggcyto_1.9.12             ggplot2_3.0.0             CytoGate_1.1.1            DescTools_0.99.25        
 [6] flowDensity_1.12.0        openCyto_1.19.2           flowWorkspace_3.29.10     ncdfFlow_2.25.5           BH_1.66.0-1              
[11] RcppArmadillo_0.9.100.5.0 flowCore_1.47.9           here_0.1                 

loaded via a namespace (and not attached):
  [1] readxl_1.1.0        backports_1.1.2     spam_2.2-0          drc_3.0-1           plyr_1.8.4          lazyeval_0.2.1     
  [7] sp_1.3-1            splines_3.4.4       RPMG_2.2-2          fda_2.4.8           TH.data_1.0-9       digest_0.6.18      
 [13] htmltools_0.3.6     gdata_2.18.0        magrittr_1.5        Rwave_2.4-8         cluster_2.0.7-1     ks_1.11.3          
 [19] openxlsx_4.1.0      matrixStats_0.54.0  R.utils_2.7.0       MCMCpack_1.4-4      sandwich_2.5-0      colorspace_1.3-2   
 [25] rrcov_1.4-4         haven_1.1.0         dplyr_0.7.7         crayon_1.3.4        jsonlite_1.5        graph_1.56.0       
 [31] bindr_0.1.1         survival_2.42-6     zoo_1.8-4           glue_1.3.0          flowClust_3.16.0    gtable_0.2.0       
 [37] zlibbioc_1.24.0     MatrixModels_0.4-1  car_3.0-2           IDPmisc_1.1.18      Rgraphviz_2.22.0    BiocGenerics_0.27.1
 [43] DEoptimR_1.0-8      maps_3.3.0          abind_1.4-5         SparseM_1.77        scales_1.0.0.9000   mvtnorm_1.0-8      
 [49] Rcpp_0.12.19        plotrix_3.7-4       viridisLite_0.3.0   clue_0.3-56         foreign_0.8-71      mclust_5.4.1       
 [55] RSEIS_3.8-3         dotCall64_1.0-0     stats4_3.4.4        htmlwidgets_1.3     httr_1.3.1          gplots_3.0.1       
 [61] RColorBrewer_1.1-2  pkgconfig_2.0.2     XML_3.98-1.16       R.methodsS3_1.7.1   manipulate_1.0.1    GEOmap_2.4-4       
 [67] flowViz_1.42.0      labeling_0.3        flowStats_3.36.0    tidyselect_0.2.5    rlang_0.3.0         reshape2_1.4.3     
 [73] munsell_0.5.0       cellranger_1.1.0    tools_3.4.4         splancs_2.01-40     stringr_1.3.1       yaml_2.2.0         
 [79] mcmc_0.9-5          zip_1.0.0           robustbase_0.93-3   caTools_1.17.1.1    purrr_0.2.5         CATALYST_1.2.0     
 [85] bindrcpp_0.2.2      RBGL_1.54.0         quantreg_5.36       R.oo_1.22.0         compiler_3.4.4      rstudioapi_0.8     
 [91] plotly_4.8.0        curl_3.2            tibble_1.4.2        pcaPP_1.9-73        stringi_1.2.4       forcats_0.3.0      
 [97] fields_9.6          rgeos_0.3-28        lattice_0.20-35     Matrix_1.2-14       pillar_1.3.0        data.table_1.11.8  
[103] bitops_1.0-6        corpcor_1.6.9       R6_2.3.0            latticeExtra_0.6-28 KernSmooth_2.23-15  gridExtra_2.3      
[109] RFOC_3.4-6          rio_0.5.10          codetools_0.2-15    boot_1.3-20         MASS_7.3-51         gtools_3.8.1       
[115] assertthat_0.2.0    MBA_0.0-9           rprojroot_1.3-2     withr_2.1.2         mnormt_1.5-5        multcomp_1.4-8     
[121] expm_0.999-3        grid_3.4.4          tidyr_0.8.1         coda_0.19-2         carData_3.0-2       Biobase_2.38.0 
mikejiang commented 6 years ago

a minimum reproducible and self-contained example (with data) would be helpful for troubleshoot

malisas commented 6 years ago

Hi Mike, here is a reprex. I will send you the GatingSet privately again.

In the meantime, do you have an opinion about whether it is safe to use getPopStats()?

library(openCyto)
#> Loading required package: flowWorkspace
#> Loading required package: flowCore
#> Loading required package: ncdfFlow
#> Loading required package: RcppArmadillo
#> Loading required package: BH
library(ggcyto)
#> Loading required package: ggplot2

gs <- load_gs("/path/to/20181023_GS_For_Reprex_small")
#> loading R object...
#> loading tree object...
#> Done
gs
#> A GatingSet with 1 samples
plot(gs, bool = T, fontsize = 15)
pnts <- matrix(c(1788.51407984939, 3222.9862729063, 3407.84093696002, 1917.912344687, 1773.7257067251, 1784.81698656832,
                 2205.12898496853, 3554.98324124303, 3241.06364676059, 1958.47787501804, 1868.78656230877, 2205.12898496853),
               ncol = 2)
colnames(pnts) <- c("<R660-A>", "<G610-A>")
my_polygon_gate <- polygonGate(.gate = pnts, filterId="My_Diagonal")

parent_pop <- "/Live CD3+/CD14-19-/Singlet/CCR7 Keeper/MR1 Keeper/Lymphocytes"

add(gs, my_polygon_gate, parent = parent_pop, name = "new_gate")
#> replicating filter 'My_Diagonal' across samples!
#> [1] 14
recompute(gs, parent_pop)
#> .
#> done!

ggcyto(gs,
       subset = parent_pop,
       aes(x = `<R660-A>`, y = `<G610-A>`)) +
  geom_hex(bins = 120) + geom_gate("new_gate") +
  geom_stats() +
  xlim(c(1400,3800)) + ylim(c(1400, 4000)) +
  geom_polygon(as.data.frame(my_polygon_gate@boundaries),
               mapping = aes(x = `<R660-A>`, y = `<G610-A>`),
               colour = "black", fill = NA)
#> Warning: Removed 49948 rows containing non-finite values (stat_binhex).
#> Warning: Removed 4 rows containing missing values (geom_hex).

Created on 2018-10-23 by the reprex package (v0.2.1)

malisas commented 6 years ago

Update: geom_overlay() seems to match the geom_polygon() events, so I feel like it is probably safe to use getPopStats().

rplot01

The location of the red geom_gate() gate changes depending on how I resize RStudio's plot window. Here's another version from when I resized the window a little.

rplot02

And from using the png() function instead of exporting manually:

rplot_png_func

I did try to reproduce this problem using some of the data from the flowWorkspaceData package, but the code worked correctly (i.e. no inconsistencies) on the T-cell and gs_bcell_auto datasets.

mikejiang commented 6 years ago

Yes, you are right. The gate and stats are computed correctly in GatingSet,it is the artifact from ggcyto which has to do with #5, basically it is the unsolved problem of plotting polygon with ggplot where it loses its shape as you start to setting limits on axis.
I need to think it over before coming up a better solution.

mikejiang commented 6 years ago

I've switched from scale to coord_cartesian limits , which won't clip the data as you zoom in by setting limits through coord_cartesian instead of the xlim/ylim directly. p + coord_cartesian(xlim = c(0,4e3), ylim = c(0,4e3)) image p + coord_cartesian(xlim = c(2.5e3,4e3), ylim = c(1e3,4e3)) image

I've tested it and it works pretty well except for some legacy quadrant gates, where the gate coordinates were extended to far right and thus exaggerates the space for hexbin. Even if limits are set through coord_cartesian , hexbin is stilled computed on the original large coord space. So we will have to figure out how to fix this before merging coord_cartesian branch to trunk

malisas commented 6 years ago

Thanks, Mike. Works for me.

I will use the coord_cartesian branch with ggplot2::coord_cartesian() (and not ggplot2::xlim() or ggplot2::ylim()) anytime I wish to use geom_gate() with a polygon gate.

mikejiang commented 6 years ago

@malisas , I've addressed geom_hex issue by 2974b47506b4ac9bd19715d95d752d863cddbe23 and coord_cartesian branch is now merged to trunk and will be hopefully applied to bioc release once it is stabilized.