RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

FlowJo CurlyQuad gates missing vertices from flowjo_to_gatingset with transform = FALSE #127

Closed: jacobpwagner closed this issue 3 years ago

jacobpwagner commented 3 years ago

Describe the bug

Hey @mikejiang. I'm running into a very specific issue when parsing CurlyQuadrant gates from FlowJo with transform = FALSE (which also requires execute = FALSE). Somewhere along the way, only the center vertex gets written out, so .cpp_getGate returns just the single central vertex (g$x and g$y each have only one value), and building the polygonGate then fails with this error:

Error in validObject(.Object) : invalid class “polygonGate” object: 
slot 'boundaries' must be a numeric matrix of at least 3 rows and exactly 2 columns

I'll also keep trying to trace it back to the root cause and will update if I find anything.
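For context, the error comes from the polygonGate validity rule: the boundary must have at least 3 rows (vertices) and exactly 2 columns (x and y), so a gate carrying only the central vertex can never pass. Below is a minimal C++ sketch of that same constraint, assuming the coordinates arrive as separate x and y vectors like g$x and g$y above. `Vertex` and `make_polygon_boundary` are hypothetical names for illustration, not part of cytolib or flowCore.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical vertex type: one (x, y) pair per row of the boundary matrix,
// so the "exactly 2 columns" part of the rule holds by construction.
using Vertex = std::pair<double, double>;

// Hypothetical helper: zip separate x and y coordinate vectors (like g$x and g$y)
// into polygon vertices, rejecting mismatched lengths or fewer than 3 vertices.
std::vector<Vertex> make_polygon_boundary(const std::vector<double>& x,
                                          const std::vector<double>& y) {
    if (x.size() != y.size()) {
        throw std::invalid_argument("x and y must have the same length");
    }
    if (x.size() < 3) {
        throw std::invalid_argument("polygon boundary needs at least 3 vertices");
    }
    std::vector<Vertex> boundary;
    boundary.reserve(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        boundary.emplace_back(x[i], y[i]);
    }
    return boundary;
}
```

With only the central vertex, a call like `make_polygon_boundary({x0}, {y0})` throws, mirroring how the R constructor rejects the 1-row boundary matrix.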

To Reproduce

Steps to reproduce the behavior:

I believe this should be reproducible with any FlowJo XML workspace that has CurlyQuadrant gates on transformed scales. I don't have access to these internal test files, but I'm guessing the following should reproduce it. You'd need to substitute an appropriate CurlyQuadrant population into the gh_pop_get_gate calls, since I can't tell from the test file what those would be.

library(flowWorkspace)
library(CytoML)
wsFile <- "~/rglab/workspace/CytoML/wsTestSuite/curlyQuad/example120151208_TBNK_DS.xml"
ws <- open_flowjo_xml(wsFile)

# For transform = TRUE, this works
gs <- flowjo_to_gatingset(ws, name = 2, execute = FALSE, transform = TRUE)
gh_pop_get_gate(gs[[1]], "<A CurlyQuad Pop>")

# For transform = FALSE, it doesn't. I thought it might be something in the extension logic, but
# pushing extend_val arbitrarily low also doesn't solve the problem
gs <- flowjo_to_gatingset(ws, name = 2, execute = FALSE, transform = FALSE)
# Specific quadrant doesn't actually matter. They all just get the single central vertex
gh_pop_get_gate(gs[[1]], "<A CurlyQuad Pop>")

That last gh_pop_get_gate call should yield the error. If it doesn't, I can attach a specific wsp that triggers the error and update the reprex.

Additionally, this appears to be confined to the polygonGates produced from CurlyQuadrant gates in the XML. Regular PolygonGates come through just fine, as do the other two quadrant types (regular and skewed).

Expected behavior

No errors. I'd expect the CurlyQuadrant gates to be interpolated to polygons as usual, just on the untransformed scales.

SessionInfo:

> sessionInfo()
R version 4.0.4 (2021-02-15)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] CytoML_2.3.3        flowWorkspace_4.3.7

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0             purrr_0.3.4                  lattice_0.20-41              colorspace_2.0-0             vctrs_0.3.6                 
 [6] generics_0.1.0               stats4_4.0.4                 yaml_2.2.1                   ncdfFlow_2.36.0              base64enc_0.1-3             
[11] utf8_1.1.4                   flowCore_2.3.2               XML_3.99-0.5                 RBGL_1.66.0                 
[16] rlang_0.4.10                 hexbin_1.28.2                pillar_1.5.1                 glue_1.4.2                   DBI_1.1.1                   
[21] aws.s3_0.3.21                Rgraphviz_2.34.0             BiocGenerics_0.36.0          RColorBrewer_1.1-2           plyr_1.8.6                  
[26] matrixStats_0.58.0           jpeg_0.1-8.1                 lifecycle_1.0.0              zlibbioc_1.36.0              RProtoBufLib_2.3.4          
[31] munsell_0.5.0                gtable_0.3.0                 cytolib_2.3.7                latticeExtra_0.6-29          Biobase_2.50.0              
[36] parallel_4.0.4               curl_4.3                     fansi_0.4.2                  Rcpp_1.0.6                   scales_1.1.1                
[41] S4Vectors_0.28.1             jsonlite_1.7.2               RcppParallel_5.0.3           graph_1.68.0                 gridExtra_2.3               
[46] ggplot2_3.3.3                png_0.1-7                    digest_0.6.27                dplyr_1.0.5                  grid_4.0.4                  
[51] tools_4.0.4                  magrittr_2.0.1               tibble_3.1.0                 crayon_1.4.1                 aws.signature_0.6.0         
[56] pkgconfig_2.0.3              ellipsis_0.3.1               data.table_1.14.0            xml2_1.3.2                   assertthat_0.2.1            
[61] httr_1.4.2                   R6_2.5.0                     ggcyto_1.19.1                compiler_4.0.4   

jacobpwagner commented 3 years ago

Ah. I think it's probably because gh->transform_gate is never called on this code path, and transform_gate is ultimately what performs the interpolation of the quadrant curves in cytolib. So fixing this would probably require some decoupling.

jacobpwagner commented 3 years ago

Actually, this may be pretty straightforward. CurlyQuadGate::interpolate appears to just convert back to the raw scale and then to the 256 scale to do the interpolation anyway. So why not do the interpolation starting from the raw scale when the gate is first parsed?

For example, simply adding this:

trans_local dummyTrans = trans_local();
gate->interpolate(dummyTrans);

right before the return in the gate-parsing function appears to solve the issue for me. With transform = TRUE, the gates still appear correctly interpolated on the transformed scales, and with transform = FALSE, they appear correctly interpolated on the raw scales as well.

I suppose if that change were made, the interpolation could be removed from GatingHierarchy::transform_gate, since it would already be done. The scales would just need to be set up appropriately for the transformation.

I could be missing something, though.

mikejiang commented 3 years ago

In fact, the gate is stored on the transformed scale in the XML. For non-linear scales, the transformation isn't available yet at the gate-parsing stage, so the gate can't be properly rescaled to the 256 space for the interpolation. That is why we deliberately delay the interpolation until transform_gate.

Could the test case you are using be on a linear scale?
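A toy illustration of why the transformation has to be known before interpolating: straight-line interpolation commutes with a linear (affine) scale but not with a non-linear one, so interpolating on one scale and mapping to another only reproduces the same vertices when the scale is linear. This is a simplified model, not cytolib's CurlyQuadGate::interpolate; `lerp_points` and `max_order_discrepancy` are hypothetical helpers, and log10 merely stands in for a generic non-linear FlowJo scale.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Toy model (not cytolib code): a scale maps raw values to display values.
using Scale = std::function<double(double)>;

// n evenly spaced points from a to b inclusive (n >= 2): straight-line interpolation.
std::vector<double> lerp_points(double a, double b, int n) {
    assert(n >= 2);
    std::vector<double> out;
    out.reserve(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        double t = static_cast<double>(i) / (n - 1);
        out.push_back(a + t * (b - a));
    }
    return out;
}

// Largest gap between (a) interpolating on the raw scale and then mapping each point
// to the display scale, and (b) mapping only the endpoints and then interpolating on
// the display scale. A result near zero means the two orders agree.
double max_order_discrepancy(const Scale& scale, double a, double b, int n) {
    std::vector<double> raw_first = lerp_points(a, b, n);
    std::vector<double> display_first = lerp_points(scale(a), scale(b), n);
    double worst = 0.0;
    for (int i = 0; i < n; ++i) {
        worst = std::max(worst, std::fabs(scale(raw_first[i]) - display_first[i]));
    }
    return worst;
}
```

For endpoints 10 and 10000 with 3 points, an affine scale such as 2x + 5 gives a discrepancy of essentially zero, while log10 places the raw-interpolated midpoint at log10(5005) ≈ 3.70 instead of 2.5. That gap is consistent with the identity-transform shortcut looking correct on a linear test case but not on non-linear scales.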

jacobpwagner commented 3 years ago

Ah. Yeah. Sorry, that makes sense. Alright, I'll find another workaround, I suppose.