RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

gatingset_to_flowjo output workspace doesn't include all samples if they have the same $FIL keyword #122

Closed bradleyed closed 3 years ago

bradleyed commented 3 years ago

Describe the bug I want to use the CytoML and flowWorkspace packages to help with analysis of some old flow cytometry data. I am having an issue with the gatingset_to_flowjo function:

The FCS files associated with the input XML workspace (FlowJo version 9) were named generically by DIVA, so data files from the same well location on different plates have the same name (though each plate's files are in a different sub-folder). The problem is that the output workspace file (.wsp) omits all but one of any identically named FCS files (files sharing the same "$FIL" keyword).

The files are easily differentiated in flowWorkspace thanks to the event count being appended to the $FIL keyword, but only one of the files ends up exported to the flowjo V10 workspace.

Since we are all at the hutch, I can send FCS files and the input xml directly to you if desired, but I cannot upload them to a public site.

I believe I am using versions of the packages that correspond to Bioconductor 3.12. I intended to try this code with the development versions of the cytoverse packages, but I am currently having some issues installing them via "cytoverse::cytoverse_update(repo = "github")". I think that problem is related to Rtools, which I am also having trouble installing. If this issue would be solved by using the development version, please let me know.

Thanks!

Brad

To Reproduce Steps to reproduce the behavior:

library(tidyverse)
library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
library(CytoML)
home_dir <- "C:/Users/brade/Desktop/reprex_CytoML"
## modified the name of fcs files (added stimulation to end) to see if this had any effect.  It doesn't, as name is drawn from $FIL keyword
list.files(home_dir, pattern = "Specimen_001_A1_A01",full = F,recursive = T)
#> [1] "FCS_BCE/DMSO/Specimen_001_A1_A01_DMSO.fcs"
#> [2] "FCS_BCE/pp65/Specimen_001_A1_A01_pp65.fcs"
#> [3] "FCS_BCE/SEB/Specimen_001_A1_A01_SEB.fcs"
ws2 <- open_flowjo_xml(file = paste0(home_dir, "/20210121_2 Workspace.xml"))
ws2
#> File location:  C:/Users/brade/Desktop/reprex_CytoML/20210121_2 Workspace.xml 
#> 
#> Groups in Workspace
#>          Name Num.Samples
#> 1 All Samples           3
#> 2        test           3
gs2 <- flowjo_to_gatingset(ws2, name = "test", execute = T,leaf.bool = F,skip_faulty_gate = T)
gs2
#> A GatingSet with 3 samples
sampleNames(gs2)
#> [1] "Specimen_001_A1_A01.fcs_400367" "Specimen_001_A1_A01.fcs_348976"
#> [3] "Specimen_001_A1_A01.fcs_241564"
flowWorkspace::keyword(gs2,keyword = "$FIL")
#>                      $FIL
#> 1 Specimen_001_A1_A01.fcs
#> 2 Specimen_001_A1_A01.fcs
#> 3 Specimen_001_A1_A01.fcs
gatingset_to_flowjo(gs2, outFile = paste0(home_dir, "/reprex2.wsp" ))
#> Warning in gatingset_to_flowjo(gs2, outFile = paste0(home_dir, "/reprex2.wsp")):
#> docker image 'rglab/gs-to-flowjo:2.2' is built with different cytolib version of
#> from R package: 2.2.0 vs 2.2.1
#> Using docker image rglab/gs-to-flowjo:2.2 to write FlowJo workspace...

Created on 2021-01-22 by the reprex package (v0.3.0)

Session info

``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/Los_Angeles
#>  date     2021-01-22
#>
#> - Packages -------------------------------------------------------------------
#>  ! package       * version  date       lib source
#>    assertthat      0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
#>    aws.s3          0.3.21   2020-04-07 [1] CRAN (R 4.0.3)
#>    aws.signature   0.6.0    2020-06-01 [1] CRAN (R 4.0.3)
#>    backports       1.2.1    2020-12-09 [1] CRAN (R 4.0.3)
#>    base64enc       0.1-3    2015-07-28 [1] CRAN (R 4.0.0)
#>    Biobase         2.50.0   2020-10-27 [1] Bioconductor
#>    BiocGenerics    0.36.0   2020-10-27 [1] Bioconductor
#>    broom           0.7.3    2020-12-16 [1] CRAN (R 4.0.3)
#>    callr           3.5.1    2020-10-13 [1] CRAN (R 4.0.3)
#>    cellranger      1.1.0    2016-07-27 [1] CRAN (R 4.0.0)
#>    cli             2.2.0    2020-11-20 [1] CRAN (R 4.0.3)
#>    colorspace      2.0-0    2020-11-11 [1] CRAN (R 4.0.3)
#>    crayon          1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
#>    curl            4.3      2019-12-02 [1] CRAN (R 4.0.0)
#>    cytolib         2.2.1    2021-01-17 [1] Bioconductor
#>    CytoML        * 2.2.1    2020-11-03 [1] Bioconductor
#>    data.table      1.13.6   2020-12-30 [1] CRAN (R 4.0.3)
#>    DBI             1.1.1    2021-01-15 [1] CRAN (R 4.0.3)
#>    dbplyr          2.0.0    2020-11-03 [1] CRAN (R 4.0.3)
#>    desc            1.2.0    2018-05-01 [1] CRAN (R 4.0.0)
#>    devtools        2.3.2    2020-09-18 [1] CRAN (R 4.0.3)
#>    digest          0.6.27   2020-10-24 [1] CRAN (R 4.0.3)
#>    dplyr         * 1.0.3    2021-01-15 [1] CRAN (R 4.0.3)
#>    ellipsis        0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
#>    evaluate        0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>    fansi           0.4.2    2021-01-15 [1] CRAN (R 4.0.3)
#>    flowCore        2.2.0    2020-10-27 [1] Bioconductor
#>    flowWorkspace * 4.2.0    2020-10-27 [1] Bioconductor
#>    forcats       * 0.5.0    2020-03-01 [1] CRAN (R 4.0.0)
#>    fs              1.5.0    2020-07-31 [1] CRAN (R 4.0.3)
#>    generics        0.1.0    2020-10-31 [1] CRAN (R 4.0.3)
#>    ggcyto          1.18.0   2020-10-27 [1] Bioconductor
#>    ggplot2       * 3.3.3    2020-12-30 [1] CRAN (R 4.0.3)
#>    glue            1.4.2    2020-08-27 [1] CRAN (R 4.0.3)
#>    graph           1.68.0   2020-10-27 [1] Bioconductor
#>    gridExtra       2.3      2017-09-09 [1] CRAN (R 4.0.0)
#>    gtable          0.3.0    2019-03-25 [1] CRAN (R 4.0.0)
#>    haven           2.3.1    2020-06-01 [1] CRAN (R 4.0.3)
#>    hexbin          1.28.2   2021-01-08 [1] CRAN (R 4.0.3)
#>    highr           0.8      2019-03-20 [1] CRAN (R 4.0.0)
#>    hms             1.0.0    2021-01-13 [1] CRAN (R 4.0.3)
#>    htmltools       0.5.1    2021-01-12 [1] CRAN (R 4.0.3)
#>    httr            1.4.2    2020-07-20 [1] CRAN (R 4.0.3)
#>    jpeg            0.1-8.1  2019-10-24 [1] CRAN (R 4.0.0)
#>    jsonlite        1.7.2    2020-12-09 [1] CRAN (R 4.0.3)
#>    knitr           1.30     2020-09-22 [1] CRAN (R 4.0.3)
#>    lattice         0.20-41  2020-04-02 [2] CRAN (R 4.0.3)
#>    latticeExtra    0.6-29   2019-12-19 [1] CRAN (R 4.0.0)
#>    lifecycle       0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
#>    lubridate       1.7.9.2  2020-11-13 [1] CRAN (R 4.0.3)
#>    magrittr        2.0.1    2020-11-17 [1] CRAN (R 4.0.3)
#>    matrixStats     0.57.0   2020-09-25 [1] CRAN (R 4.0.3)
#>    memoise         1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
#>    modelr          0.1.8    2020-05-19 [1] CRAN (R 4.0.0)
#>    munsell         0.5.0    2018-06-12 [1] CRAN (R 4.0.0)
#>    ncdfFlow        2.36.0   2020-10-27 [1] Bioconductor
#>    pillar          1.4.7    2020-11-20 [1] CRAN (R 4.0.3)
#>    pkgbuild        1.2.0    2020-12-15 [1] CRAN (R 4.0.3)
#>    pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
#>    pkgload         1.1.0    2020-05-29 [1] CRAN (R 4.0.3)
#>    plyr            1.8.6    2020-03-03 [1] CRAN (R 4.0.0)
#>    png             0.1-7    2013-12-03 [1] CRAN (R 4.0.0)
#>    prettyunits     1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
#>    processx        3.4.5    2020-11-30 [1] CRAN (R 4.0.3)
#>    ps              1.5.0    2020-12-05 [1] CRAN (R 4.0.3)
#>    purrr         * 0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
#>    R6              2.5.0    2020-10-28 [1] CRAN (R 4.0.3)
#>    RBGL            1.66.0   2020-10-27 [1] Bioconductor
#>    RColorBrewer    1.1-2    2014-12-07 [1] CRAN (R 4.0.0)
#>    Rcpp            1.0.6    2021-01-15 [1] CRAN (R 4.0.3)
#>  D RcppParallel    5.0.2    2020-06-24 [1] CRAN (R 4.0.3)
#>    readr         * 1.4.0    2020-10-05 [1] CRAN (R 4.0.3)
#>    readxl          1.3.1    2019-03-13 [1] CRAN (R 4.0.0)
#>    remotes         2.2.0    2020-07-21 [1] CRAN (R 4.0.3)
#>    reprex          0.3.0    2019-05-16 [1] CRAN (R 4.0.0)
#>    Rgraphviz       2.34.0   2020-10-27 [1] Bioconductor
#>    rlang           0.4.10   2020-12-30 [1] CRAN (R 4.0.3)
#>    rmarkdown       2.6      2020-12-14 [1] CRAN (R 4.0.3)
#>    rprojroot       2.0.2    2020-11-15 [1] CRAN (R 4.0.3)
#>    RProtoBufLib    2.2.0    2020-10-27 [1] Bioconductor
#>    rvest           0.3.6    2020-07-25 [1] CRAN (R 4.0.3)
#>    S4Vectors       0.28.1   2020-12-09 [1] Bioconductor
#>    scales          1.1.1    2020-05-11 [1] CRAN (R 4.0.0)
#>    sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>    stringi         1.5.3    2020-09-09 [1] CRAN (R 4.0.3)
#>    stringr       * 1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>    testthat        3.0.1    2020-12-17 [1] CRAN (R 4.0.3)
#>    tibble        * 3.0.5    2021-01-15 [1] CRAN (R 4.0.3)
#>    tidyr         * 1.1.2    2020-08-27 [1] CRAN (R 4.0.3)
#>    tidyselect      1.1.0    2020-05-11 [1] CRAN (R 4.0.0)
#>    tidyverse     * 1.3.0    2019-11-21 [1] CRAN (R 4.0.0)
#>    usethis         2.0.0    2020-12-10 [1] CRAN (R 4.0.3)
#>    vctrs           0.3.6    2020-12-17 [1] CRAN (R 4.0.3)
#>    withr           2.4.0    2021-01-16 [1] CRAN (R 4.0.3)
#>    xfun            0.20     2021-01-06 [1] CRAN (R 4.0.3)
#>    XML             3.99-0.5 2020-07-23 [1] CRAN (R 4.0.3)
#>    xml2            1.3.2    2020-04-23 [1] CRAN (R 4.0.0)
#>    yaml            2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#>    zlibbioc        1.36.0   2020-10-28 [1] Bioconductor
#>
#> [1] C:/Users/brade/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.3/library
#>
#>  D -- DLL MD5 mismatch, broken installation.
```
  1. Please upload any data needed to reproduce the error. Can provide private OneDrive link if desired/needed.

Expected behavior Produce a FJ 10 workspace with all 3 samples

Screenshots image

SessionInfo: see code section above

mikejiang commented 3 years ago

Thanks for reporting, I will see if I can reproduce it on my own.

mikejiang commented 3 years ago

Here is the example I created to mimic your use case

> sampleNames(gs)
[1] "CytoTrol_CytoTrol_1.fcs" "b.fcs"                  
> keyword(gs, "$FIL")
                     $FIL
1 CytoTrol_CytoTrol_1.fcs
2 CytoTrol_CytoTrol_1.fcs
> pData(gs)
                                           name
CytoTrol_CytoTrol_1.fcs CytoTrol_CytoTrol_1.fcs
b.fcs                   CytoTrol_CytoTrol_1.fcs

As shown, the two samples share the same $FIL keyword (which is also carried over to the name column of pData), but each has its own unique sample name (in your case, the file name plus the total number of cells). So it won't be a problem until exporting to FlowJo, where the workspace uses DataSet/uri to search for and load the FCS files.

image

Currently gatingset_to_flowjo simply fills uri with the name column, which assumes that the column is unique and holds a correct relative file path that FlowJo can recognize.

Apparently, in your case, this assumption doesn't hold. Another possible source of the file path is the FILENAME keyword, e.g.

> keyword(gs, "FILENAME")
                                                  FILENAME
1 /tmp/Rtmp2OQMuP/file3f795c60d8a2/CytoTrol_CytoTrol_1.fcs
2 /tmp/Rtmp2OQMuP/file3f795c60d8a2/b.fcs

But as shown, its path may not be valid or in sync on the FlowJo machine as you move the gs between computers. So we don't have a reliable source to fill the uri fields other than sticking to the current name column, which is the most robust and workable solution (when the $FIL values are unique).
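Because the export relies on the name column being unique, it can help to check for duplicates before calling gatingset_to_flowjo. The following is a minimal base-R sketch: `find_duplicate_names` is a hypothetical helper, not part of CytoML or flowWorkspace, and the data frame below is only a stand-in for the `pData(gs)` output shown above.

```r
# Hypothetical helper (not part of CytoML/flowWorkspace): return the rows of a
# pData-style data frame whose "name" value is shared with another sample.
find_duplicate_names <- function(pd, column = "name") {
  stopifnot(is.data.frame(pd), column %in% colnames(pd))
  values <- as.character(pd[[column]])
  # Flag every member of a duplicated group, not only the later repeats
  is_dup <- duplicated(values) | duplicated(values, fromLast = TRUE)
  pd[is_dup, column, drop = FALSE]
}

# Stand-in for pData(gs) from the example above (assumed structure)
pd <- data.frame(
  name = c("CytoTrol_CytoTrol_1.fcs", "CytoTrol_CytoTrol_1.fcs"),
  row.names = c("CytoTrol_CytoTrol_1.fcs", "b.fcs"),
  stringsAsFactors = FALSE
)

dups <- find_duplicate_names(pd)
if (nrow(dups) > 0) {
  message(nrow(dups), " samples share a 'name' value; correct them before exporting")
}
```

With a real GatingSet, the equivalent call would presumably be `find_duplicate_names(pData(gs))`.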

Therefore, for you, the workaround is to manually correct the name column before exporting to FlowJo. In my case, it will be

> pData(gs)[["name"]] <- c("CytoTrol_CytoTrol_1.fcs", "b.fcs")
> pData(gs)
                                           name
CytoTrol_CytoTrol_1.fcs CytoTrol_CytoTrol_1.fcs
b.fcs                                     b.fcs
> gatingset_to_flowjo(gs, outFile)

FlowJo should then be able to load all the samples.

image

They will appear under the same name though (since it comes from the $FIL keyword).
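In this example the corrected names happen to match the basenames of the FILENAME keyword values shown earlier. A minimal sketch of deriving candidate names that way follows. `names_from_paths` is a hypothetical helper, and whether FlowJo can resolve the resulting relative paths depends on where the .wsp file sits relative to the FCS files, which is an assumption here.

```r
# Hypothetical helper: derive unique candidate values for the "name" column
# from file paths (for example, the FILENAME keyword values).
names_from_paths <- function(paths) {
  stopifnot(is.character(paths))
  candidate <- basename(paths)
  if (anyDuplicated(candidate) > 0) {
    # Same file name in different folders: keep the parent folder as well
    candidate <- file.path(basename(dirname(paths)), candidate)
  }
  if (anyDuplicated(candidate) > 0) {
    stop("paths do not yield unique names; set the name column manually")
  }
  candidate
}

# FILENAME values copied from the keyword() output above
paths <- c("/tmp/Rtmp2OQMuP/file3f795c60d8a2/CytoTrol_CytoTrol_1.fcs",
           "/tmp/Rtmp2OQMuP/file3f795c60d8a2/b.fcs")
new_names <- names_from_paths(paths)

# With a GatingSet this could feed the workaround above (untested assumption):
# pData(gs)[["name"]] <- names_from_paths(keyword(gs, "FILENAME")[["FILENAME"]])
```

The fallback to the parent folder is aimed at layouts where identically named files sit in per-plate sub-folders; if even that is not unique, the helper stops rather than inventing names that match no file.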

bradleyed commented 3 years ago

Hi Mike-- Thanks so much. This looks like a good solution. I ran into some issues trying to upgrade to the development versions of the packages (as I mentioned earlier), so I will need to revert the packages that did update to their earlier versions before I can try out what you've suggested. I will post here in a few days once I am able to do so. Thanks again!

bradleyed commented 3 years ago

Hi Mike,

Your suggestion works great, and it will be easy to add a unique name column to the code using the experiment name and the start time of acquisition. Thanks so much. I am so grateful for all the great flow software your group provides.


library(tidyverse)
library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
library(CytoML)
home_dir <- "C:/Users/brade/Desktop/reprex_CytoML"
## modified the name of fcs files (added stimulation to end) to see if this had any effect.  It doesn't, as name is drawn from $FIL keyword
list.files(home_dir, pattern = "Specimen_001_A1_A01",full = F,recursive = T)
#> [1] "FCS_BCE/DMSO/Specimen_001_A1_A01_DMSO.fcs"
#> [2] "FCS_BCE/pp65/Specimen_001_A1_A01_pp65.fcs"
#> [3] "FCS_BCE/SEB/Specimen_001_A1_A01_SEB.fcs"
ws2 <- open_flowjo_xml(file = paste0(home_dir, "/20210121_2 Workspace.xml"))
ws2
#> File location:  C:/Users/brade/Desktop/reprex_CytoML/20210121_2 Workspace.xml 
#> 
#> Groups in Workspace
#>          Name Num.Samples
#> 1 All Samples           3
#> 2        test           3
gs2 <- flowjo_to_gatingset(ws2, name = "test", execute = T,leaf.bool = F,skip_faulty_gate = T)
gs2
#> A GatingSet with 3 samples
sampleNames(gs2)
#> [1] "Specimen_001_A1_A01.fcs_400367" "Specimen_001_A1_A01.fcs_348976"
#> [3] "Specimen_001_A1_A01.fcs_241564"
flowWorkspace::keyword(gs2,keyword = "$FIL")
#>                      $FIL
#> 1 Specimen_001_A1_A01.fcs
#> 2 Specimen_001_A1_A01.fcs
#> 3 Specimen_001_A1_A01.fcs

#current "name" column
pData(gs2)
#>                                                   name
#> Specimen_001_A1_A01.fcs_400367 Specimen_001_A1_A01.fcs
#> Specimen_001_A1_A01.fcs_348976 Specimen_001_A1_A01.fcs
#> Specimen_001_A1_A01.fcs_241564 Specimen_001_A1_A01.fcs

#replace the "name" column with something that will be unique
newName <- paste(pull(keyword(gs2, "EXPERIMENT NAME")),
                 pull(keyword(gs2,"$BTIM")),sep = "_")%>%
  print()
#> [1] "tsayers NK ICS 053113_10:55:32" "tsayers NK ICS 053113_10:29:28"
#> [3] "tsayers NK ICS 053113_10:03:37"

pData(gs2)[["name"]] <- newName
pData(gs2)
#>                                                          name
#> Specimen_001_A1_A01.fcs_400367 tsayers NK ICS 053113_10:55:32
#> Specimen_001_A1_A01.fcs_348976 tsayers NK ICS 053113_10:29:28
#> Specimen_001_A1_A01.fcs_241564 tsayers NK ICS 053113_10:03:37

gatingset_to_flowjo(gs2, outFile = paste0(home_dir, "/reprex2.wsp" ))
#> Warning in gatingset_to_flowjo(gs2, outFile = paste0(home_dir, "/reprex2.wsp")):
#> docker image 'rglab/gs-to-flowjo:2.2' is built with different cytolib version of
#> from R package: 2.2.0 vs 2.2.1
#> Using docker image rglab/gs-to-flowjo:2.2 to write FlowJo workspace...

Created on 2021-01-24 by the reprex package (v0.3.0)

Session info

``` r
sessionInfo()
#> R version 4.0.3 (2020-10-10)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19042)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=English_United States.1252
#> [2] LC_CTYPE=English_United States.1252
#> [3] LC_MONETARY=English_United States.1252
#> [4] LC_NUMERIC=C
#> [5] LC_TIME=English_United States.1252
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#>  [1] CytoML_2.2.1        flowWorkspace_4.2.0 forcats_0.5.0
#>  [4] stringr_1.4.0       dplyr_1.0.3         purrr_0.3.4
#>  [7] readr_1.4.0         tidyr_1.1.2         tibble_3.0.5
#> [10] ggplot2_3.3.3       tidyverse_1.3.0
#>
#> loaded via a namespace (and not attached):
#>  [1] Biobase_2.50.0      httr_1.4.2          jsonlite_1.7.2
#>  [4] modelr_0.1.8        RcppParallel_5.0.2  assertthat_0.2.1
#>  [7] highr_0.8           stats4_4.0.3        latticeExtra_0.6-29
#> [10] RBGL_1.66.0         cellranger_1.1.0    yaml_2.2.1
#> [13] pillar_1.4.7        backports_1.2.0     lattice_0.20-41
#> [16] glue_1.4.2          digest_0.6.27       RColorBrewer_1.1-2
#> [19] rvest_0.3.6         colorspace_2.0-0    plyr_1.8.6
#> [22] ggcyto_1.18.0       htmltools_0.5.1     XML_3.99-0.5
#> [25] pkgconfig_2.0.3     broom_0.7.3         haven_2.3.1
#> [28] zlibbioc_1.36.0     flowCore_2.2.0      scales_1.1.1
#> [31] jpeg_0.1-8.1        aws.s3_0.3.21       generics_0.1.0
#> [34] ellipsis_0.3.1      withr_2.4.0         hexbin_1.28.2
#> [37] BiocGenerics_0.36.0 cli_2.2.0           magrittr_2.0.1
#> [40] crayon_1.3.4        readxl_1.3.1        evaluate_0.14
#> [43] fs_1.5.0            fansi_0.4.2         xml2_1.3.2
#> [46] graph_1.68.0        tools_4.0.3         data.table_1.13.6
#> [49] ncdfFlow_2.36.0     hms_1.0.0           lifecycle_0.2.0
#> [52] matrixStats_0.57.0  S4Vectors_0.28.1    munsell_0.5.0
#> [55] reprex_0.3.0        compiler_4.0.3      rlang_0.4.10
#> [58] grid_4.0.3          aws.signature_0.6.0 base64enc_0.1-3
#> [61] rmarkdown_2.6       cytolib_2.2.1       gtable_0.3.0
#> [64] DBI_1.1.1           curl_4.3            R6_2.5.0
#> [67] RProtoBufLib_2.2.0  gridExtra_2.3       lubridate_1.7.9.2
#> [70] knitr_1.30          Rgraphviz_2.34.0    stringi_1.5.3
#> [73] parallel_4.0.3      Rcpp_1.0.6          vctrs_0.3.6
#> [76] png_0.1-7           dbplyr_2.0.0        tidyselect_1.1.0
#> [79] xfun_0.20
```

image

bradleyed commented 3 years ago

Thanks again!

mikejiang commented 3 years ago

Just curious: since you did not use the FCS file names for the name column, I would expect that FlowJo wouldn't be able to load the files automatically and that you would need to manually correct or search for the files in the FlowJo UI, no? That's what I observed with my test example.

bradleyed commented 3 years ago

I did not have to reconnect. It found them right away. Happy to share the wsp file if that would help you understand. Might it have something to do with it coming from a FJ 9 workspace that had the location information?

image

mikejiang commented 3 years ago

I see. FlowJo likely looks the files up from multiple sources: first DataSet/uri, then one of the keywords you've displayed if uri fails. Unfortunately, none of these keywords is standard, so I can't assume they will always contain the proper file names/locations when I output the wsp file. Anyway, glad that at least we've found a workaround that works for your case.