Closed bradleyed closed 3 years ago
Thanks for reporting, I will see if I can reproduce it on my own.
Here is the example I created to mimic your use case:
> sampleNames(gs)
[1] "CytoTrol_CytoTrol_1.fcs" "b.fcs"
> keyword(gs, "$FIL")
$FIL
1 CytoTrol_CytoTrol_1.fcs
2 CytoTrol_CytoTrol_1.fcs
> pData(gs)
name
CytoTrol_CytoTrol_1.fcs CytoTrol_CytoTrol_1.fcs
b.fcs CytoTrol_CytoTrol_1.fcs
As shown, the two samples share the same `$FIL` keyword (which is also carried over to the `name` column of `pData`), but each has its own unique `sampleName` (in your case, the filename plus the total number of cells). So it won't be a problem until exporting to FlowJo, which uses `DataSet/uri` to search for and load the FCS files. Currently `gatingset_to_flowjo` simply fills `uri` with the `name` column, which assumes the values are unique and are correct relative file paths that FlowJo can recognize.
Apparently, in your case, this assumption doesn't hold. An alternative source for the file path is the `FILENAME` keyword, e.g.
> keyword(gs, "FILENAME")
FILENAME
1 /tmp/Rtmp2OQMuP/file3f795c60d8a2/CytoTrol_CytoTrol_1.fcs
2 /tmp/Rtmp2OQMuP/file3f795c60d8a2/b.fcs
But as shown, its path may not be valid or in sync on the FlowJo machine as you move the `gs` between computers. So we don't have a reliable source to fill the `uri` fields other than sticking to the current `name` column, which is the most robust and workable solution (when the `$FIL` values are unique).
Therefore, the workaround for you is to manually correct the `name` column before exporting to FlowJo. In my case, it will be:
pData(gs)[["name"]] <- c("CytoTrol_CytoTrol_1.fcs", "b.fcs")
> pData(gs)
name
CytoTrol_CytoTrol_1.fcs CytoTrol_CytoTrol_1.fcs
b.fcs b.fcs
gatingset_to_flowjo(gs, outFile)
FlowJo should then be able to load all the samples.
They will still appear under the same names, though (since the displayed name comes from the `$FIL` keyword).
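The workaround above can be sketched in base R. This is only an illustration under stated assumptions: the two character vectors stand in for `keyword(gs, "FILENAME")` and the `name` column of `pData(gs)`, and `all_unique` is a hypothetical helper, not part of flowWorkspace or CytoML. The idea is to keep the `$FIL`-derived names when they are already unique and otherwise fall back to the basename of `FILENAME`, which happens to be unique in this example.

```r
# FILENAME values copied from the example output above
# (stand-in for keyword(gs, "FILENAME")).
filename_kw <- c(
  "/tmp/Rtmp2OQMuP/file3f795c60d8a2/CytoTrol_CytoTrol_1.fcs",
  "/tmp/Rtmp2OQMuP/file3f795c60d8a2/b.fcs"
)

# `name` values as derived from $FIL (duplicated in this example).
fil_names <- c("CytoTrol_CytoTrol_1.fcs", "CytoTrol_CytoTrol_1.fcs")

# Hypothetical helper: TRUE when every value is distinct.
all_unique <- function(x) anyDuplicated(x) == 0

# Keep the $FIL names if they are unique; otherwise use the file basenames.
candidate_names <- basename(filename_kw)
new_names <- if (all_unique(fil_names)) fil_names else candidate_names

# Fail early rather than export a workspace with clashing names.
stopifnot(all_unique(new_names))

# With a real GatingSet the assignment would be:
# pData(gs)[["name"]] <- new_names
```

The basename is used rather than the full path because, as noted above, the absolute `FILENAME` path may not be valid on the machine where FlowJo runs.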
Hi Mike-- Thanks so much. This looks like a good solution. I ran into some issues trying to upgrade to the development versions of the packages (as I mentioned earlier), so I will need to revert back to the earlier versions of the packages that did update before I can try out what you've suggested. I will post here in a few days once I am able to do so. Thanks again!
Hi Mike,
Your suggestion works great, and it will be easy to add a unique name column to the code using the experiment name and the start time of acquisition. Thanks so much. I am so grateful for all the great flow software your group provides.
library(tidyverse)
library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#>
#> vignette("flowWorkspace-Introduction", "flowWorkspace")
library(CytoML)
home_dir <- "C:/Users/brade/Desktop/reprex_CytoML"
## modified the name of fcs files (added stimulation to end) to see if this had any effect. It doesn't, as name is drawn from $FIL keyword
list.files(home_dir, pattern = "Specimen_001_A1_A01", full.names = FALSE, recursive = TRUE)
#> [1] "FCS_BCE/DMSO/Specimen_001_A1_A01_DMSO.fcs"
#> [2] "FCS_BCE/pp65/Specimen_001_A1_A01_pp65.fcs"
#> [3] "FCS_BCE/SEB/Specimen_001_A1_A01_SEB.fcs"
ws2 <- open_flowjo_xml(file = paste0(home_dir, "/20210121_2 Workspace.xml"))
ws2
#> File location: C:/Users/brade/Desktop/reprex_CytoML/20210121_2 Workspace.xml
#>
#> Groups in Workspace
#> Name Num.Samples
#> 1 All Samples 3
#> 2 test 3
gs2 <- flowjo_to_gatingset(ws2, name = "test", execute = TRUE, leaf.bool = FALSE, skip_faulty_gate = TRUE)
gs2
#> A GatingSet with 3 samples
sampleNames(gs2)
#> [1] "Specimen_001_A1_A01.fcs_400367" "Specimen_001_A1_A01.fcs_348976"
#> [3] "Specimen_001_A1_A01.fcs_241564"
flowWorkspace::keyword(gs2,keyword = "$FIL")
#> $FIL
#> 1 Specimen_001_A1_A01.fcs
#> 2 Specimen_001_A1_A01.fcs
#> 3 Specimen_001_A1_A01.fcs
#current "name" column
pData(gs2)
#> name
#> Specimen_001_A1_A01.fcs_400367 Specimen_001_A1_A01.fcs
#> Specimen_001_A1_A01.fcs_348976 Specimen_001_A1_A01.fcs
#> Specimen_001_A1_A01.fcs_241564 Specimen_001_A1_A01.fcs
#replace the "name" column with something that will be unique
newName <- paste(pull(keyword(gs2, "EXPERIMENT NAME")),
                 pull(keyword(gs2, "$BTIM")), sep = "_") %>%
  print()
#> [1] "tsayers NK ICS 053113_10:55:32" "tsayers NK ICS 053113_10:29:28"
#> [3] "tsayers NK ICS 053113_10:03:37"
pData(gs2)[["name"]] <- newName
pData(gs2)
#> name
#> Specimen_001_A1_A01.fcs_400367 tsayers NK ICS 053113_10:55:32
#> Specimen_001_A1_A01.fcs_348976 tsayers NK ICS 053113_10:29:28
#> Specimen_001_A1_A01.fcs_241564 tsayers NK ICS 053113_10:03:37
gatingset_to_flowjo(gs2, outFile = paste0(home_dir, "/reprex2.wsp" ))
#> Warning in gatingset_to_flowjo(gs2, outFile = paste0(home_dir, "/reprex2.wsp")):
#> docker image 'rglab/gs-to-flowjo:2.2' is built with different cytolib version of
#> from R package: 2.2.0 vs 2.2.1
#> Using docker image rglab/gs-to-flowjo:2.2 to write FlowJo workspace...
Created on 2021-01-24 by the reprex package (v0.3.0)
Thanks again!
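For anyone adapting the reprex above, a small guard on the composed names may be worth adding, since two acquisitions could in principle share an experiment name and start time. The sketch below uses plain character vectors copied from the output above in place of the `keyword()` calls; the `make.unique` fallback is an assumption about what one might want, not something the packages do for you.

```r
# Values copied from the reprex output above (stand-ins for the
# "EXPERIMENT NAME" and "$BTIM" keyword columns).
experiment <- rep("tsayers NK ICS 053113", 3)
btim <- c("10:55:32", "10:29:28", "10:03:37")

new_name <- paste(experiment, btim, sep = "_")

# If any composed names collide, append a numeric suffix to the repeats.
if (anyDuplicated(new_name) > 0) {
  new_name <- make.unique(new_name, sep = "_")
}

# The export relies on unique names, so stop if that still does not hold.
stopifnot(anyDuplicated(new_name) == 0)
```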
Just curious: since you did not use the FCS filenames for the `name` column, I would expect that FlowJo wouldn't be able to load the files automatically and that you would need to manually correct or search for the files in the FlowJo UI, no? That's what I observed with my test example.
I did not have to reconnect. It found them right away. Happy to share the wsp file if that would help you understand. Might it have something to do with it coming from a FJ 9 workspace that had the location information?
I see. FlowJo likely looks the file up from multiple sources: first `DataSet/uri`, then one of the keywords you've displayed if `uri` fails.
Unfortunately, none of these keywords are standard, so I can't assume they will always contain the proper file names/locations when I output the wsp file. Anyway, glad that we've at least found a workaround that works for your case.
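To make the conjectured lookup order concrete, here is a purely illustrative sketch of a "first candidate that exists wins" resolution. It is not FlowJo's implementation, which is not documented in this thread; `resolve_fcs_path` is a hypothetical helper and the candidate paths are made up for the demonstration.

```r
# Hypothetical helper: given candidate paths in priority order,
# return the first one that exists on disk, or NA if none does.
resolve_fcs_path <- function(candidates) {
  hits <- candidates[file.exists(candidates)]
  if (length(hits) == 0) NA_character_ else hits[[1]]
}

# Demonstration with a temporary file standing in for an FCS file.
real_file <- tempfile(fileext = ".fcs")
file.create(real_file)

# The first candidate (think: the uri value) is missing, so the second
# candidate (think: a path taken from a keyword) is used instead.
resolved <- resolve_fcs_path(c(
  file.path(tempdir(), "no_such_file.fcs"),
  real_file
))
```

Under this kind of scheme a non-path value in `uri` would be harmless as long as some later candidate points at a real file, which would be consistent with the behavior reported above.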
Describe the bug
I want to use the CytoML and flowWorkspace packages to help with analysis of some old flow cytometry data. I am having an issue with the gatingset_to_flowjo function:
The FCS files associated with the input XML workspace (FJ version 9) were named generically by DIVA, and data files from the same well location in different plates have the same name (though the files are in different sub-folders for each plate). The problem is that the output workspace file (.wsp) omits all but one of any identically named FCS files (files sharing the same `$FIL` keyword).
The files are easily differentiated in flowWorkspace thanks to the event count being appended to the $FIL keyword, but only one of the files ends up exported to the flowjo V10 workspace.
Since we are all at the hutch, I can send FCS files and the input xml directly to you if desired, but I cannot upload them to a public site.
I believe I am using versions of the packages that correspond to Bioconductor 3.12. I intended to try this code using the development versions of the cytoverse packages, but I am currently having some issues installing them via `cytoverse::cytoverse_update(repo = "github")`.
And I think that problem is related to Rtools, which I am also having issues installing. If this is an issue that would be solved using the development version, please let me know.
Thanks!
Brad
To Reproduce
Steps to reproduce the behavior:
Created on 2021-01-22 by the reprex package (v0.3.0)
Session info
``` r
devtools::session_info()
#> - Session info ---------------------------------------------------------------
#>  setting  value
#>  version  R version 4.0.3 (2020-10-10)
#>  os       Windows 10 x64
#>  system   x86_64, mingw32
#>  ui       RTerm
#>  language (EN)
#>  collate  English_United States.1252
#>  ctype    English_United States.1252
#>  tz       America/Los_Angeles
#>  date     2021-01-22
#>
#> - Packages -------------------------------------------------------------------
#>  ! package       * version  date       lib source
#>    assertthat      0.2.1    2019-03-21 [1] CRAN (R 4.0.0)
#>    aws.s3          0.3.21   2020-04-07 [1] CRAN (R 4.0.3)
#>    aws.signature   0.6.0    2020-06-01 [1] CRAN (R 4.0.3)
#>    backports       1.2.1    2020-12-09 [1] CRAN (R 4.0.3)
#>    base64enc       0.1-3    2015-07-28 [1] CRAN (R 4.0.0)
#>    Biobase         2.50.0   2020-10-27 [1] Bioconductor
#>    BiocGenerics    0.36.0   2020-10-27 [1] Bioconductor
#>    broom           0.7.3    2020-12-16 [1] CRAN (R 4.0.3)
#>    callr           3.5.1    2020-10-13 [1] CRAN (R 4.0.3)
#>    cellranger      1.1.0    2016-07-27 [1] CRAN (R 4.0.0)
#>    cli             2.2.0    2020-11-20 [1] CRAN (R 4.0.3)
#>    colorspace      2.0-0    2020-11-11 [1] CRAN (R 4.0.3)
#>    crayon          1.3.4    2017-09-16 [1] CRAN (R 4.0.0)
#>    curl            4.3      2019-12-02 [1] CRAN (R 4.0.0)
#>    cytolib         2.2.1    2021-01-17 [1] Bioconductor
#>    CytoML        * 2.2.1    2020-11-03 [1] Bioconductor
#>    data.table      1.13.6   2020-12-30 [1] CRAN (R 4.0.3)
#>    DBI             1.1.1    2021-01-15 [1] CRAN (R 4.0.3)
#>    dbplyr          2.0.0    2020-11-03 [1] CRAN (R 4.0.3)
#>    desc            1.2.0    2018-05-01 [1] CRAN (R 4.0.0)
#>    devtools        2.3.2    2020-09-18 [1] CRAN (R 4.0.3)
#>    digest          0.6.27   2020-10-24 [1] CRAN (R 4.0.3)
#>    dplyr         * 1.0.3    2021-01-15 [1] CRAN (R 4.0.3)
#>    ellipsis        0.3.1    2020-05-15 [1] CRAN (R 4.0.0)
#>    evaluate        0.14     2019-05-28 [1] CRAN (R 4.0.0)
#>    fansi           0.4.2    2021-01-15 [1] CRAN (R 4.0.3)
#>    flowCore        2.2.0    2020-10-27 [1] Bioconductor
#>    flowWorkspace * 4.2.0    2020-10-27 [1] Bioconductor
#>    forcats       * 0.5.0    2020-03-01 [1] CRAN (R 4.0.0)
#>    fs              1.5.0    2020-07-31 [1] CRAN (R 4.0.3)
#>    generics        0.1.0    2020-10-31 [1] CRAN (R 4.0.3)
#>    ggcyto          1.18.0   2020-10-27 [1] Bioconductor
#>    ggplot2       * 3.3.3    2020-12-30 [1] CRAN (R 4.0.3)
#>    glue            1.4.2    2020-08-27 [1] CRAN (R 4.0.3)
#>    graph           1.68.0   2020-10-27 [1] Bioconductor
#>    gridExtra       2.3      2017-09-09 [1] CRAN (R 4.0.0)
#>    gtable          0.3.0    2019-03-25 [1] CRAN (R 4.0.0)
#>    haven           2.3.1    2020-06-01 [1] CRAN (R 4.0.3)
#>    hexbin          1.28.2   2021-01-08 [1] CRAN (R 4.0.3)
#>    highr           0.8      2019-03-20 [1] CRAN (R 4.0.0)
#>    hms             1.0.0    2021-01-13 [1] CRAN (R 4.0.3)
#>    htmltools       0.5.1    2021-01-12 [1] CRAN (R 4.0.3)
#>    httr            1.4.2    2020-07-20 [1] CRAN (R 4.0.3)
#>    jpeg            0.1-8.1  2019-10-24 [1] CRAN (R 4.0.0)
#>    jsonlite        1.7.2    2020-12-09 [1] CRAN (R 4.0.3)
#>    knitr           1.30     2020-09-22 [1] CRAN (R 4.0.3)
#>    lattice         0.20-41  2020-04-02 [2] CRAN (R 4.0.3)
#>    latticeExtra    0.6-29   2019-12-19 [1] CRAN (R 4.0.0)
#>    lifecycle       0.2.0    2020-03-06 [1] CRAN (R 4.0.0)
#>    lubridate       1.7.9.2  2020-11-13 [1] CRAN (R 4.0.3)
#>    magrittr        2.0.1    2020-11-17 [1] CRAN (R 4.0.3)
#>    matrixStats     0.57.0   2020-09-25 [1] CRAN (R 4.0.3)
#>    memoise         1.1.0    2017-04-21 [1] CRAN (R 4.0.0)
#>    modelr          0.1.8    2020-05-19 [1] CRAN (R 4.0.0)
#>    munsell         0.5.0    2018-06-12 [1] CRAN (R 4.0.0)
#>    ncdfFlow        2.36.0   2020-10-27 [1] Bioconductor
#>    pillar          1.4.7    2020-11-20 [1] CRAN (R 4.0.3)
#>    pkgbuild        1.2.0    2020-12-15 [1] CRAN (R 4.0.3)
#>    pkgconfig       2.0.3    2019-09-22 [1] CRAN (R 4.0.0)
#>    pkgload         1.1.0    2020-05-29 [1] CRAN (R 4.0.3)
#>    plyr            1.8.6    2020-03-03 [1] CRAN (R 4.0.0)
#>    png             0.1-7    2013-12-03 [1] CRAN (R 4.0.0)
#>    prettyunits     1.1.1    2020-01-24 [1] CRAN (R 4.0.0)
#>    processx        3.4.5    2020-11-30 [1] CRAN (R 4.0.3)
#>    ps              1.5.0    2020-12-05 [1] CRAN (R 4.0.3)
#>    purrr         * 0.3.4    2020-04-17 [1] CRAN (R 4.0.0)
#>    R6              2.5.0    2020-10-28 [1] CRAN (R 4.0.3)
#>    RBGL            1.66.0   2020-10-27 [1] Bioconductor
#>    RColorBrewer    1.1-2    2014-12-07 [1] CRAN (R 4.0.0)
#>    Rcpp            1.0.6    2021-01-15 [1] CRAN (R 4.0.3)
#>  D RcppParallel    5.0.2    2020-06-24 [1] CRAN (R 4.0.3)
#>    readr         * 1.4.0    2020-10-05 [1] CRAN (R 4.0.3)
#>    readxl          1.3.1    2019-03-13 [1] CRAN (R 4.0.0)
#>    remotes         2.2.0    2020-07-21 [1] CRAN (R 4.0.3)
#>    reprex          0.3.0    2019-05-16 [1] CRAN (R 4.0.0)
#>    Rgraphviz       2.34.0   2020-10-27 [1] Bioconductor
#>    rlang           0.4.10   2020-12-30 [1] CRAN (R 4.0.3)
#>    rmarkdown       2.6      2020-12-14 [1] CRAN (R 4.0.3)
#>    rprojroot       2.0.2    2020-11-15 [1] CRAN (R 4.0.3)
#>    RProtoBufLib    2.2.0    2020-10-27 [1] Bioconductor
#>    rvest           0.3.6    2020-07-25 [1] CRAN (R 4.0.3)
#>    S4Vectors       0.28.1   2020-12-09 [1] Bioconductor
#>    scales          1.1.1    2020-05-11 [1] CRAN (R 4.0.0)
#>    sessioninfo     1.1.1    2018-11-05 [1] CRAN (R 4.0.0)
#>    stringi         1.5.3    2020-09-09 [1] CRAN (R 4.0.3)
#>    stringr       * 1.4.0    2019-02-10 [1] CRAN (R 4.0.0)
#>    testthat        3.0.1    2020-12-17 [1] CRAN (R 4.0.3)
#>    tibble        * 3.0.5    2021-01-15 [1] CRAN (R 4.0.3)
#>    tidyr         * 1.1.2    2020-08-27 [1] CRAN (R 4.0.3)
#>    tidyselect      1.1.0    2020-05-11 [1] CRAN (R 4.0.0)
#>    tidyverse     * 1.3.0    2019-11-21 [1] CRAN (R 4.0.0)
#>    usethis         2.0.0    2020-12-10 [1] CRAN (R 4.0.3)
#>    vctrs           0.3.6    2020-12-17 [1] CRAN (R 4.0.3)
#>    withr           2.4.0    2021-01-16 [1] CRAN (R 4.0.3)
#>    xfun            0.20     2021-01-06 [1] CRAN (R 4.0.3)
#>    XML             3.99-0.5 2020-07-23 [1] CRAN (R 4.0.3)
#>    xml2            1.3.2    2020-04-23 [1] CRAN (R 4.0.0)
#>    yaml            2.2.1    2020-02-01 [1] CRAN (R 4.0.0)
#>    zlibbioc        1.36.0   2020-10-28 [1] Bioconductor
#>
#> [1] C:/Users/brade/Documents/R/win-library/4.0
#> [2] C:/Program Files/R/R-4.0.3/library
#>
#>  D -- DLL MD5 mismatch, broken installation.
```
Screenshots
SessionInfo: see code section above