RGLab / CytoML

A GatingML Interface for Cross Platform Cytometry Data Sharing
GNU Affero General Public License v3.0

flowjo_to_gatingset does not accept data.frame as path #151

Closed nranthony closed 1 year ago

nranthony commented 1 year ago

Passing a data.frame to the path argument of flowjo_to_gatingset, as described in the docs (pasted below), fails with the error: Error in path.expand(path) : invalid 'path' argument. It appears to be thrown on line 77 (as seen in the debugger; I am not sure this matches the actual source), where the args list is created: path = suppressWarnings(normalizePath(path)). That call expects a single path or a character vector of paths, not a data.frame.

Description of the path argument in the docs: either a character scalar or a data.frame. When character, it is a path to the FCS files that are to be imported. The code will search recursively, so you can point it to a location above the files. When it is a data.frame, it is expected to contain two columns, 'sampleID' and 'file', which are used as the mapping between 'sampleID' and the (absolute) FCS file path. When such a mapping is provided, the file system search is avoided.
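The failure can be reproduced with base R alone, which supports the reading that normalizePath is the culprit rather than anything specific to CytoML. This is a minimal sketch: the data.frame follows the documented 'sampleID' / 'file' layout, but the values are made up for illustration.

```r
# Made-up mapping in the documented two-column layout (illustrative values only).
mapping <- data.frame(
  sampleID = c("s1", "s2"),
  file = c("/data/a/sample.fcs", "/data/b/sample.fcs"),
  stringsAsFactors = FALSE
)

# normalizePath() calls path.expand(), which only accepts character vectors,
# so handing it the whole data.frame raises an error.
msg <- tryCatch(
  normalizePath(mapping),
  error = function(e) conditionMessage(e)
)
print(msg)

# Passing the character column itself is accepted.
paths <- suppressWarnings(normalizePath(mapping$file, mustWork = FALSE))
print(length(paths))
```

The second call only shows that a character vector is accepted; it does not make flowjo_to_gatingset honor a per-sample mapping.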

SessionInfo:
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8 LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] tools stats graphics grDevices datasets utils methods base

other attached packages:
[1] CytoML_2.12.0 R6_2.5.1 tictoc_1.2 SamSPECTRAL_1.54.0
[5] ggpubr_0.6.0 gtools_3.9.4 PeacoQC_1.10.0 matrixStats_0.63.0
[9] flowSpecs_1.14.0 flowWorkspace_4.12.0 xml2_1.3.3 ggridges_0.5.4
[13] reshape2_1.4.4 lubridate_1.9.2 forcats_1.0.0 stringr_1.5.0
[17] dplyr_1.1.2 purrr_1.0.1 readr_2.1.4 tidyr_1.3.0
[21] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0 flowCore_2.12.0

loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 XML_3.99-0.14 digest_0.6.31 timechange_0.2.0
[5] lifecycle_1.0.3 cluster_2.1.4 magrittr_2.0.3 compiler_4.3.0
[9] rlang_1.1.0 utf8_1.2.3 yaml_2.3.7 data.table_1.14.8
[13] ggsignif_0.6.4 plyr_1.8.8 RColorBrewer_1.1-3 abind_1.4-5
[17] BiocParallel_1.34.0 withr_2.5.0 RProtoBufLib_2.12.0 BiocGenerics_0.46.0
[21] grid_4.3.0 stats4_4.3.0 fansi_1.0.4 colorspace_2.1-0
[25] scales_1.2.1 iterators_1.0.14 cli_3.6.1 crayon_1.5.2
[29] ncdfFlow_2.46.0 generics_0.1.3 rstudioapi_0.14 tzdb_0.3.0
[33] rjson_0.2.21 zlibbioc_1.46.0 parallel_4.3.0 BiocManager_1.30.20
[37] vctrs_0.6.2 jsonlite_1.8.4 carData_3.0-5 cytolib_2.12.0
[41] car_3.1-2 IRanges_2.34.0 hms_1.1.3 GetoptLong_1.0.5
[45] S4Vectors_0.38.0 RBGL_1.76.0 rstatix_0.7.2 clue_0.3-64
[49] Rgraphviz_2.44.0 foreach_1.5.2 hexbin_1.28.3 glue_1.6.2
[53] codetools_0.2-19 stringi_1.7.12 gtable_0.3.3 shape_1.4.6
[57] ggcyto_1.28.0 ComplexHeatmap_2.16.0 munsell_0.5.0 pillar_1.9.0
[61] graph_1.78.0 circlize_0.4.15 doParallel_1.0.17 Biobase_2.60.0
[65] lattice_0.21-8 png_0.1-8 backports_1.4.1 broom_1.0.4
[69] renv_0.17.3 Rcpp_1.0.10 gridExtra_2.3 zoo_1.8-12
[73] pkgconfig_2.0.3 GlobalOptions_0.1.2

mikejiang commented 1 year ago

Right, the feature of passing path as a data.frame was deprecated when we refactored the entire parsing code into C++: it wasn't widely used, and we didn't think it was worthwhile to port. I have updated the documentation to reflect the current state of this parameter.

nranthony commented 1 year ago

Wonderful, thanks for the info and the update.

I have a situation where I need to define the FCS files explicitly, because the folder structure referenced by the wsp contains the same filename in two different folders. Without this functionality, what is the best way to open the wsp? Can I iterate over the FCS files one by one, or would you suggest something else?

Close-your-eyes commented 1 year ago

For my purposes, I do iterate over fcs file paths and import them separately as gating sets with CytoML::flowjo_to_gatingset.

When your files (or now gatingsets) are consistent and you want to continue with the gatingset format, you may use flowWorkspace::merge_list_to_gs() to merge a list of gatingsets into one gatingset object.

Is it only the filenames that are duplicated, or also the metadata of the respective FCS files? If only the filenames are duplicated and the metadata are unique to each file, you may use the "subset" argument of CytoML::flowjo_to_gatingset to direct the function explicitly to the desired FCS file. The "path" argument may then be the root folder of your FCS files.

There may be a few more details to consider and it may still be a bit tricky though ...
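The suggestions above could be sketched roughly as follows: import one folder at a time, restrict each import to the files in that folder via "subset", then merge. This is a sketch under assumptions, not a tested recipe. The function name import_per_folder and its arguments wsp_file, fcs_dirs and group are hypothetical, and it assumes that the sample names in the workspace match the FCS filenames and that all resulting gatingsets share the same gating tree.

```r
# Sketch only: import each FCS folder separately, then merge the results.
# `wsp_file` (path to the .wsp) and `fcs_dirs` (character vector of folders)
# are hypothetical inputs; `group` is the FlowJo sample group to import.
import_per_folder <- function(wsp_file, fcs_dirs, group = 1) {
  ws <- CytoML::open_flowjo_xml(wsp_file)

  gs_list <- lapply(fcs_dirs, function(dir) {
    # Assumption: workspace sample names equal the FCS filenames in `dir`.
    samples <- list.files(dir, pattern = "\\.fcs$", ignore.case = TRUE)
    CytoML::flowjo_to_gatingset(ws, name = group, path = dir, subset = samples)
  })

  # Merging only works when the gatingsets are consistent with each other.
  flowWorkspace::merge_list_to_gs(gs_list)
}
```

If two samples in the workspace share a name, a name-based subset selects both of them, so this alone may not resolve the ambiguity.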

nranthony commented 1 year ago

Thanks for the input, much appreciated. I'll run with your suggestions and figure it out. As such, no longer an issue. Closing.

mikejiang commented 1 year ago

Type ?flowjo_to_gatingset and look for the section on the additional.keys parameter, which can be used to address the FCS file matching problem.
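A minimal sketch of what that might look like, assuming that additional.keys takes a character vector of FCS keywords used together with the filename to identify samples, as the help page describes. The function name import_with_keys, the arguments wsp_file and fcs_root, and the "$DATE" keyword are illustrative assumptions; pick keywords that actually differ between the duplicated files and are recorded in the workspace.

```r
# Sketch only: `wsp_file` and `fcs_root` are hypothetical inputs.
# `keys` lists FCS keywords used in addition to the filename to tell
# samples apart; "$DATE" is an illustrative choice, not a recommendation.
import_with_keys <- function(wsp_file, fcs_root, keys = c("$TOT", "$DATE")) {
  ws <- CytoML::open_flowjo_xml(wsp_file)
  CytoML::flowjo_to_gatingset(
    ws,
    name = 1,               # first sample group in the workspace
    path = fcs_root,        # searched recursively for FCS files
    additional.keys = keys
  )
}
```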