RGLab / flowCore

Core flow cytometry infrastructure
44 stars 25 forks source link

Error when using read.FCS with which.lines argument and using concatFCS #171

Closed MikedeKokkie closed 4 years ago

MikedeKokkie commented 4 years ago

Greetings,

For my project, I am trying to take an equal amount of cells from a list of FCS files, and save it into a new FCS file.

To read and subset the FCS files most efficiently, I am using the read.FCS function from the flowCore package with its arguments, "which.lines", which creates a subsample of random events from the FCS file. It was then my intention to concatenate these subsampled FCS files using CATALYST's "concatFCS" function, to both write this FCS file back to disk and use it for further analysis (e.g. clustering). However, when I tried this approach with my dataset for the first time, I got an error. To make sure this was not related to the dataset, I reproduced this error with an example data set from CATALYST, and it turned into quite a simple reproduction:

require(flowCore)
#> Loading required package: flowCore
require(CATALYST)
#> Loading required package: CATALYST

data(raw_data)
write.FCS(x = raw_data[[1]], filename = "testdataset.fcs")
#> [1] "testdataset.fcs"

test <- list()
test[[1]] <- read.FCS(filename = "testdataset.fcs", which.lines = 100)
test[[2]] <- read.FCS(filename = "testdataset.fcs", which.lines = 100)

concatFCS(test, file_num = TRUE)
#> Error in es[inds, timeCol]: subscript out of bounds

Sadly I am not experienced enough with FCS objects to dig deep enough to see where the issue lies exactly, but from what I know there to be a compatibility problem here between the two functions, despite the intermediate objects being of the same format (flowFrame). I'm wondering if either of you package developers can identify or even fix this issue. Alternatively, in the event that I'm doing things completely wrong, I would be happy to accept suggestions on how to do these steps.

Thank you both for your time and effort, I really appreciate these packages and how much they have contributed to the data analysis of flow cytometry.

Mike

MikedeKokkie commented 4 years ago

> sessionInfo()

R version 3.6.2 (2019-12-12) Platform: x86_64-w64-mingw32/x64 (64-bit) Running under: Windows 10 x64 (build 17763)

Matrix products: default

Random number generation: RNG: Mersenne-Twister Normal: Inversion Sample: Rounding

locale: [1] LC_COLLATE=Dutch_Netherlands.1252 LC_CTYPE=Dutch_Netherlands.1252 LC_MONETARY=Dutch_Netherlands.1252 [4] LC_NUMERIC=C LC_TIME=Dutch_Netherlands.1252

attached base packages: [1] splines stats4 grid parallel stats graphics grDevices utils datasets methods base

other attached packages: [1] zlibbioc_1.32.0 zip_2.0.4 zeallot_0.1.0 yaml_2.2.0
[5] XVector_0.26.0 xts_0.11-2 xtable_1.8-4 xopen_1.0.0
[9] xfun_0.12 withr_2.1.2 whisker_0.4 viridis_0.5.1
[13] viridisLite_0.3.0 vipor_0.4.5 VIM_4.8.0 VGAM_1.1-1
[17] vegan_2.5-6 vctrs_0.2.1 vcd_1.4-4 uwot_0.1.5
[21] utf8_1.1.4 umap_0.2.4.1 tsne_0.1-3 tinytex_0.19
[25] tidyselect_0.2.5 tidyr_1.0.0 tibble_2.1.3 testthat_2.3.1
[29] sys_3.3 stringi_1.4.4 statmod_1.4.33 splancs_2.01-40
[33] sourcetools_0.1.7 snow_0.4-3 smoother_1.1 TTR_0.23-5
[37] sitmo_2.0.1 shinyjs_1.1 shinyFiles_0.7.3 shinydashboard_0.7.1
[41] shinyBS_0.61 shiny_1.4.0 shape_1.4.4 sessioninfo_1.1.1
[45] selectr_0.4-2 scatterplot3d_0.3-41 scater_1.14.0 scales_1.1.0
[49] sandwich_2.5-1 Rwave_2.4-8 rvest_0.3.5 xml2_1.2.2
[53] rversions_2.0.1 RUnit_0.4.32 Rtsne_0.15 rsvd_1.0.2
[57] rstudioapi_0.10 RSpectra_0.16-0 RSEIS_3.9-0 rrcov_1.5-2
[61] RProtoBufLib_1.8.0 rprojroot_1.3-2 RPMG_2.2-3 Rphenograph_0.99.1
[65] roxygen2_7.0.2 robustbase_0.93-5 rmarkdown_2.1 rlang_0.4.2
[69] rjson_0.2.20 rio_0.5.16 Rhdf5lib_1.8.0 rgeos_0.5-2
[73] RFOC_3.4-6 rex_1.1.2 reshape2_1.4.3 remotes_2.1.0
[77] rematch_1.0.1 readxl_1.3.1 readr_1.3.1 RcppProgress_0.4.1
[81] RcppParallel_4.4.4 RcppHNSW_0.2.0 RcppEigen_0.3.3.7.0 RcppAnnoy_0.0.14
[85] Rcpp_1.0.3 RColorBrewer_1.1-2 rcmdcheck_1.3.3 RBGL_1.62.1
[89] RANN_2.6.1 ranger_0.11.2 Radviz_0.8.1 R6_2.4.1
[93] quantreg_5.54 SparseM_1.78 purrr_0.3.3 ps_1.3.0
[97] proxy_0.4-23 promises_1.1.0 progress_1.2.2 processx_3.4.1
[101] prettyunits_1.1.0 praise_1.0.0 png_0.1-7 plotrix_3.7-7
[105] plotly_4.9.1 plogr_0.2.0 pkgload_1.0.2 pkgconfig_2.0.3
[109] pkgbuild_1.0.6 pillar_1.4.3 pheatmap_1.0.12 permute_0.9-5
[113] pdist_1.2 pcaPP_1.9-73 pbkrtest_0.4-7 openxlsx_4.1.4
[117] openssl_1.4.1 nnls_1.4 nloptr_1.2.1 ncdfFlow_2.32.0
[121] RcppArmadillo_0.9.800.3.0 munsell_0.5.0 multcomp_1.4-12 TH.data_1.0-10
[125] survival_3.1-8 mvtnorm_1.0-11 miscTools_0.6-26 minqa_1.2.4
[129] miniUI_0.1.1.1 mime_0.8 memoise_1.1.0 MEM_0.1.1
[133] MBA_0.0-9 MatrixModels_0.4-1 markdown_1.1 maptools_0.9-9
[137] sp_1.3-2 magrittr_1.5 locfit_1.5-9.1 lmtest_0.9-37
[141] zoo_1.8-6 lme4_1.1-21 lifecycle_0.1.0 lazyeval_0.2.2
[145] latticeExtra_0.6-29 later_1.0.0 lambda.r_1.2.4 laeken_0.5.0
[149] labeling_0.3 kohonen_3.0.10 kableExtra_1.1.0 knitr_1.27
[153] jsonlite_1.6 irlba_2.3.3 Matrix_1.2-18 installr_0.22.0
[157] stringr_1.4.0 ini_0.3.1 IDPmisc_1.1.19 httr_1.4.1
[161] httpuv_1.5.2 htmlwidgets_1.5.1 htmltools_0.4.0 hms_0.5.3
[165] highr_0.8 hexbin_1.28.0 HDF5Array_1.14.1 rhdf5_2.30.1
[169] haven_2.2.0 gtools_3.8.1 gtable_0.3.0 gridExtra_2.3
[173] gridBase_0.4-7 gplots_3.0.1.2 glue_1.3.1 GlobalOptions_0.1.1
[177] git2r_0.26.1 gh_1.0.1 ggthemes_4.2.0 ggridges_0.5.2
[181] ggrepel_0.8.1 ggbeeswarm_0.6.0 GetoptLong_0.1.8 GEOmap_2.4-4
[185] GenomeInfoDbData_1.2.2 gdata_2.18.0 futile.options_1.0.1 futile.logger_1.4.3
[189] fs_1.3.1 formattable_0.2.0.1 formatR_1.7 forcats_0.4.0
[193] FNN_1.1.3 flowWorkspace_3.34.1 flowViz_1.50.0 lattice_0.20-38
[197] flowUtils_1.50.0 FlowSOM_1.18.0 igraph_1.2.4.2 flowDensity_1.20.0
[201] flowCL_1.24.0 SPARQL_1.16 RCurl_1.95-4.13 XML_3.98-1.20
[205] Rgraphviz_2.30.0 graph_1.64.0 fields_10.0 maps_3.3.0
[209] spam_2.5-1 fastmap_1.0.1 fansi_0.4.1 evaluate_0.14
[213] ellipsis_0.3.0 edgeR_3.28.0 limma_3.42.0 e1071_1.7-3
[217] DT_0.11 drc_3.0-1 MASS_7.3-51.4 dqrng_0.2.1
[221] dplyr_0.8.3 dotCall64_1.0-0 doParallel_1.0.15 iterators_1.0.12
[225] foreach_1.4.7 digest_0.6.23 devtools_2.2.1 usethis_1.5.1
[229] destiny_2.14.0 desc_1.2.0 DEoptimR_1.0-8 DelayedMatrixStats_1.8.0
[233] data.table_1.12.8 cytutils_0.1.0 cytolib_1.8.0 cytofkit_1.12.0
[237] plyr_1.8.5 ggplot2_3.2.1 cytofCore_0.4 flowCore_1.52.1
[241] cydar_1.10.0 SingleCellExperiment_1.8.0 SummarizedExperiment_1.16.1 DelayedArray_0.12.2
[245] matrixStats_0.55.0 GenomicRanges_1.38.0 GenomeInfoDb_1.22.0 IRanges_2.20.2
[249] S4Vectors_0.24.3 curl_4.3 crosstalk_1.0.0 crayon_1.3.4
[253] covr_3.4.0 corpcor_1.6.9 ConsensusClusterPlus_1.50.0 ComplexHeatmap_2.2.0
[257] commonmark_1.7 colorRamps_2.3 colourpicker_1.0 colorspace_1.4-1
[261] clue_0.3-57 clisymbols_1.2.0 clipr_0.7.0 cli_2.0.1
[265] circlize_0.4.8 cellranger_1.1.0 caTools_1.18.0 CATALYST_1.10.1
[269] car_3.0-6 carData_3.0-3 callr_3.4.0 brew_1.0-6
[273] bitops_1.0-6 BiocSingular_1.2.1 BiocParallel_1.20.1 BiocNeighbors_1.4.1
[277] BiocInstaller_1.30.0 Biobase_2.46.0 BiocGenerics_0.32.0 BH_1.72.0-3
[281] beeswarm_0.2.3 beachmat_2.2.1 base64enc_0.1-3 backports_1.1.5
[285] assertthat_0.2.1 askpass_1.1 abind_1.4-5

loaded via a namespace (and not attached): [1] tcltk_3.6.2 mgcv_1.8-31 flowStats_3.44.0 R.utils_2.9.2 rappdirs_0.3.1 jpeg_0.1-8.1
[7] KernSmooth_2.23-16 openCyto_1.24.0 mnormt_1.5-5 cowplot_1.0.0 reticulate_1.14 class_7.3-15
[13] webshot_0.5.2 ggcyto_1.14.0 nlme_3.1-142 ks_1.11.6 R.oo_1.23.0 R.methodsS3_1.7.1
[19] codetools_0.2-16 BiocManager_1.30.10 cluster_2.1.0 fda_2.4.8 reprex_0.3.0 mclust_5.4.5
[25] foreign_0.8-72 flowClust_3.24.0 ellipse_0.4.1 tools_3.6.2 boot_1.3-23 nnet_7.3-12
[31] CytoML_1.12.0 compiler_3.6.2

gfinak commented 4 years ago

concatFCS is a CAYALYST function. It's throwing the error. You should probably check with the CATALYST developers.

mikejiang commented 4 years ago

Alternatively, you can concatenate them without using CAYALYST::concatFCS

data("GvHD")
tmp <- tempfile()
write.FCS(GvHD[[1]], filename = tmp)

fr1 <- read.FCS(tmp, which.lines = 100)
fr2 <- read.FCS(tmp, which.lines = 100)

fs <- as(list(fr1,fr2), "flowSet")
fr <- as(fs, "flowFrame")
fr
flowFrame object 'anonymous'
with 200 cells and 9 observables:
             name              desc range minRange maxRange
$P1         FSC-H        FSC-Height  1024        0     1023
$P2         SSC-H        SSC-Height  1024        0     1023
$P3         FL1-H         CD15 FITC  1024        1    10000
...
SamGG commented 4 years ago

Hi, This question has recently been answered on the cytometry mailing of Perdue. Look for "Downsampling in R" in the archive http://www.cyto.purdue.edu/hmarchiv/index.htm I would recommend to try to reuse standard (as you tried by using CATALYST) because there could be misinterpretations of the FCS standard concerning some keywords if you start from scratch. So better reuse. Mike's answer is perfect as well as the recent answer #170 . As stated by David https://lists.purdue.edu/pipermail/cytometry/2020-January/054844.html and in flowCore documentation, I would not advised sampling during reading (ie the which.lines argument). If you want to process FCS files one by one, read the whole file, then sample it and append it to the flowset as proposed by Mike. This will be quick and robust. Best.

SamGG commented 4 years ago

The error points towards the time parameter. This parameter seems o be missing in your FCS files. Try to add by_time=FALSE to avoid using time channel. HTH

MikedeKokkie commented 4 years ago

concatFCS is a CAYALYST function. It's throwing the error. You should probably check with the CATALYST developers.

Yes, I did post this issue on CATALYST as well. Haven't received any replies there, but I'll be sure to refer them to the great answers here!

The error points towards the time parameter. This parameter seems o be missing in your FCS files. Try to add by_time=FALSE to avoid using time channel. HTH

Just tried this, sadly this option still leads to the same error message for me, even with CATALYST's "data(raw_data)" example data set. However...

Alternatively, you can concatenate them without using CAYALYST::concatFCS

fs <- as(list(fr1,fr2), "flowSet")
fr <- as(fs, "flowFrame")

These approaches work great! Thank you Mike and Sam!