RGLab / flowCore

Core flow cytometry infrastructure
43 stars 25 forks source link

Error when reading FCS file #235

Open schraivo opened 2 years ago

schraivo commented 2 years ago

Describe the bug When trying to read an FCS file, read.FCS throws an error:

Error in if (any(idx)) { : missing value where TRUE/FALSE needed

I can not detect anything wrong with the file. FlowJo opens the file without issues. Already checked this issue, which looks similar, but couldn't reproduce the solution: https://github.com/RGLab/flowCore/issues/123

To Reproduce Download FCS file: https://oc.embl.de/index.php/s/F0rTI92aOkzofWc

library(flowCore)
read.FCS("pathtodirectory/sync_late_pS10_sampleA_100k.fcs")

Expected behavior There should be no error.

sessionInfo():

> sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /g/easybuild/x86_64/CentOS/7/haswell/software/FlexiBLAS/3.0.4-GCC-11.2.0/lib64/libflexiblas.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] BiocManager_1.30.17 BiocGenerics_0.42.0 forcats_0.5.1       stringr_1.4.0       dplyr_1.0.8         purrr_0.3.4         readr_2.1.2         tidyr_1.2.0        
 [9] tibble_3.1.6        ggplot2_3.3.5       tidyverse_1.3.1     flowCore_2.8.0     

loaded via a namespace (and not attached):
 [1] matrixStats_0.62.0 fs_1.5.2           usethis_2.1.5      lubridate_1.8.0    devtools_2.4.3     httr_1.4.2         rprojroot_2.0.3    tools_4.2.0       
 [9] backports_1.4.1    utf8_1.2.2         R6_2.5.1           DBI_1.1.2          colorspace_2.0-3   withr_2.5.0        tidyselect_1.1.2   prettyunits_1.1.1 
[17] processx_3.5.3     curl_4.3.2         compiler_4.2.0     cli_3.3.0          rvest_1.0.2        Biobase_2.56.0     xml2_1.3.3         desc_1.4.1        
[25] scales_1.2.0       callr_3.7.0        digest_0.6.29      rmarkdown_2.14     pkgconfig_2.0.3    htmltools_0.5.2    sessioninfo_1.2.2  dbplyr_2.1.1      
[33] fastmap_1.1.0      rlang_1.0.2        readxl_1.4.0       rstudioapi_0.13    generics_0.1.2     jsonlite_1.8.0     magrittr_2.0.3     RProtoBufLib_2.8.0
[41] Rcpp_1.0.8.3       munsell_0.5.0      S4Vectors_0.34.0   fansi_1.0.3        lifecycle_1.0.1    stringi_1.7.6      yaml_2.3.5         brio_1.1.3        
[49] pkgbuild_1.3.1     grid_4.2.0         crayon_1.5.1       haven_2.5.0        hms_1.1.1          knitr_1.38         ps_1.7.0           pillar_1.7.0      
[57] stats4_4.2.0       pkgload_1.2.4      reprex_2.0.1       glue_1.6.2         evaluate_0.15      remotes_2.4.2      RcppParallel_5.1.5 modelr_0.1.8      
[65] vctrs_0.4.1        tzdb_0.3.0         testthat_3.1.3     cellranger_1.1.0   gtable_0.3.0       assertthat_0.2.1   cachem_1.0.6       xfun_0.30         
[73] broom_0.8.0        cytolib_2.8.0      memoise_2.0.1      ellipsis_0.3.2
SamGG commented 2 years ago

flowCore is carrying out checks that FJ does not. The error seems to be located where read.FCS is checking whether the measurements are in their range or not (global min and channel max). This test seems to occur when there is no transformation set (but still unsure). As a rule, I always set the following when using read.FCS transformation = FALSE, emptyValue = TRUE, min.limit = NULL, truncate_max_range = FALSE. I think this could help while waiting for a better answer from Mike or Greg. https://github.com/RGLab/flowCore/blob/ba3b6ffed5310c1c0618487ab163c0142d8cab8f/R/IO.R#L1230-L1258

schraivo commented 2 years ago

Hi Sam, the error is gone with truncate_max_range = FALSE. Thanks! Would be curious to know which values were out of range, in order to understand the error. Let's see what Mike or Greg think.

gfinak commented 2 years ago

Your Time channel has a reported range of 9.223372e+18 but data in your time channel go up to 3.378476e+38 and some values are NA. We tend to keep things pretty strict about reading valid FCS files where the data make sense and leave it up to the user to knowingly shoot themselves in the foot. That said the messaging here could be much better, and this should be a warning not an error. The bug arises because we aren't treating the unexpected NA values properly when trying to truncate with truncate_max_range=TRUE. We'll get a fix in.

gfinak commented 2 years ago

Noting here until I can check out the repo:

Error is in this secton: any(idx) returns NA when idx contains NA. should likely be: any(na.omit(idx)) same for the section with min.limit

https://github.com/RGLab/flowCore/blob/14781e79b2774e30933a949664a91a25491ea503/R/IO.R#L1234-L1258

SamGG commented 2 years ago

For curiosity and if you have time, how a NA results from a FCS file?

gfinak commented 2 years ago

@SamGG a great question:

length(which(is.na(dat[,i])))
[1] 528

for that time channel. The data already contains NA after the readBin call parameters passed to that are:

readBin(con = con, what = "numeric", n = 17453527, size = 4, signed = TRUE, endian = "little")

And it's not just that it's trying to read more than it should... the length of the read data is 17453527 and the max index that's NA is 15418653. Probably another instrument maker writing bad files. If this happens with the time channel it's a wonder anybody can trust such data.

schraivo commented 2 years ago

It's an FCS file from an instrument prototype. Happy to continue discussing offline, as this will include confidential information. Thanks everyone for your support and quick responses.

SamGG commented 2 years ago

@gfinak bingo, you identified a new comer! @schraivo NA are precisely NaN; devel should take care when converting 64 bits integers; please close the issue and remove your link if you feel so.

gfinak commented 2 years ago

I still think we should test for NA and error more informatively. Will reopen until I push a fix.