RGLab / flowCore

Core flow cytometry infrastructure

flowCore >2.0 read.FCS results in $SPILLOVER error when reading certain FCS files (should it be a warning?) #262

Open okingfisher opened 9 months ago

okingfisher commented 9 months ago

Hi, we have been using flowCore for many years to read files from BD FACSCalibur, CyAn ADP, Beckman CytoFlex, and Miltenyi MACSQuant Analyzer cytometers with read.FCS(), with no problems up to version 2.0. However, after we updated R and the packages, flowCore 2.14 returns an error when reading the MACSQuant files. The error message (also shown below) is: $SPILLOVER keyword value is of improper size for number of spillover channels! Error: $SPILLOVER size discrepancy could not be resolved!

The culprit appears to be a malformed $SPILLOVER keyword in some of our files: $SPILLOVER/4,FL5-H,FL4-H,FL5-A,FL4-A (see the linked files below). Older files from that cytometer that do not contain $SPILLOVER, and files from the other cytometers, read without problems.
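
To illustrate why this value looks malformed: as I understand the FCS 3.1 specification, $SPILLOVER should hold n, then n parameter names, then n*n coefficients. The base R sketch below only counts comma-separated tokens; check_spillover() is a hypothetical helper written for this illustration, not a flowCore function, and it does not reproduce flowCore's own check.

```r
# Hypothetical helper (not part of flowCore): check whether a $SPILLOVER value
# has the token count expected for n, n parameter names and n*n coefficients.
check_spillover <- function(value) {
  tokens <- strsplit(value, ",", fixed = TRUE)[[1]]
  n <- suppressWarnings(as.integer(tokens[1]))
  if (is.na(n) || n < 1L) {
    # The first token is not a usable channel count
    return(list(ok = FALSE, n = NA_integer_, expected = NA_integer_,
                found = length(tokens)))
  }
  expected <- 1L + n + n * n
  list(ok = length(tokens) == expected, n = n,
       expected = expected, found = length(tokens))
}

res <- check_spillover("4,FL5-H,FL4-H,FL5-A,FL4-A")
res$ok        # FALSE
res$expected  # 21 tokens expected for n = 4
res$found     # 5 tokens present: the count and four names, no coefficients
```

So the value declares four channels and names them, but the 16 coefficients are missing, which would be consistent with the "improper size" message.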

We are not quite sure what caused the cytometer software to include it in the FCS files. Oddly, everything worked with flowCore 2.0. We updated R and the packages on one of our computers and started seeing this error. I reproduced it on my personal computers (Windows, macOS), which I had updated recently but never tested with these files. Another computer that still runs flowCore 2.0 reads the files without problems.

Unfortunately, it seems that we cannot easily downgrade flowCore (perhaps by downgrading R, so that BiocManager installs an older version, and likewise for the other packages? We would rather not do that). We have many older files with this problem (we had never noticed it), so fixing it only at the cytometer is not an option.

We tried different parameters and even looked at the code, but with no luck (the check seems to be in the C++ part, which is beyond our skills).

Therefore, I wonder whether a future version could add a parameter to optionally convert this error into a warning (something like emptyValue = FALSE, which, incidentally, we need to read these files).
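
Until such a parameter exists, a user-side approximation is to catch the error and re-raise it as a warning. This is only a sketch: read_or_warn() is a hypothetical wrapper, not a flowCore function, and unlike the requested option it does not recover the data from the affected file. It only lets a batch loop continue past it.

```r
# Hypothetical wrapper (not a flowCore function): call `reader` on `path` and,
# if it fails, emit a warning and return NULL instead of stopping.
read_or_warn <- function(path, reader, ...) {
  tryCatch(
    reader(path, ...),
    error = function(e) {
      warning("Could not read ", path, ": ", conditionMessage(e),
              call. = FALSE)
      NULL
    }
  )
}

# Intended use (assumes flowCore is installed and `files` is a character
# vector of FCS paths):
# frames <- lapply(files, read_or_warn, reader = flowCore::read.FCS,
#                  emptyValue = FALSE, truncate_max_range = FALSE)
# frames <- Filter(Negate(is.null), frames)  # drop the files that failed
```

Passing the reader as an argument keeps the wrapper independent of any particular flowCore version.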

You can get the MACSQuant files (with and without the $SPILLOVER keyword) here: https://app.box.com/s/6bha57tzl0pnh71pq152457mlwsaidel

To Reproduce:

library(flowCore)
fcsdata <- read.FCS("~/FCS files/rbio2023-06-19.0011.fcs", emptyValue = FALSE, truncate_max_range = FALSE)
#> $SPILLOVER keyword value is of improper size for number of spillover channels!
#> Error: $SPILLOVER size discrepancy could not be resolved!

Expected behavior: reading the malformed FCS files should produce a warning rather than an error (note that read.FCS() from flowCore 2.0 produced neither).

sessionInfo():

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.6.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] es_ES.UTF-8/es_ES.UTF-8/es_ES.UTF-8/C/es_ES.UTF-8/es_ES.UTF-8

time zone: Europe/Madrid
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] flowCore_2.14.0

loaded via a namespace (and not attached):
 [1] compiler_4.3.2      RProtoBufLib_2.14.0 cytolib_2.14.0      tools_4.3.2         rstudioapi_0.15.0
 [6] Biobase_2.62.0      S4Vectors_0.40.2    BiocGenerics_0.48.1 matrixStats_1.2.0   stats4_4.3.2

Additional context: The problem is in the acquisition template (the cytometer is currently down due to a technical problem; we will fix the template as soon as possible). We know that we could use flowCore 2.0 to read these files, change the header, and write the files back, but that is far from optimal and the problem might come back in the future.

SamGG commented 9 months ago

Hi,

@mikejiang $SPILLOVER is an optional keyword, so an option to ignore all optional keywords might be useful.

@okingfisher Here is an external workaround. This pure R code invalidates the $SPILLOVER keyword by replacing the $ with X. Verify that nothing else in the file has changed, especially the file length. The code handles files no bigger than 2 GB.

# input & output file names
filen = "c:/demo/231213-spillover/rbio2023-06-19.0011.fcs"
filec = file.path(dirname(filen), gsub("\\.fcs$", "_xxx.fcs", basename(filen)))

# read the HEADER of the input file and get the TEXT offsets (0-based)
draw = readBin(filen, what = "raw", n = 26)
dhdr = rawToChar(draw)
first = strtoi(substr(dhdr, 11, 18))
last = strtoi(substr(dhdr, 19, 26))

# read the complete file
draw = readBin(filen, what = "raw", n = file.size(filen))
# search SPILLOVER in the TEXT part (+1 turns 0-based offsets into R indices)
dtxt = rawToChar(draw[(first:last) + 1])
sep = substr(dtxt, 1, 1)
# fixed = TRUE so that a delimiter such as | or \ is not read as a regex
pos = regexpr(paste0(sep, "$SPILLOVER", sep), dtxt, fixed = TRUE, useBytes = TRUE)
# if found (regexpr() returns -1 otherwise), invalidate the keyword
if (pos > 0) {
  # rawToChar(draw[first+pos+1]) # check $ is found
  draw[first+pos+1] = charToRaw("X")
  writeBin(draw, filec)
}

# verify
flowCore::read.FCS(filec, emptyValue = FALSE, truncate_max_range = FALSE)

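
The advice above to verify that nothing else in the file has changed can be sketched in base R as follows. compare_bytes() is a hypothetical helper, not part of flowCore; it reads both files fully into memory, so the same size limit applies as for the workaround itself.

```r
# Hypothetical helper: compare two files byte by byte and report whether the
# lengths match and which byte positions (1-based) differ.
compare_bytes <- function(original, patched) {
  a <- readBin(original, what = "raw", n = file.size(original))
  b <- readBin(patched, what = "raw", n = file.size(patched))
  same_length <- length(a) == length(b)
  # Positions are only comparable when both files have the same length
  changed <- if (same_length) which(a != b) else integer(0)
  list(same_length = same_length, changed = changed)
}

# Intended use with the file names from the workaround above:
# cmp <- compare_bytes(filen, filec)
# cmp$same_length      # should be TRUE
# length(cmp$changed)  # should be 1: only the "$" of $SPILLOVER was replaced
```
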
okingfisher commented 9 months ago

Thank you very much. I was thinking of such a workaround, and your example is very much appreciated (I do not have much time to try things right now). I'll give it a try!