Closed SamGG closed 3 years ago
First of all, this manually crafted (out of a matrix) flowFrame doesn't carry any keywords (especially $PnX
), so write.FCS
is trying to impute these values, including $PnR
.
And its value needs to be Integer (at least it is the case when file is generated from the instruments).
And write.FCS
computes the maximum data value and round it to ceiling, which you can see through keywords after you load it back
write.FCS(gg, tmp)
fr <- read.FCS(tmp)
> range(fr)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
min 0 0 0 0 0
max 7 4 6 2 2
> range(fr, "data")
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
min 4.3 2.0 1.0 0.1 1
max 7.9 4.4 6.9 2.5 3
> unlist(keyword(fr)[paste0("$P", 1:5, "R")])
$P1R $P2R $P3R $P4R $P5R
"8" "5" "7" "3" "3
So, PnR
isn't actually truncated. But maxRange
value(returned by range
or summary
) is computed from PnR
and it always gets subtracted by 1
, ( It is the legacy behavior and has been these way for a long time, and I don't know the exact reason, maybe has to do with the PnR definition of FCS standard?).
We can bump PnR
value by 1
during write.FCS
so that range(fr)
and range(fr, "data")
return the consistent range (without appearing to be truncated) if it bothers you.
Again this doesn't affect the FCS files that come from instruments or other softwares.
After the patch, here is the range info
> write.FCS(gg, tmp)
[1] "/tmp/RtmpKbY7ug/filec4335b7cea1"
> fr <- read.FCS(tmp)
> range(fr)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
min 0 0 0 0 0
max 8 5 7 3 3
> range(fr, "data")
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
min 4.3 2.0 1.0 0.1 1
max 7.9 4.4 6.9 2.5 3
> unlist(keyword(fr)[paste0("$P", 1:5, "R")])
$P1R $P2R $P3R $P4R $P5R
"9" "6" "8" "4" "4"
Thanks, Mike
Thanks Mike for fixing. In fact, I noticed this while explaining how to concatenate FCS. So, the faulty FCS comes out instruments in a sense. I should have given you a better example. It is below. I already seen some real problems linked to this behaviour one year ago. If the Original holds sample or cluster identifiers, truncation is a big problem. The last sample or cluster will be merged with the previous one. While reading FCS files is the most important point, every analytical pipeline should export its results, and it turns to be FCS files frequently, in order to allow experimenters to verify and/or tune the pipeline. So this is very important IMO. And it's even more important because most of flow cytometry packages tend to have their own/customized FCS writing functions that overloads the write.FCS function.
To recap the example below: a) warnings concerning GvHD b) NA range after as(), although summary is OK c) keywords are not read during concatenation (just to mention it) d) truncation is not taking place as I expect, but maybe it's a border effect: if maxRange is 34, I imagine that 35 should be truncated, but maybe it's 36 or 35.00001 that will be the first truncated values to 35 (?). Definitively those dual variables range and maxRange are a nightmare.
I think your fix is right, and it is important. I will keep in mind to be careful when flowFrame has no range keyword, as from matrix or concatenation. Check if everthing is correct in the new example, your fix has been not used yet. Thanks again.
library(flowCore)
data("GvHD")
#> Warning in load(zfile, envir = tmp_env): strings not representable in native
#> encoding will be translated to UTF-8
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
#> Warning in load(zfile, envir = tmp_env): input string 'CELLQuestª' cannot be
#> translated to UTF-8, is it valid in 'UTF-8' ?
warnings()[1:5]
#> NULL
(gg = as(GvHD, "flowFrame"))
#> flowFrame object 'anonymous'
#> with 585644 cells and 9 observables:
#> name desc range minRange maxRange
#> $P1 FSC-H FSC-Height 1024 0 1023
#> $P2 SSC-H SSC-Height 1024 0 1023
#> $P3 FL1-H CD15 FITC 1024 1 10000
#> $P4 FL2-H CD45 PE 1024 1 10000
#> $P5 FL3-H CD14 PerCP 1024 1 10000
#> $P6 FL2-A <NA> 1024 0 1023
#> $P7 FL4-H CD33 APC 1024 1 10000
#> $P8 Time Time (51.20 sec.) 1024 0 1023
#> Original Original Original Frame NA 1 35
#> 2 keywords are stored in the 'description' slot
summary(gg)
#> FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A
#> Min. 52.0000 0.0000 1.000000 1.0000 1.000000 0.0000
#> 1st Qu. 88.0000 67.0000 2.111231 227.9141 1.626087 56.0000
#> Median 130.0000 101.0000 6.744263 447.7405 4.222901 112.0000
#> Mean 166.5125 129.6283 226.508527 567.4725 26.534154 139.5429
#> 3rd Qu. 214.0000 154.0000 50.674542 768.4763 11.680157 185.0000
#> Max. 1023.0000 1023.0000 10000.000000 10000.0000 10000.000000 1023.0000
#> FL4-H Time Original
#> Min. 1.000000 0.0000 1.00000
#> 1st Qu. 1.878040 71.0000 15.00000
#> Median 5.010744 191.0000 20.00000
#> Mean 21.662928 260.2665 21.04099
#> 3rd Qu. 26.740836 375.0000 27.00000
#> Max. 10000.000000 999.0000 35.00000
table(exprs(gg[,9]))
#>
#> 1 2 3 4 5 6 7 8 9 10 11 12 13
#> 3420 3405 3435 8550 10410 3750 13979 2205 13350 11610 13830 13934 19470
#> 14 15 16 17 18 19 20 21 22 23 24 25 26
#> 9405 16055 10305 30195 59849 39068 11939 13214 12666 8265 4065 7080 66105
#> 27 28 29 30 31 32 33 34 35
#> 49421 12768 17289 4530 6765 9540 25515 24656 25601
keyword(gg)
#> $description
#> [1] "Synthetic Frame"
#>
#> $sampleNames
#> [1] "s5a01" "s5a02" "s5a03" "s5a04" "s5a05" "s5a06" "s5a07" "s6a01"
#> [9] "s6a02" "s6a03" "s6a04" "s6a05" "s6a06" "s6a07" "s7a01" "s7a02"
#> [17] "s7a03" "s7a04" "s7a05" "s7a06" "s7a07" "s9a01" "s9a02" "s9a03"
#> [25] "s9a04" "s9a05" "s9a06" "s9a07" "s10a01" "s10a02" "s10a03" "s10a04"
#> [33] "s10a05" "s10a06" "s10a07"
write.FCS(gg, "gg.fcs")
#> [1] "gg.fcs"
(ggg = read.FCS("gg.fcs"))
#> flowFrame object 'gg.fcs'
#> with 585644 cells and 9 observables:
#> name desc range minRange maxRange
#> $P1 FSC-H FSC-Height 1023 0 1022
#> $P2 SSC-H SSC-Height 1023 0 1022
#> $P3 FL1-H CD15 FITC 10000 0 9999
#> $P4 FL2-H CD45 PE 10000 0 9999
#> $P5 FL3-H CD14 PerCP 10000 0 9999
#> $P6 FL2-A <NA> 1023 0 1022
#> $P7 FL4-H CD33 APC 10000 0 9999
#> $P8 Time Time (51.20 sec.) 999 0 998
#> $P9 Original Original Frame 35 0 34
#> 80 keywords are stored in the 'description' slot
summary(ggg)
#> FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A
#> Min. 52.0000 0.0000 1.000000 1.0000 1.000000 0.0000
#> 1st Qu. 88.0000 67.0000 2.111231 227.9141 1.626087 56.0000
#> Median 130.0000 101.0000 6.744263 447.7405 4.222901 112.0000
#> Mean 166.5125 129.6283 226.508527 567.4725 26.534154 139.5429
#> 3rd Qu. 214.0000 154.0000 50.674541 768.4763 11.680157 185.0000
#> Max. 1023.0000 1023.0000 10000.000000 10000.0000 10000.000000 1023.0000
#> FL4-H Time Original
#> Min. 1.000000 0.0000 1.00000
#> 1st Qu. 1.878040 71.0000 15.00000
#> Median 5.010745 191.0000 20.00000
#> Mean 21.662928 260.2665 21.04099
#> 3rd Qu. 26.740835 375.0000 27.00000
#> Max. 10000.000000 999.0000 35.00000
table(exprs(ggg[,9]))
#>
#> 1 2 3 4 5 6 7 8 9 10 11 12 13
#> 3420 3405 3435 8550 10410 3750 13979 2205 13350 11610 13830 13934 19470
#> 14 15 16 17 18 19 20 21 22 23 24 25 26
#> 9405 16055 10305 30195 59849 39068 11939 13214 12666 8265 4065 7080 66105
#> 27 28 29 30 31 32 33 34 35
#> 49421 12768 17289 4530 6765 9540 25515 24656 25601
#keyword(ggg)
(ggg = read.FCS("gg.fcs", truncate_max_range = FALSE, transformation = FALSE))
#> flowFrame object 'gg.fcs'
#> with 585644 cells and 9 observables:
#> name desc range minRange maxRange
#> $P1 FSC-H FSC-Height 1023 0 1022
#> $P2 SSC-H SSC-Height 1023 0 1022
#> $P3 FL1-H CD15 FITC 10000 0 9999
#> $P4 FL2-H CD45 PE 10000 0 9999
#> $P5 FL3-H CD14 PerCP 10000 0 9999
#> $P6 FL2-A <NA> 1023 0 1022
#> $P7 FL4-H CD33 APC 10000 0 9999
#> $P8 Time Time (51.20 sec.) 999 0 998
#> $P9 Original Original Frame 35 0 34
#> 61 keywords are stored in the 'description' slot
summary(ggg)
#> FSC-H SSC-H FL1-H FL2-H FL3-H FL2-A
#> Min. 52.0000 0.0000 1.000000 1.0000 1.000000 0.0000
#> 1st Qu. 88.0000 67.0000 2.111231 227.9141 1.626087 56.0000
#> Median 130.0000 101.0000 6.744263 447.7405 4.222901 112.0000
#> Mean 166.5125 129.6283 226.508527 567.4725 26.534154 139.5429
#> 3rd Qu. 214.0000 154.0000 50.674541 768.4763 11.680157 185.0000
#> Max. 1023.0000 1023.0000 10000.000000 10000.0000 10000.000000 1023.0000
#> FL4-H Time Original
#> Min. 1.000000 0.0000 1.00000
#> 1st Qu. 1.878040 71.0000 15.00000
#> Median 5.010745 191.0000 20.00000
#> Mean 21.662928 260.2665 21.04099
#> 3rd Qu. 26.740835 375.0000 27.00000
#> Max. 10000.000000 999.0000 35.00000
table(exprs(ggg[,9]))
#>
#> 1 2 3 4 5 6 7 8 9 10 11 12 13
#> 3420 3405 3435 8550 10410 3750 13979 2205 13350 11610 13830 13934 19470
#> 14 15 16 17 18 19 20 21 22 23 24 25 26
#> 9405 16055 10305 30195 59849 39068 11939 13214 12666 8265 4065 7080 66105
#> 27 28 29 30 31 32 33 34 35
#> 49421 12768 17289 4530 6765 9540 25515 24656 25601
#keyword(ggg)
sessionInfo()
#> R version 4.0.0 (2020-04-24)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#>
#> Matrix products: default
#>
#> locale:
#> [1] LC_COLLATE=French_France.1252 LC_CTYPE=French_France.1252
#> [3] LC_MONETARY=French_France.1252 LC_NUMERIC=C
#> [5] LC_TIME=French_France.1252
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] flowCore_2.0.0
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.4.6 matrixStats_0.56.0 RProtoBufLib_2.0.0
#> [4] digest_0.6.25 stats4_4.0.0 magrittr_1.5
#> [7] evaluate_0.14 RcppParallel_5.0.0 highr_0.8
#> [10] rlang_0.4.5 stringi_1.4.6 rmarkdown_2.1
#> [13] tools_4.0.0 stringr_1.4.0 Biobase_2.48.0
#> [16] parallel_4.0.0 xfun_0.13 yaml_2.2.1
#> [19] compiler_4.0.0 BiocGenerics_0.34.0 cytolib_2.0.0
#> [22] htmltools_0.4.0 knitr_1.28
Created on 2020-05-07 by the reprex package (v0.3.0)
I think it's OK. I will reopen if needed. Thanks.
Describe the bug The ranges written on disk are integers, truncated to the lowest values, but data are not truncatedwhen read back. I don't understand at all. If I am wrong (because tired), let me know. On the contrary, can we solve this for good, and discuss the corresponding tests? Best, Samuel
To Reproduce Steps to reproduce the behavior: please use the reprex package to build a reproducible example.
Created on 2020-05-07 by the reprex package (v0.3.0)