RGLab / flowCore

Core flow cytometry infrastructure

Cannot read a specific FCS 3.1 file for unknown reason #269

Open CroixJeremy2 opened 3 months ago

CroixJeremy2 commented 3 months ago

Hello, I am quite new to reading FCS files in R, so I apologize in advance if this issue has already been raised (I couldn't find a similar one in RGLab/flowCore/issues, though).

Here are three FCS 3.1 files that were generated on the same machine (https://research.pasteur.fr/en/equipment/big-foot/) on the same day for a sorting experiment: https://dl.pasteur.fr/fop/P67e8o6r/Test_folder.zip


However, I cannot read Sample_A, while Sample_B and Sample_C are read without problems using read.FCS(). Interestingly, all three files are correctly detected as FCS files by isFCSfile(). The problem occurs on macOS and also on Windows, each with a fresh installation of R and flowCore. Here are the script and the R output:

library(flowCore)

# Windows
# setwd("C:/Users/jchantre/Desktop/Test_folder")

# MacOS
# setwd("/Users/jchantre/Desktop/Test_folder")

isFCSfile("Sample_A.fcs")
isFCSfile("Sample_B.fcs")
isFCSfile("Sample_C.fcs")

a = read.FCS("Sample_A.fcs")
b = read.FCS("Sample_B.fcs")
c = read.FCS("Sample_C.fcs")

head(a)
head(b)
head(c)

On macOS: [screenshot]

On Windows: [screenshot]

Is there a way to solve this error message?

> a = read.FCS("Sample_A.fcs")
Error in rawToChar(txt) : 
  embedded nul in string: '*$BEGINANALYSIS*000000000000*$BEGINDATA*000000006769*$BEGINSTEXT*000000000000*$BYTEORD*1,2,3,4*$DATATYPE*F*$ENDANALYSIS*000000000000*$ENDDATA*005622116501*$ENDSTEXT*000000000000*$MODE*L*$NEXTDATA*000000000000*$PAR*29*$P1B*32*$P1E*0,0*$P1N*Time*$P1R*2147483647*$P1L*0*$P1O*0*$P1S*Time*$P1V*0*$P2B*32*$P2E*0,0*$P2N*FSC04-A*$P2R*100000*$P2F*488_FSC*$P2L*488*$P2O*125*$P2S*488 FSC-A*$P2V*306*$P3B*32*$P3E*0,0*$P3N*SSC56-A*$P3R*100000*$P3F*488_SSC*$P3L*488*$P3O*125*$P3S*488 SSC-A*$P3V*362*$P4B*32*$P4E*0,0*$P4N*FSC04-H*$P4R*100000*$P4F*488_FSC*$P4L*488*$P4O*125*$P4S*488 FSC-H*$P4V*306*$P5B*32*$P5E*0,0*$P5N*FSC04-W*$P5R*100000*$P5F*488_FSC*$P5L*488*$P5O*125*$P5S*488 FSC-W*$P5V*306*$P6B*32*$P6E*0,0*$P6N*SSC56-H*$P6R*100000*$P6F*488_SSC*$P6L*488*$P6O*125*$P6S*488 SSC-H*$P6V*362*$P7B*32*$P7E*0,0*$P7N*SSC56-W*$P7R*100000*$P7F*488_SSC*$P7L*488*$P7O*125*$P7S*488 SSC-W*$P7V*362*$P8B*32*$P8E*0,0*$P8N*FSC58-H*$P8R*100000*$P8F*488_FSC_Polar*$P8L*488*$P8O*125*$P8S*488 FSC P

Thanks in advance for your response, Best regards,

sessionInfo() in MacOS:

R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.7.5

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] flowCore_2.14.2

loaded via a namespace (and not attached):
[1] compiler_4.3.3 RProtoBufLib_2.14.1 cytolib_2.14.1 Biobase_2.62.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 matrixStats_1.3.0 stats4_4.3.3

sessionInfo() in Windows:

R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] flowCore_2.16.0

loaded via a namespace (and not attached):
[1] compiler_4.4.0      RProtoBufLib_2.16.0 cytolib_2.16.0
[4] Biobase_2.64.0      S4Vectors_0.42.0    BiocGenerics_0.50.0
[7] matrixStats_1.3.0   stats4_4.4.0

SamGG commented 3 months ago

Hi, In fact, the header of Sample_A is wrong (explanations below). flowCore could take this error into account, but that would not correct a bug introduced by the instrument. @mikejiang should/will you? Meanwhile, if you have only a few files to correct, I suggest using HxD (64-bit), with caution.


My best wishes to Bernd. Best, Samuel

In B, the end of the text is at 4902, and the start of data is at 4903. Both appear in this screenshot: [image]. These values are correct and agree with the data in the following screenshot: [image].
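For readers who want to check these offsets without a hex editor, here is a minimal R sketch. It assumes the standard FCS HEADER layout: the version string in bytes 1 to 6, followed from byte 11 by six right-justified, space-padded 8-byte ASCII fields (TEXT begin/end, DATA begin/end, ANALYSIS begin/end). The function names `parse_fcs_header` and `read_fcs_header` are illustrative and are not part of flowCore.

```r
# Parse the 58-byte HEADER segment of an FCS file from a raw vector.
parse_fcs_header <- function(header_raw) {
  stopifnot(is.raw(header_raw), length(header_raw) >= 58L)
  # Extract one fixed-width ASCII field and strip its space padding.
  field <- function(from, to) trimws(rawToChar(header_raw[from:to]))
  # Six 8-byte offset fields start at byte 11 (1-based).
  starts <- seq(11L, by = 8L, length.out = 6L)
  offsets <- vapply(starts,
                    function(s) as.numeric(field(s, s + 7L)),
                    numeric(1))
  names(offsets) <- c("text_begin", "text_end",
                      "data_begin", "data_end",
                      "analysis_begin", "analysis_end")
  list(version = field(1L, 6L), offsets = offsets)
}

# Read only the first 58 bytes of a file and parse them.
read_fcs_header <- function(path) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  parse_fcs_header(readBin(con, what = "raw", n = 58L))
}

# Example (file name taken from the report above):
# read_fcs_header("Sample_B.fcs")$offsets
```

Note that when an offset does not fit in 8 digits, the FCS standard has the HEADER field set to 0 and the real value stored in the TEXT keywords ($BEGINDATA, $ENDDATA), so a 0 here is not necessarily an error.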

In A, the end is incorrect (no idea why): [image]. The real end is at 4888, as in the following screenshot: [image]. Use HxD to change 6768 to 4888 and 6769 to 4889, and magic might happen... but since the end of data is also wrongly reported, you will have a little more work to do if flowCore doesn't handle it.
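This would also explain the `embedded nul in string` error: if the declared end of TEXT lies beyond the real one, the extra bytes belong to the binary DATA segment and can contain 0x00, which `rawToChar()` rejects. The sketch below looks for the first nul byte inside the declared TEXT range. It is only a rough diagnostic under that assumption: the first nul gives an upper bound on where the text really stops, not necessarily the exact boundary. The helper names are hypothetical.

```r
# Return the 0-based file offset of the first 0x00 byte between
# `begin` and `end` (both 0-based, inclusive), or NA if there is none.
# `bytes` must hold the file content starting at offset 0.
first_nul_offset <- function(bytes, begin, end) {
  stopifnot(is.raw(bytes), begin >= 0, end >= begin)
  # Clamp to the bytes actually available; indexing past the end of a
  # raw vector would silently yield 00 and produce false hits.
  end <- min(end, length(bytes) - 1)
  if (end < begin) return(NA_real_)
  segment <- bytes[(begin + 1):(end + 1)]
  hit <- which(segment == as.raw(0))
  if (length(hit) == 0L) NA_real_ else begin + hit[1] - 1
}

# Read just enough of the file to cover the declared TEXT segment.
scan_declared_text <- function(path, text_begin, text_end) {
  con <- file(path, open = "rb")
  on.exit(close(con))
  bytes <- readBin(con, what = "raw", n = text_end + 1)
  first_nul_offset(bytes, text_begin, text_end)
}

# Example: text_begin must be taken from the file's HEADER;
# 6768 is the declared TEXT end quoted above.
# scan_declared_text("Sample_A.fcs", text_begin, 6768)
```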

gfinak commented 3 months ago

Thanks, @SamGG. My 2c here, @CroixJeremy2: your instrument generated an invalid file (in that it set the begin and end of data/text wrong in the header). You can certainly fix this manually, but why would you trust the data from this file? Who knows what else the software or instrument corrupted. If this data matters in any way, I wouldn't trust it.
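One cheap consistency check in that spirit, sketched here as an illustration rather than a validation procedure: in list mode, the DATA segment length should be a whole multiple of the size of one event. The sketch assumes every parameter has the same bit width, which matches the $PnB values visible in the truncated error output above ($PAR is 29 and each $PnB shown is 32). The function name is hypothetical, and passing this check does not prove the file is sound.

```r
# Check that the DATA segment holds a whole number of events.
# Offsets are 0-based and inclusive, as in the FCS HEADER and TEXT.
data_length_is_consistent <- function(data_begin, data_end,
                                      n_par, bits_per_par = 32) {
  bytes_per_event <- n_par * bits_per_par / 8
  n_bytes <- data_end - data_begin + 1
  n_bytes > 0 && n_bytes %% bytes_per_event == 0
}

# Example with the corrected DATA start suggested above and the
# $ENDDATA value from the error message:
# data_length_is_consistent(4889, 5622116501, n_par = 29)
```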