eyurtsev / fcsparser

A Python parser for reading FCS files, supporting FCS 2.0, 3.0, and 3.1
MIT License

Parameter $PnR not being used? #19

Closed: ssbotelh closed this issue 4 years ago

ssbotelh commented 5 years ago

Hi, I'm new to FCS, so this might be a misunderstanding on my end. However, the 3.1 standard indicates that, for list mode and the integer data type, the $PnR keyword gives the range of values a parameter can take (0 to $PnR - 1). From page 24:

$PnR/n1/ [REQUIRED] (examples: $P2R/1024/, $P2R/262144/)
For $DATATYPE/I/ this keyword specifies the maximum range, n1, of parameter n. For $MODE/L/ (list mode data), this corresponds to the ADC range, e.g., 1024. In that case, the data values can range from 0 to 1023. ... For $DATATYPE/I/, the value of $PnR also indirectly specifies the bit mask that should be used when reading values.
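A minimal sketch, in Python (the language fcsparser is written in), of how a bit mask could be derived from $PnR. The helper name `bit_mask_from_range` is hypothetical and not part of fcsparser, and rounding a $PnR that is not a power of two up to the next power of two is an assumption; check the exact rule in the spec before relying on it.

```python
def bit_mask_from_range(pnr: int) -> int:
    """Return the smallest all-ones bit mask covering the values 0 .. pnr - 1.

    Assumption: a $PnR that is not a power of two is rounded up to the next one.
    """
    if pnr < 1:
        raise ValueError("$PnR must be a positive integer")
    # bit_length() of the largest allowed value = number of bits the range needs
    return (1 << (pnr - 1).bit_length()) - 1


for pnr in (1024, 262144, 1000):
    mask = bit_mask_from_range(pnr)
    print(f"$PnR={pnr}: mask={mask} ({bin(mask)})")
```

For a power-of-two range such as 1024 the mask is simply $PnR - 1, i.e. the low 10 bits.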

When running fcsparser on a 3.1 FCS file, I get the following:

                    $PnE     $PnN  $PnB       $PnS  $PnR $PnV
Channel Number                                               
1                 [0, 0]       FS    16         FS  1024  393
2                 [0, 0]       SS    16         SS  1024  314
3               [4, 0.1]  FL1 LOG    16  CD41 FITC  1024  683
4               [4, 0.1]  FL2 LOG    16    CD7 RD1  1024  617
5               [4, 0.1]  FL3 LOG    16   CD45 ECD  1024  636
6               [4, 0.1]  FL4 LOG    16   CD33 PC5  1024  826

            FS       SS  CD41 FITC  CD7 RD1  CD45 ECD  CD33 PC5
0      34015.0  36163.0    38184.0  40097.0   42218.0   44256.0
1       1647.0   3346.0     5390.0   7342.0    9484.0   11757.0
2      34416.0  36018.0    38337.0  40155.0   42129.0   44487.0
...

So my question is: why are all these values greater than 1024? Isn't the bit mask being applied to the integer values? Thanks!
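As an illustration of the question, here is a sketch that applies a 10-bit mask (from $PnR = 1024) to the first three events printed above. The values are copied from the output; converting them back to `int` is needed because the parsed DataFrame holds floats and bitwise AND is not defined on floats in Python.

```python
# First three events from the output above; fcsparser returned them as floats.
rows = [
    [34015.0, 36163.0, 38184.0, 40097.0, 42218.0, 44256.0],
    [1647.0, 3346.0, 5390.0, 7342.0, 9484.0, 11757.0],
    [34416.0, 36018.0, 38337.0, 40155.0, 42129.0, 44487.0],
]

PNR = 1024       # $PnR for every channel in this file
MASK = PNR - 1   # 0b1111111111 (10 bits); valid because 1024 is a power of two

# Convert back to int before masking, then keep only the low 10 bits.
masked = [[int(v) & MASK for v in row] for row in rows]

for raw_row, masked_row in zip(rows, masked):
    print([int(v) for v in raw_row], "->", masked_row)
```

After masking, every value falls in 0..1023, which is what the spec's description of $PnR leads one to expect.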

eyurtsev commented 5 years ago

The philosophy was to read the raw data and the raw metadata and do no further interpretation.

I haven't read the spec in a long time, but if I understand you correctly, you're saying that a bit mask must be applied at read time to decode the raw data correctly, and that without it the values are wrong. If that's the case, feel free to open a PR to correct the behavior.

The original parser was implemented to handle all the FCS files I had to work with (a few formats coming from 3-4 different flow cytometers), so I probably never ran into this case.

Anyway, happy to accept PRs.
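For the read-time masking discussed in the comment above, a minimal sketch of decoding list-mode integer data when every parameter has $PnB/16/. The function `decode_int16_events` is hypothetical and not fcsparser's API. The `big_endian` flag stands in for $BYTEORD (assumed 2,1 for big-endian and 1,2 for little-endian), and the mask rule repeats the power-of-two rounding assumption.

```python
import struct


def bit_mask_from_range(pnr: int) -> int:
    """Smallest all-ones mask covering 0 .. pnr - 1 (assumed rounding rule)."""
    return (1 << (pnr - 1).bit_length()) - 1


def decode_int16_events(data: bytes, pnr: list[int], big_endian: bool = True) -> list[list[int]]:
    """Decode list-mode $DATATYPE/I/ data with 16-bit words, masking each parameter.

    Hypothetical helper: `big_endian` stands in for $BYTEORD.
    """
    masks = [bit_mask_from_range(r) for r in pnr]
    # One event = one unsigned 16-bit word per parameter.
    fmt = (">" if big_endian else "<") + "H" * len(pnr)
    event_size = struct.calcsize(fmt)
    if len(data) % event_size:
        raise ValueError("DATA segment is not a whole number of events")
    return [
        [word & mask for word, mask in zip(event, masks)]
        for event in struct.iter_unpack(fmt, data)
    ]


# Usage: pack two channels of the first event from the issue, then decode them.
raw = struct.pack(">2H", 34015, 36163)
print(decode_int16_events(raw, [1024, 1024]))
```

Masking inside the decoder keeps the returned values consistent with $PnR, while the unmasked words could still be exposed separately for anyone who wants the untouched raw data.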

ssbotelh commented 5 years ago

Thanks! Yes, the standard seems pretty clear that a mask must be applied for the integer data type. Otherwise, if the word size is larger than the range requires (as in the example above, where the word size $PnB is 16 bits but the range $PnR = 1024 needs only 10 bits), you will get incorrect values.
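To make the word-size argument concrete, a small sketch with a hypothetical helper `unused_high_bits` (not part of fcsparser) that counts how many high bits of a $PnB-bit word lie outside the range implied by $PnR, and shows the error that keeping them introduces for one value from the issue.

```python
def unused_high_bits(pnb: int, pnr: int) -> int:
    """Bits in a $PnB-bit word above the bits needed for values 0 .. pnr - 1."""
    needed = (pnr - 1).bit_length()
    if needed > pnb:
        raise ValueError("$PnR does not fit in a $PnB-bit word")
    return pnb - needed


extra = unused_high_bits(16, 1024)   # 16-bit word, range 1024 needs 10 bits
raw = 34015                          # first FS value from the issue
masked = raw & ((1 << 10) - 1)       # keep the low 10 bits
# Without the mask, the reading is off by a multiple of 2**10 set by the high bits.
print(extra, masked, (raw - masked) // 1024)
```

Here 6 of the 16 bits are outside the range, so any bits set there shift the unmasked value by a multiple of 1024.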