RGLab / flowWorkspace

Issue reproducing FlowJo gating (exactly) in flowWorkspace #382

Closed: phauchamps closed this issue 1 year ago

phauchamps commented 1 year ago

Hi Mike and Team,

I am currently struggling to reproduce manual gating performed in FlowJo in an R script. I need this to automate the calculation of concordance metrics between manual gating in FlowJo and automated gating in R. For this, I am using flowWorkspace, which can read a .wsp workspace file as saved by FlowJo. It technically works, but I still see discrepancies between the gating applied in R (with flowWorkspace gating objects) and the same gating in FlowJo. The event labels (inside or outside a given gated population) are not exactly the same, and the percentage of events sometimes differs substantially for rarer populations. I suspect the main differences come from variations in the math formulas or code of the scale transformations (FlowJo vs. flowCore in C++).
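
As an illustration of the kind of concordance metric mentioned above, here is a minimal base R sketch that scores automated event labels against manual ones with an F1 score. The choice of F1 is an assumption (the thread does not say which metrics are used), and the label vectors and the `f1_score` helper are hypothetical.

```r
# Hypothetical per-event membership labels for one gated population:
# TRUE = event is inside the gate, FALSE = event is outside.
manual    <- c(TRUE, TRUE,  FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
automated <- c(TRUE, FALSE, FALSE, FALSE, TRUE, TRUE,  TRUE, FALSE)

# F1 score of the predicted labels, taking the reference labels as truth.
# (Hypothetical helper, not part of flowWorkspace.)
f1_score <- function(reference, predicted) {
  stopifnot(length(reference) == length(predicted))
  tp <- sum(reference & predicted)    # in the gate for both
  fp <- sum(!reference & predicted)   # only in the automated gate
  fn <- sum(reference & !predicted)   # only in the manual gate
  if (tp == 0) return(0)
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

f1 <- f1_score(manual, automated)
print(f1)
```

With these toy labels there are 3 true positives, 1 false positive and 1 false negative, so precision and recall are both 0.75 and so is the F1 score.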

In particular, the transformation type and parameters I am currently investigating are the following (from the FlowJo .wsp file):

<transforms:biex transforms:length="256"  transforms:maxRange="262144"  transforms:neg="0"  transforms:width="-100"  transforms:pos="4.418539922" >
           <data-type:parameter data-type:name="Comp-Alexa Fluor 700-A" />
</transforms:biex>
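
For reference, a rough sketch of how these parameters could be passed to flowWorkspace's `flowjo_biexp()` to build the transformation by hand and inspect it. The mapping of the XML attributes onto the function arguments (`length` to `channelRange`, `maxRange` to `maxValue`, `width` to `widthBasis`) is an assumption, and the input intensities are made up for illustration.

```r
library(flowWorkspace)

# Assumed mapping of the .wsp attributes onto flowjo_biexp() arguments:
#   length -> channelRange, maxRange -> maxValue, width -> widthBasis
biexp <- flowjo_biexp(
  channelRange = 256,
  maxValue     = 262144,
  pos          = 4.418539922,
  neg          = 0,
  widthBasis   = -100
)

# Same parameters, inverse direction (display scale back to raw scale).
biexp_inv <- flowjo_biexp(
  channelRange = 256,
  maxValue     = 262144,
  pos          = 4.418539922,
  neg          = 0,
  widthBasis   = -100,
  inverse      = TRUE
)

# Hypothetical compensated intensities, for illustration only.
raw    <- c(-500, 0, 100, 1000, 10000, 100000)
scaled <- biexp(raw)

# Round-trip error gives a feel for the numerical precision of the transform.
roundtrip <- biexp_inv(scaled)
print(data.frame(raw = raw, scaled = scaled, roundtrip = roundtrip))
```

Comparing `scaled` with the display-scale values FlowJo reports for the same events is one way to check whether the transformation itself contributes to the discrepancy.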

Have you noticed or heard of such issues before, and would you have any advice on improving the match between the two implementations?

Thanks a lot,

Philippe

mikejiang commented 1 year ago

When you parse a .wsp into a GatingSet, minor differences between the XML stats and the openCyto stats are expected, e.g.

> gh_pop_compare_stats(gh)
    openCyto.freq   xml.freq openCyto.count xml.count            node
 1:    1.00000000 1.00000000         119531    119531            root
 2:    0.76733232 0.76733232          91720     91720      not debris
 3:    0.94877889 0.94889882          87022     87033        singlets
 4:    0.62608306 0.62892236          54483     54737            CD3+
 5:    0.62463521 0.62266840          34032     34083             CD4
 6:    0.03299835 0.03297832           1123      1124     CD4/38- DR+

However, if you observe a significant difference, there might be a parsing issue with this particular workspace, and we will need an example to troubleshoot it.
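
One way to separate "minor" from "significant" differences is to screen the comparison table against a tolerance. The sketch below is base R only and re-enters the counts from the output above by hand; the 1% threshold is a hypothetical choice, not a recommendation from the package authors.

```r
# Counts copied from the gh_pop_compare_stats(gh) output above.
stats <- data.frame(
  node           = c("root", "not debris", "singlets", "CD3+", "CD4", "CD4/38- DR+"),
  openCyto.count = c(119531, 91720, 87022, 54483, 34032, 1123),
  xml.count      = c(119531, 91720, 87033, 54737, 34083, 1124),
  stringsAsFactors = FALSE
)

# Relative difference in event counts, using the FlowJo (XML) count as reference.
stats$rel.diff <- abs(stats$openCyto.count - stats$xml.count) / stats$xml.count

# Hypothetical tolerance: flag nodes whose counts differ by more than 1%.
tolerance <- 0.01
flagged   <- stats[stats$rel.diff > tolerance, ]

print(stats[order(-stats$rel.diff), c("node", "rel.diff")])
print(nrow(flagged))
```

On these numbers the largest relative difference is for CD3+ (254 of 54737 events, about 0.46%), so nothing is flagged at a 1% tolerance.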

phauchamps commented 1 year ago

Hi Mike,

With the new insight coming from your answer, I did a bit of further investigation on my side.

Actually, I was using flowWorkspace to apply a gating hierarchy defined in FlowJo to a new FCS file, which in this case was supposed to contain the same data points as the original data file imported into FlowJo.

I found out that the main discrepancies were due to a mismatch between the compensation matrix used in FlowJo and the one used in my R code (for this particular sample, FlowJo was not using the acquired compensation matrix, but a copy that had been manually updated). So nothing to do with flowWorkspace code :-)
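
A check along these lines can be sketched in base R by comparing the two spillover matrices element by element. The matrices, channel names and the `compare_spillover` helper below are hypothetical; in practice the two matrices might come from the FCS file's spillover keyword and from the parsed workspace (for example via `gh_get_compensations()`), which is an assumption about the workflow rather than something shown in this thread.

```r
# Hypothetical spillover matrices with identical channel names.
channels <- c("FITC-A", "PE-A", "Alexa Fluor 700-A")
acquired <- matrix(
  c(1.00, 0.10, 0.00,
    0.02, 1.00, 0.05,
    0.00, 0.01, 1.00),
  nrow = 3, byrow = TRUE, dimnames = list(channels, channels)
)

# Simulate a manually edited copy, as could happen inside FlowJo.
edited <- acquired
edited["PE-A", "Alexa Fluor 700-A"] <- 0.08

# List every coefficient that differs by more than `tol`.
# (Hypothetical helper, not part of flowWorkspace.)
compare_spillover <- function(a, b, tol = 1e-6) {
  stopifnot(identical(dimnames(a), dimnames(b)))
  idx <- which(abs(a - b) > tol, arr.ind = TRUE)
  data.frame(
    from = rownames(a)[idx[, "row"]],
    to   = colnames(a)[idx[, "col"]],
    a    = a[idx],
    b    = b[idx],
    stringsAsFactors = FALSE
  )
}

diffs <- compare_spillover(acquired, edited)
print(diffs)
```

An empty result means the two matrices agree within the tolerance; any listed row points at a coefficient that was changed.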

On top of that, I found a couple of other sources of more minor discrepancies, which I thought I could provide here for information:

In short, you can close this issue as soon as you have read this comment :-)