issue in spillover_ng : 'pregate = TRUE' selects wrong side of the gate

phauchamps commented 2 years ago

Dear flowStats team,

During the last few days, I have been testing the compensation in flowStats (spillover_ng()) and have come across a couple of issues. Here is one of them:

In spillover_ng(), activating pregating ('pregate = TRUE') often ends up selecting the wrong side of the separation point found by rangeGate. As a consequence, very few events may be selected to calculate the statistics, and the resulting figures are wrong.

Attached is a simple example that reproduces the behavior. It uses a public dataset that you would need to download from FlowRepository (https://flowrepository.org/id/FR-FCM-ZZ36). The example R file performs the compensation for a small subset of 2 channels. Output graphs from the pregating in flowStats are provided (wrong in the case of channel PE-Cy5-5, as the right-hand side is selected), as well as the compensation calculated in FlowJo, showing that the row of the PE-Cy5-5 channel is wrong with 'pregate = TRUE' but is similar to FlowJo's with 'pregate = FALSE'.

Thanks,

Philippe

flowStats_spillover_issue_pregating.zip

malisas commented 2 years ago

Thanks again for the reproducible example, Philippe.

I did notice one odd thing about this particular example, which is that the PE-Cy5-5 channel has higher fluorescence in the PE-Cy5 stained control (see left panel in image) than in the PE-Cy5-5 stained control (middle panel).

[Image: PE-Cy5-5-A Compensation Issue]

The PE-Cy5 channel looks more like what I'd expect (the PE-Cy5 stained control has the highest PE-Cy5 signal):

[Image: Screenshot from 2021-12-16 10-14-53]

I wonder if something abnormal occurred for PE-Cy5-5 during the experimental procedure (wrong antibody, fluorophore breakdown, wrong well, etc.).

I don't know if the abnormal staining observed above is related to the "wrong side of the gate" pregating issue. It could be helpful to see this occur on a more "normal" dataset in order to isolate the problem.

Still:

- It does seem like the gate should be placed on the left side of the unimodal population, and I wonder if we can tweak the rangeGate parameters to make it happen.
- I think it is a good idea to investigate why FlowJo and flowStats have different answers.

phauchamps commented 2 years ago

Hi Malisa,

You are indeed right, in the sense that the PE-Cy5.5 fluorochrome generates more fluorescence intensity in the PE-Cy5 channel than the PE-Cy5 fluorochrome itself. I don't think it is a mistake; it can happen depending on the relative characteristics of the emission spectra of the 2 fluorochromes, as well as the relative voltage of the 2 detectors. I am not a flow cytometry scientist myself (I am more interested in the data analysis techniques for flow cytometry), but my guess is that it is not per se an issue in the panel design as long as the two corresponding markers are not expected to be expressed simultaneously in the same cells. Btw, the dataset I am using comes from an OMIP panel design: https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.22278. In fact, I have witnessed this 'wrong side of the gate' behavior several times. I came up with this reproducible example, but the problem is not specific to this case.

Regarding your second bullet point above, I checked that the difference between FlowJo and flowStats is indeed due to the wrong side of the gate. Debugging step by step into the code of spillover_ng, I noticed that the diverging matrix coefficient was calculated in flowStats based on only 5(!) cell events due to the wrong side being used to generate the spillover coefficient.

Not sure that tweaking the rangeGate parameters would be such a good idea. It seems to me that 'rangeGate' is designed to separate two populations (hence two distinct peaks), while here we need to identify one single peak, and this is where the issue comes from.
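As a rough illustration of this failure mode, here is a toy sketch in plain Python. It is not flowStats code: the helper `split_at_cut`, the event values, and the cut point are all hypothetical, chosen only to show how a cut placed beside a single peak can leave a handful of events on one side.

```python
from statistics import median


def split_at_cut(values, cut):
    """Split 1-D intensities into the events below and at-or-above a cut point."""
    left = [v for v in values if v < cut]
    right = [v for v in values if v >= cut]
    return left, right


# Made-up single-stain control: one main peak of 200 events plus 5 bright outliers.
main_peak = [100.0 + i for i in range(200)]
outliers = [1000.0, 1100.0, 1200.0, 1300.0, 1400.0]
events = main_peak + outliers

# Assumed cut point that a two-population gate might place between peak and outliers.
cut = 500.0
left, right = split_at_cut(events, cut)

# Compare how many events, and which median, each side would contribute.
print("left :", len(left), "events, median", median(left))
print("right:", len(right), "events, median", median(right))
```

With these made-up numbers, keeping the right-hand side means the statistic rests on the outliers alone rather than on the peak, which is the kind of result described above.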

By the way, I found another strange behavior in such a case, where the spillover signal in one channel is greater than the true signal in that channel. It seems that spillover() should normalize the coefficients of the spillover matrix so that the diagonal coefficients are equal to one (at least this is what the documentation says). However, I suspect that the code actually normalizes so that the highest coefficient of each row is equal to one, which is not correct in such cases of high spillover. Could you check that as well? Should I raise another issue for this one?

Thanks and regards, Philippe.

DillonHammill commented 2 years ago

Matching a channel to each single stain control based on fluorescent intensity is a really bad idea. More often than not you will actually get more signal in an adjacent detector than in the expected detector. I would instead use the matchfile argument in spillover_ng() to ensure correct matching between channels and controls. Of course you could also use cyto_spillover_compute() in CytoExploreR which addresses all the issues mentioned above and allows manual gating of the positive and negative signal in each control.

phauchamps commented 2 years ago

> Matching a channel to each single stain control based on fluorescent intensity is a really bad idea. More often than not you will actually get more signal in an adjacent detector than in the expected detector.

@DillonHammill I don't get your point, as this is not what is done in this example. Each channel intensity is measured for each single stain independently and compared to the negative 'no stain' control. The file matching is done explicitly using a .csv mapping file, not based on intensities. But if the goal of your post was to advertise your CytoExploreR package, OK, I will have a look :-)

DillonHammill commented 2 years ago

CytoExploreR is closely intertwined with all the cytoverse packages, so if you run into issues or are looking for additional features, you will likely find them addressed in CytoExploreR.

There are a number of issues with spillover() and spillover_ng() that have been resolved in CytoExploreR::cyto_spillover_compute(). These include:

- Inability to use combinations of Beads and Cells.
- Inability to use multiple controls for each channel (CytoExploreR will pick the control with the brightest signal in each channel automatically).
- Difficulties in getting accurate gates automatically. CytoExploreR allows manual gating instead for the Bagwell method.
- The normalisation step after sweeping out the unstained signal is incorrect. Instead of dividing by the max signal, it should divide by the signal in the expected channel (i.e. the channel matched to the file through matchfile in spillover_ng()). The problem is that spillover_ng() still calls spillover() for this computation, so we run into the same normalisation issue. https://github.com/RGLab/flowStats/blob/1f02de9a21e2908fb2b07937070839926ff486d2/R/spillover.R#L195-L196
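The difference between the two normalisations can be sketched as follows. This is a minimal illustration in plain Python, not the flowStats or CytoExploreR implementation: the function names and the background-subtracted values are hypothetical, standing in for one single-stain control whose signal in a neighbouring detector exceeds the signal in its matched detector.

```python
def normalise_by_row_max(row):
    """Divide every channel by the largest value in the row."""
    peak = max(row.values())
    return {channel: value / peak for channel, value in row.items()}


def normalise_by_matched_channel(row, matched):
    """Divide every channel by the value in the channel matched to this control."""
    reference = row[matched]
    return {channel: value / reference for channel, value in row.items()}


# Made-up background-subtracted signal for a control matched to PE-Cy5-5-A.
row = {"PE-Cy5-A": 1200.0, "PE-Cy5-5-A": 800.0}

by_max = normalise_by_row_max(row)
by_match = normalise_by_matched_channel(row, "PE-Cy5-5-A")

print("row max        :", by_max)
print("matched channel:", by_match)
```

Under row-max normalisation the matched channel ends up below one and the neighbouring channel takes the value one, so the diagonal of the matrix is no longer one. Dividing by the matched channel keeps the diagonal at one and lets the off-diagonal coefficient exceed one.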

If you look at the source code for cyto_spillover_compute() you will notice that I actually use the value for the matched channel for the normalisation step. I do this by storing the channel match information in pData() so it is attached to each file. https://github.com/DillonHammill/CytoExploreR/blob/5e9843f6aa18800203001794f680f23ece04e0de/R/cyto_spillover_compute.R#L226-L230

In the long run, I suspect that most users will use CytoExploreR::cyto_spillover_compute() for the above reasons, as well as the fact that the new version supports autospill (no gating required and very accurate coefficients) and CytoExploreR::cyto_spillover_edit() provides an interactive interface for manual fine-tuning.

I have had a look at the code, and it will take a bit of work to fix this. I am not sure it is worth the time, given that a more robust alternative already exists. What do you think @mikejiang? If it is important to you to get this working, I can help sort out the normalisation step when I get some time.

phauchamps commented 2 years ago

Hi flowStats team,

Any update on this issue ?

Thx,

Philippe.

RGLab / flowStats, issue #40