RGLab / flowStats

flowStats: algorithms for flow cytometry data analysis using BioConductor tools

Normalization masking population shifts? #25

Closed mfahlberg824 closed 5 years ago

mfahlberg824 commented 5 years ago

flowStats and flowWorkspace are incredible packages, and I am sincerely grateful to and applaud the developers. While learning to use them recently, though, I noticed a potential issue from a biology standpoint, and a possible way to improve it.

In the Getting Started with flowStats vignette, it's noted that normalization rests on three "rules":

1) High-density areas represent particular sub-types of cells.
2) Markers are binary: cells are either positive or negative for a particular marker.
3) Peaks should align if the above statements are true.

It then proceeds to use an example panel that includes CD3, CD4, CD69, CD8, and HLA-DR, and samples are normalized in all channels using the peak method.

I think it's an excellent tool for the many (most?) markers that DO follow binary rule #2. However, it can be biologically misleading when applied inappropriately, as I believe may have happened when CD69 in this data was normalized:

[Screenshot: normalized CD69 data from the vignette]

The issue is that CD69 is often not expressed in a binary fashion upon activation. See here:

[Screenshot: CD69 expression on activated cells]

When cells are activated, the entire population will often shift to the right. When CD69 was normalized in the document and the peaks were aligned, it could have produced erroneously negative data. The question, therefore, is: how do you know whether a shift is a technical artefact or a real biological shift? I'm not sure this has been solved yet in the field.

I have one possible solution: use per-batch reference controls in the normalization process. The reference controls would be samples collected from one person at one time and processed the same way; alternatively, one could use lyophilized reference controls such as Veri-Cells from BioLegend. We currently run reference controls with all of our samples for an NIH pre-clinical vaccine trial in SIV-infected monkeys to ensure that the processing, antibodies, staining, and acquisition are working properly on each run.

Anyway, on to the math. Say Batch 1 is run on Monday and includes a reference control plus 3 samples, and Batch 2 is run on Friday and includes a reference control plus 3 samples. Could we normalize (align the peaks of) Friday's reference control to Monday's reference control, and then adjust the MFI of the channels in all of the Batch 2 samples by the same amount that was applied when aligning the reference peaks? The goal would be to not undercut what could be relevant biological information in whole-population shifts.
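A minimal sketch of this idea, purely illustrative Python rather than flowStats code (the peak estimator, function names, and intensity values are all made up for the example): the shift is estimated only from the two reference controls and then applied to every Batch 2 sample, so a genuine population shift in the samples themselves is preserved.

```python
# Hypothetical sketch of reference-control-based batch normalization.
# Nothing here is flowStats/flowWorkspace API; names and data are illustrative.
from statistics import median

def peak(values):
    """Approximate the population peak by the median intensity.
    (A real implementation would use a kernel-density mode estimate.)"""
    return median(values)

def align_batch(reference_a, reference_b, batch_b_samples):
    """Shift every Batch 2 sample by the offset between the two
    reference controls, instead of aligning the samples themselves."""
    shift = peak(reference_a) - peak(reference_b)
    return [[x + shift for x in sample] for sample in batch_b_samples]

# Monday's vs Friday's reference control, same channel (e.g. CD69):
ref_monday = [100, 110, 120, 130, 140]
ref_friday = [130, 140, 150, 160, 170]   # instrument drifted +30

batch2 = [[200, 210], [155, 165]]
corrected = align_batch(ref_monday, ref_friday, batch2)
# Only the +30 technical offset measured on the controls is removed;
# any real biological shift within the Batch 2 samples remains.
```

The key design point is that the samples never contribute to the shift estimate, so a whole-population activation shift (as with CD69) cannot be "normalized away".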

The current workaround is to normalize only the truly binary populations and leave alone the CD69-like channels where the whole population shifts; however, that still doesn't remove the technical variation from the CD69 channel, and it doesn't help assess whether the shifts are biologically real.

-Marissa

gfinak commented 5 years ago

Hi Marissa,

Normalization definitely comes with all sorts of pitfalls, as you point out. You're right that technical vs. biological effects need to be distinguished, and that should be handled through the experimental design, something the normalization framework doesn't currently do.

For those reasons (and others), we tend not to use normalization except in the simplest cases. Instead, we opt for sample-specific automated gating, where the experimental design can be used to resolve such questions, sometimes leveraging the controls as you suggest: for example, using the gate from the reference control to gate the other samples in a batch, and then including a batch variable in downstream modeling of cell population counts or MFIs to adjust for any remaining batch effects.
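As a rough illustration of that last step (again illustrative Python, not flowWorkspace code; the data and function are hypothetical), the effect of including an additive batch term can be approximated by centering each batch's per-sample statistics on the grand mean:

```python
# Hypothetical sketch: adjust gated cell-population statistics (e.g. MFIs)
# for an additive batch effect. A real analysis would fit a regression
# model with a batch covariate; this centering is the simplest stand-in.
from statistics import mean

def batch_adjust(values_by_batch):
    """Subtract each batch's mean and restore the grand mean, so
    remaining differences reflect the samples rather than the run."""
    all_values = [v for vals in values_by_batch.values() for v in vals]
    grand = mean(all_values)
    return {b: [v - mean(vals) + grand for v in vals]
            for b, vals in values_by_batch.items()}

mfis = {"batch1": [100, 110, 120], "batch2": [140, 150, 160]}
adjusted = batch_adjust(mfis)
# Both batches are now centered on the grand mean (130), removing the
# between-run offset while keeping within-batch sample differences.
```

This happens after gating, on summary statistics, which is why it composes cleanly with using the reference-control gate for all samples in a batch.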

mfahlberg824 commented 5 years ago

Gotcha, thanks for the information! We are working to make our data as rigorous and reproducible as possible at our institution, which is why I embarked on the automated gating/normalization journey. I'll keep your comments in mind as we figure out the best system for removing as much bias as we can from our data.