RGLab / flowStats

flowStats: algorithms for flow cytometry data analysis using BioConductor tools

gaussNorm is creating outliers (?) #33

Closed jgarces02 closed 4 years ago

jgarces02 commented 4 years ago

Dear RGLab team,

I'm running gaussNorm and it seems to be creating some outliers. I've read the answers in #20, but my data are correctly transformed.

This is my first plot: [image]

...and this is the one after normalization: [image]

I guess that's because there are some outlier values: [image]

...but they don't appear in the initial data:

> summary(tranformed_data)
     FSC_A           FSC_H           SSC_A            SSC_H           CD62L             CXCR3               CD8          
 Min.   :3.662   Min.   :3.691   Min.   :0.7128   Min.   :1.301   Min.   :-0.1347   Min.   :-0.19738   Min.   :-0.65457  
 1st Qu.:5.288   1st Qu.:5.122   1st Qu.:3.9964   1st Qu.:3.711   1st Qu.: 0.2651   1st Qu.: 0.08052   1st Qu.: 0.08338  
 Median :5.835   Median :5.616   Median :5.3577   Median :4.978   Median : 0.6581   Median : 0.18685   Median : 0.24906  
 Mean   :5.617   Mean   :5.418   Mean   :4.9664   Mean   :4.638   Mean   : 1.0277   Mean   : 0.45534   Mean   : 0.52559  
 3rd Qu.:6.094   3rd Qu.:5.871   3rd Qu.:5.8503   3rd Qu.:5.492   3rd Qu.: 1.7663   3rd Qu.: 0.41000   3rd Qu.: 0.46543  
 Max.   :6.955   Max.   :6.738   Max.   :6.9552   Max.   :6.914   Max.   : 5.2601   Max.   : 6.61827   Max.   : 6.70722  
      CCR4              CCR6              CD4                CD45              CD27        
 Min.   :-0.9744   Min.   :-1.6870   Min.   :-0.83469   Min.   :-0.1331   Min.   :-0.3629  
 1st Qu.: 0.4553   1st Qu.: 0.4002   1st Qu.:-0.05084   1st Qu.: 0.3288   1st Qu.: 0.2095  
 Median : 1.1746   Median : 0.6695   Median : 0.13477   Median : 0.4793   Median : 0.3910  
 Mean   : 1.2556   Mean   : 0.8325   Mean   : 0.56928   Mean   : 1.0448   Mean   : 1.0301  
 3rd Qu.: 1.8076   3rd Qu.: 1.0263   3rd Qu.: 0.46503   3rd Qu.: 0.9119   3rd Qu.: 0.9414  
 Max.   : 6.9536   Max.   : 4.3308   Max.   : 6.79908   Max.   : 5.7351   Max.   : 5.5107  

> summary(normalized_data)
     FSC_A            FSC_H            SSC_A            SSC_H            CD62L              CXCR3          
 Min.   :  9732   Min.   : 10020   Min.   : -4971   Min.   :   850   Min.   :  -67.54   Min.   :   -99.33  
 1st Qu.: 49501   1st Qu.: 41926   1st Qu.: 12817   1st Qu.: 10218   1st Qu.:  134.11   1st Qu.:    40.30  
 Median : 85520   Median : 68696   Median : 55719   Median : 36294   Median :  353.33   Median :    93.97  
 Mean   : 81207   Mean   : 65300   Mean   : 53584   Mean   : 38840   Mean   :  995.21   Mean   :   412.80  
 3rd Qu.:110849   3rd Qu.: 88644   3rd Qu.: 82502   3rd Qu.: 60697   3rd Qu.: 1419.62   3rd Qu.:   210.79  
 Max.   :262143   Max.   :210880   Max.   :262143   Max.   :251689   Max.   :48123.08   Max.   :187161.67  
      CD8                 CCR4               CCR6              CD4                 CD45               CD27        
 Min.   :  -351.16   Min.   :  -568.0   Min.   :-1304.6   Min.   :  -467.52   Min.   :  -66.75   Min.   : -185.5  
 1st Qu.:    41.74   1st Qu.:   235.6   1st Qu.:  205.5   1st Qu.:   -25.43   1st Qu.:  167.38   1st Qu.:  105.5  
 Median :   125.82   Median :   732.0   Median :  360.3   Median :    67.59   Median :  248.94   Median :  200.5  
 Mean   :  1024.59   Mean   :  1706.3   Mean   :  664.4   Mean   :  1047.55   Mean   : 2934.20   Mean   : 3007.6  
 3rd Qu.:   241.21   3rd Qu.:  1483.0   3rd Qu.:  608.1   3rd Qu.:   240.99   3rd Qu.:  521.82   3rd Qu.:  543.4  
 Max.   :204572.89   Max.   :261730.7   Max.   :18997.3   Max.   :224255.84   Max.   :77383.98   Max.   :61829.5  

Any idea, please, if I'm doing something wrong? Thanks in advance.

gfinak commented 4 years ago

We are not supporting gaussNorm. It's a legacy method developed over 10 years ago. You may want to look at some of the more recently published approaches.

jgarces02 commented 4 years ago

Oh, that's a pity, I found it very useful... Besides CytoNorm, what other normalization methods do you know of, please? Thanks again.

SamGG commented 4 years ago

I agree. I think it is still useful for flow cytometry (and maybe mass cytometry too), especially when there is a positive peak to ease anchoring at the high range, because gaussNorm relies on peak positions, whereas quantile methods rely on the distributions being identical across samples. There is a similar question at https://github.com/RGLab/flowStats/issues/32 with my comment and Greg's answer.
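
To make the distinction between peak-based and quantile-based normalization concrete, here is a minimal sketch in base R on simulated data. It is not gaussNorm's implementation: `find_peaks`, `align_to` and `quantile_normalize` are hypothetical helpers written for this illustration, and the landmark step is a plain linear map between two density peaks rather than anything flowStats does internally.

```r
set.seed(1)

# Two simulated samples of one channel, each with a negative and a positive
# population. Sample b has a shifted positive peak and a larger positive fraction.
a <- c(rnorm(700, 0.5, 0.3), rnorm(300, 4.0, 0.3))
b <- c(rnorm(400, 0.5, 0.3), rnorm(600, 5.0, 0.3))

# Hypothetical helper: positions of the n_peaks highest local maxima
# of a kernel density estimate (the "landmarks").
find_peaks <- function(x, n_peaks = 2) {
  d <- density(x)
  is_max <- which(diff(sign(diff(d$y))) == -2) + 1
  top <- is_max[order(d$y[is_max], decreasing = TRUE)][seq_len(n_peaks)]
  sort(d$x[top])
}

# Hypothetical helper: linear map that sends the two peaks in `from`
# onto the two reference peaks in `to`.
align_to <- function(x, from, to) {
  slope <- diff(to) / diff(from)
  to[1] + (x - from[1]) * slope
}

# Hypothetical helper: quantile normalization of two equal-length samples
# onto the average of their sorted values.
quantile_normalize <- function(x, y) {
  target <- (sort(x) + sort(y)) / 2
  list(x = target[rank(x, ties.method = "first")],
       y = target[rank(y, ties.method = "first")])
}

pa  <- find_peaks(a)
pb  <- find_peaks(b)
ref <- (pa + pb) / 2            # reference landmarks: mean peak positions

a_al <- align_to(a, pa, ref)    # landmark-aligned samples
b_al <- align_to(b, pb, ref)
qn   <- quantile_normalize(a, b)

# Fraction of events above an arbitrary cut-off between the two peaks.
cat("landmark:", mean(a_al > 2.5), mean(b_al > 2.5), "\n")
cat("quantile:", mean(qn$x > 2.5), mean(qn$y > 2.5), "\n")
```

In this sketch the landmark alignment moves the peaks onto common positions while each sample keeps its own proportion of positive events. Quantile normalization forces both samples onto the same distribution, so the difference in positive fraction between them is no longer visible.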

jgarces02 commented 4 years ago

Perfect, thanks a lot for your help!

markemus commented 1 year ago

Are there any modern (or maintained) methods that don't require a control sample across batches? CytoNorm, batchadjust and FAUST all do.