gaussNorm: I don't understand the resulting transformation

SamGG commented 1 month ago

I see problems with this code:

I don't understand changing the intensity of the signal; I agree to use the gaussian function to weight the shift, which leads to a slightly different code, but not to change the intensity because it results in shrinking effects at the ends
setting a specific s aka width for each landmark leads to unwanted results IMHO

https://github.com/RGLab/flowStats/blob/fabb13191a2567e41a89691cab31507d4cbddc7a/R/gaussNorm.R#L497-L500

Code to reproduce my trials

# aim to untouch peak at 0 and shift at 1.24 to 1.74, ie add 0.5
landm = c(0, 0, 1.24, 1.74)
# white noise to visualize the transform
xx = runif(N, -1, 3)
# s from the data = sd(data) ~ 0.6

# problem: ends are not equal to the shift at the nearest landmark
# the negative peak ends at -0.5 in real data
# the positive peak ends at +2.5 in real data
yy = flowStats:::register.function(
    xx, c(.5, .5), landm[c(1,3)], landm[c(1,3)+1]-landm[c(1,3)])
plot(xx, yy); abline(c(0,1)); abline(c(0.5,1))
plot(xx, yy-xx, asp = 1); abline(h=c(0,0.5))

# user should be cautious when setting too small values for s
yy = flowStats:::register.function(
    xx, c(.25, .5), landm[c(1,3)], landm[c(1,3)+1]-landm[c(1,3)])
plot(xx, yy); abline(c(0,1)); abline(c(0.5,1))
plot(xx, yy-xx, asp = 1); abline(h=c(0,0.5))

Here are the 2nd and 4th graphs showing the shift applied to the data as a function of the intensity (from asinh(FluoIntensity/cofactor)).

@mikejiang @sofieVG @cciccole

gfinak commented 1 month ago

@SamGG, just a heads-up that the authors of the gaussNorm approach are from this paper https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.20823 (14 years ago). I'm not sure if any of them are still involved in computational flow. gaussNorm, as far as I know, is long-since orphaned and unsupported. I'm not sure who can provide insight into what the code is doing or why it does so, or any possible bugs, known or unknown.

SamGG commented 1 month ago

Hi Greg. Thanks for your quick feedback.

Except Ryan, I don't see these names nowdays, and he seems not to be on github. May @tslumley or @fhahne answer?

My main point was to warn potential users and to keep a track if someone wants to investigate.

What I coded for 2 peaks is:

  if(length(m)==2){
    sh=(shift[1]-shift[2])
    w1 = flowStats:::gau(data, s[1], m[1])
    w2 = flowStats:::gau(data, s[2], m[2])
    ws = w1 + w2
    data=data+w1/ws*(sh/2)
    data=data-w2/ws*(sh/2)
    return(data+shift[1]-(sh/2))
  }

This leads to correct results IMHO when s is the same for the 2 peaks.

s = c(0.5, 0.5)

s = c(0.25, 0.25)

s = c(0.25, 0.5)

The latter is still slightly weird (although comprehensive) as s is not similar. This should not arise as the API does not provide to adjust it. BTW, the estimation of s is the sd of all the intensity of the channel (code below). This relies on all the data, not of each peak, which is a coarse estimation in my view.

https://github.com/RGLab/flowStats/blob/fabb13191a2567e41a89691cab31507d4cbddc7a/R/gaussNorm.R#L472-L475

cciccole commented 1 month ago

@SamGG I would need to see more practical examples that show the data -- not just the diff -- in order to understand more clearly, though I don't doubt you've found some sensible improvement.

However, separately, I also think this algorithm has other noteworthy problems so I'm not sure how much thought I can give this particular one 😁 If I recall correctly:

Cannot normalize matrices with only 1 column (the classic R bug of error after 2D object is converted to 1D automatically)
Extremely inefficient (cannot scale to larger datasets)
Can't deal with individual files that are missing a peak compared to their counterparts. I.e., only have negatives in a particular channel.
Related to the above, misidentifies landmarks and can do crazy shifts as a result. E.g., negatives get shifted to positives or vice versa.
Maybe some more stuff.

The core of the warping, when it gets it right, seems pretty good. Unfortunately there are many situations where it fails. If the core warping strategy were wrapped with a more robust surrounding layer, that could be valuable.

RGLab / flowStats

gaussNorm: I don't understand the resulting transformation #54