TomasLenc / acf_tools

Utilities for working with autocorrelation to capture periodicity in a signal.
GNU General Public License v3.0

subtract instead of divide out 1/f estimate #1

Closed. TomasLenc closed this issue 1 year ago.

TomasLenc commented 1 year ago

Big update!

I realized I was doing something very suboptimal, and I think I found a much better way to do it.

In short, in the previous version of acf_tools, the steps to compute the 1/f-normalized ACF were as follows (a rough code sketch follows the list):

  1. Calculate the magnitude spectrum mX up to Nyquist.
  2. If the signal is periodic, and its fundamental frequency f0 is known, remove the peaks at all harmonics of f0 from mX by substituting their values with the mean of surrounding bins.
  3. Fit 1/f noise parameters to mX and use them to calculate the magnitude of the aperiodic component ap at each frequency.
  4. Calculate the full complex spectrum X.
  5. Mirror the estimated ap magnitudes so that they cover the full spectrum from DC (0 Hz) to the sampling rate. This way, we have a single ap magnitude estimate for each frequency bin in X.
  6. Divide the complex number at each frequency bin in X by the corresponding magnitude of the estimated ap component at that exact frequency. This gives us X_norm.
  7. Multiply X_norm by its complex conjugate and apply the ifft to obtain the full autocorrelation function.
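For concreteness, here is a rough sketch of that divide-based pipeline in Python/numpy. This is illustrative only, not the actual acf_tools implementation or API: the helper fit_aperiodic is a hypothetical stand-in for whatever routine estimates the 1/f magnitude at each frequency bin, and x, fs and f0 are just the input signal, sampling rate and fundamental frequency.

```python
import numpy as np

def old_divide_normalization(x, fs, f0, fit_aperiodic, n_surround=3):
    n = len(x)
    freqs = np.fft.rfftfreq(n, d=1 / fs)             # 0 Hz up to Nyquist
    mX = np.abs(np.fft.rfft(x))                      # step 1: magnitude spectrum

    # step 2: replace bins at harmonics of f0 with the mean of surrounding bins
    mX_clean = mX.copy()
    df = freqs[1] - freqs[0]
    for f in np.arange(f0, freqs[-1], f0):
        idx = int(round(f / df))
        lo, hi = max(idx - 1, 0), min(idx + 2, len(mX))
        surround = np.r_[mX[max(lo - n_surround, 0):lo], mX[hi:hi + n_surround]]
        mX_clean[lo:hi] = surround.mean()

    # step 3: fit the aperiodic (1/f) component; `fit_aperiodic` is a placeholder
    ap_half = fit_aperiodic(freqs, mX_clean)         # one magnitude per rfft bin

    # steps 4-5: full complex spectrum, and the ap estimate mirrored across Nyquist
    X = np.fft.fft(x)
    mirror = ap_half[-2:0:-1] if n % 2 == 0 else ap_half[-1:0:-1]
    ap_full = np.maximum(np.concatenate([ap_half, mirror]), np.finfo(float).eps)

    # step 6: divide each complex bin by the estimated aperiodic magnitude
    X_norm = X / ap_full

    # step 7: ACF as the inverse FFT of the normalized power spectrum
    return np.real(np.fft.ifft(X_norm * np.conj(X_norm)))
```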

Now, this did work. Kind of. It did remove the 1/f. However, the division operation significantly distorted the spectrum, and hence the resulting autocorrelation function, which biased the z-scores at meter-related lags.

I realized this when I was simulating the effect of noise, using different selections of meter-related lags, as well as different unitary response shapes. For some choices, the estimated z-score at meter-related lags was biased even when there was essentially no noise!

So I spent the day thinking and trying things, and I came up with something very simple, which I should have thought of much earlier.

The idea is that instead of dividing by ap, we can do the following for each frequency bin (a rough sketch in code follows the list):

  1. Construct a complex vector with the same direction as the corresponding complex vector at that frequency in X. We will call this the normalization vector, x_norm.
  2. Set the magnitude of x_norm to the value of ap at the corresponding frequency.
  3. If the magnitude of X at the corresponding frequency is larger than that of x_norm, subtract x_norm from the complex vector in X.
  4. If the magnitude of X at the corresponding frequency is smaller than that of x_norm, set the complex vector in X to a zero vector. This avoids flipping the vector to the other side by subtracting a longer x_norm from it.
  5. This gives us X_norm, which can be used to compute the autocorrelation function in the frequency domain as before.
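In numpy terms, a minimal sketch of this per-bin subtraction could look like the following (illustrative names, not the acf_tools API): X is the full complex spectrum and ap holds the estimated aperiodic magnitude at the same frequency bins.

```python
import numpy as np

def subtract_aperiodic(X, ap):
    mags = np.abs(X)
    X_norm = np.zeros_like(X)

    # unit vectors with the same direction (phase) as X, only where magnitude > 0,
    # so we never divide by zero and never produce NaNs
    unit = np.zeros_like(X)
    nonzero = mags > 0
    unit[nonzero] = X[nonzero] / mags[nonzero]

    # where the observed magnitude exceeds the 1/f estimate, subtract a vector of
    # length ap pointing in the same direction as X; everywhere else leave a zero
    # vector, so the bin never flips to the opposite side
    keep = mags > ap
    X_norm[keep] = X[keep] - unit[keep] * ap[keep]
    return X_norm

def acf_from_spectrum(X_norm):
    # ACF via the inverse FFT of the (normalized) power spectrum, as before
    return np.real(np.fft.ifft(X_norm * np.conj(X_norm)))
```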

That's it. We just need to be careful about some zeros along the way so that we don't get NaNs.

But there is an even more powerful, optional step.

  1. If you know the exact frequencies at which you expect your signal to occur, you can essentially set all other frequencies in X_norm to be zero vectors.
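A rough sketch of that optional step (again illustrative, and assuming the signal's harmonics fall on, or very near, FFT bins): keep only the bins whose frequency is a multiple of f0 and zero out everything else in X_norm.

```python
import numpy as np

def keep_only_harmonics(X_norm, fs, f0):
    n = len(X_norm)
    freqs = np.fft.fftfreq(n, d=1 / fs)    # signed frequencies of the full spectrum
    bin_width = fs / n

    # distance from each bin to the nearest multiple of f0 (|freq| so that the
    # negative-frequency half of the spectrum is treated symmetrically)
    dist = np.abs(np.abs(freqs) - np.round(np.abs(freqs) / f0) * f0)
    is_signal = dist < bin_width / 2
    is_signal[0] = False                   # also drop the DC bin

    return np.where(is_signal, X_norm, 0)
```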

For example, below I simulated a periodic signal with period equal to 0.8 s, i.e. it only projects to harmonics of 1.25 Hz. Then, after adding some noise, I computed the ACF in three ways:

From the figure it's clear that the most powerful way to remove the 1/f from the autocorrelation estimate is the third approach.
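As a point of reference, here is a rough numpy sketch of how such a test signal could be generated (the exact parameters, unitary response and noise used for the figure may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

fs = 200                                   # sampling rate (Hz), illustrative
dur = 60                                   # duration (s)
n = int(fs * dur)

# periodic part: a short unitary response placed every 0.8 s, so the clean
# signal only projects to harmonics of 1 / 0.8 = 1.25 Hz
period = 0.8
impulses = np.zeros(n)
impulses[np.round(np.arange(0, dur, period) * fs).astype(int)] = 1
ir = np.hanning(int(0.2 * fs))             # 200 ms unitary response
periodic = np.convolve(impulses, ir, mode="full")[:n]

# 1/f noise: shape white noise in the frequency domain
white = np.fft.rfft(rng.standard_normal(n))
f = np.fft.rfftfreq(n, d=1 / fs)
white[0] = 0
white[1:] = white[1:] / f[1:]              # 1/f magnitude rolloff
noise = np.fft.irfft(white, n)
noise *= periodic.std() / noise.std()      # roughly match the signal scale

x = periodic + noise                       # input to the ACF pipeline
```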

TomasLenc commented 1 year ago

Here is a new figure illustrating the updated 1/f-normalization method:

[figure: explain_ap_fit_TL2]

cedlenoir commented 1 year ago

How can we explain the difference between the orange and black autocorrelation signals? The method used to correct for 1/f is similar in both. Is it because the correction is not perfect, so that the ACF computed afterwards still exhibits some residual 1/f? In that case I was expecting the ACF at the 0.8 s lag and its multiples to be the same, but it seems there are some differences.

Not directly related to that: I tried the new version on the data of XP1 (audio vs vibro) using two values of "max_freq" (9 Hz or fs/2) for the fitting (other parameters set to default). As you can see below, the ACF changes.

[figures, same scale: max_freq_9Hz, max_freq_nyquist]

TomasLenc commented 1 year ago

How can we explain the difference between the orange and black autocorrelation signals?

You're right, in both the orange and black signals we've subtracted the magnitude driven by the estimated 1/f. However, as you can see in the middle portion of the second figure I sent above, even after subtracting the 1/f estimate there is still energy at frequency bins that are not multiples of 1 / signal repetition rate. And the ACF estimated from the spectrum is going to be affected by whatever is present at these frequency bins. Now, we know a priori that if we had a perfectly clean signal without noise, there would be no energy at these bins, because we know a priori which frequencies our signal is going to project to (F0 corresponding to 1 / signal repetition rate, and its harmonics). So we can use that knowledge. We have already adjusted the "signal" bins by removing the estimated contribution of noise. So now, we can simply zero out all the "non-signal" frequency bins, so that they don't distort the ACF estimate we compute subsequently.

cedlenoir commented 1 year ago

Here are the results for XP2.

[figures: maxfreq_9Hz, maxfreq_nyquist]

cedlenoir commented 1 year ago

Indeed, limiting the range to 9 Hz strongly biases the correction (and that was probably the reason you set max_freq to Nyquist by default, sorry if I am slow).

[figures: ap fit test1, ap fit test2]

TomasLenc commented 1 year ago

Okay, I think I can conclude that this new method is working. Let's merge to move on to other stuff.