bbrzycki / setigen

Python library for generating and injecting artificial narrow-band signals into radio frequency data
https://setigen.readthedocs.io
MIT License

spectral kurtosis values of integrated chi-squared noise #27

Closed: telegraphic closed this issue 1 year ago

telegraphic commented 2 years ago

I've been using spectral kurtosis for flagging in hyperseti, and have noticed that the noise statistics generated by setigen don't quite match what is expected from the generalized spectral kurtosis (SK) estimator described in the Nita & Gary papers.

The SK variance should come out as 2 / sqrt(N_accumulations) for a 'Pearson type IV' distribution, which I think covers most setigen-generated data. I think this is just a matter of scaling the PDF by the averaging first, but the math in those papers is admittedly a bit hectic.
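The discrepancy can be checked without setigen at all. Here's a minimal sketch (my own, assuming only NumPy) that models each integrated sample as a sum of N chi-squared power samples with 2d degrees of freedom, computes the generalized SK estimator on pure simulated noise, and compares the empirical variance against both 2/sqrt(M) and 2/M:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 51       # on-board accumulations per spectrogram sample
M = 512      # samples per SK estimate
d = 2        # dual polarization: 2*d = 4 DOF per raw power sample
n_chan = 4096

# Each integrated sample is a sum of N chi2(2d) power samples,
# i.e. chi2(2*d*N); the overall scale cancels out of the SK estimator.
x = rng.chisquare(2 * d * N, size=(M, n_chan)) / 2

s1 = x.sum(axis=0)
s2 = (x**2).sum(axis=0)

# Generalized SK estimator, centered at 1 for pure noise
sk = (M * N * d + 1) / (M - 1) * (M * s2 / s1**2 - 1)

print(np.mean(sk))        # ~1 for pure noise
print(np.var(sk))         # empirical variance, to compare against:
print(2 / M)              # leading-order variance
print(2 / np.sqrt(M))     # the 2/sqrt(M) reading, for contrast
```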

bbrzycki commented 2 years ago

I looked into this a bit, and I think I've made some sense of this (and how it's used in hyperseti):

First and foremost, I believe there's a typo in Nita et al. 2016: the variance of SK should be approximately 2/M, not 2/sqrt(M). (In addition, they use sigma for the variance, which may be where some of the confusion arose.)

Second, for hyperseti's implementation, at least when it comes to setigen data: since the synthetic noise assumes dual polarization, the value for d should be 2 instead of 1, corresponding to a chi-squared distribution with DOF = 4 (2 polarizations * 2 complex components). This is Eq. 1 in Nita et al. 2016. Using d=2 centers the SK values at 1 for pure noise, which is the ideal behavior. Note that as of now, setigen doesn't handle single-polarization spectrograms; that could change if it would be a useful feature in the future.
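To illustrate the effect of d on the centering, here is a sketch (assuming only NumPy, with dual-pol noise modeled as chi-squared with 4 DOF per raw sample, accumulated N times) that evaluates the estimator with both d values on the same simulated data:

```python
import numpy as np

rng = np.random.default_rng(42)

N, M = 51, 512
# Dual-pol power: chi2 with 2*2 = 4 DOF per raw sample, N accumulations
x = rng.chisquare(4 * N, size=(M, 4096)) / 2

s1 = x.sum(axis=0)
s2 = (x**2).sum(axis=0)

def sk_est(d):
    """Generalized SK estimator with an assumed DOF parameter d."""
    return (M * N * d + 1) / (M - 1) * (M * s2 / s1**2 - 1)

print(np.mean(sk_est(d=2)))  # centered at ~1 with the correct DOF
print(np.mean(sk_est(d=1)))  # biased low (~0.5) when d underestimates the DOF
```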

Lastly, in my opinion, Eq. 10 in Nita & Gary 2002 is more accurate, especially in the 'small sample' limit we work with in our high-resolution data product. The expression for the variance mu_2 tends to 2/M in the limit that N and M are large, but in some real cases they aren't -- e.g. M=16 and N=51. In that case, the O(1/M^2) term can actually contribute on the order of 1% or so of the variance, which is not necessarily a huge deal as an approximation, but it can make up the mathematical difference.
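As a quick numeric check of how large the correction terms are at M=16, N=51 (using the coefficients as written in the example code below; treat the exact form as my reading of Eq. 10, not a verified transcription):

```python
# Small-sample case from the discussion above
M, N, d = 16, 51, 2

leading = 2 / M                                  # large-M, large-N approximation
finite_N = 2 / M * (1 / (d * N) + 1 / N**2)      # finite-N correction terms
finite_M = 2 / M**2                              # O(1/M^2) term, coefficient assumed

print(finite_N / leading)   # relative size of the finite-N correction
print(finite_M / leading)   # relative size of the O(1/M^2) term, = 1/M
```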

All in all, I'd suggest adding the d=2 in, then compare variance with 2/M. If necessary, higher order terms can also be added in, per Eq. 10 in the last point.

Here's some example code I played around with:

```python
import matplotlib.pyplot as plt
import numpy as np
import setigen as stg

N = 51   # number of on-board accumulations (int_factor)
d = 2    # dual polarization
frame = stg.Frame.from_backend_params(fchans=1024 * 100, int_factor=N)
print(frame.shape)

frame.add_noise_from_obs()
frame.plot()
plt.show()

x = frame.data
x_sum = np.sum(x, axis=0)
x2_sum = np.sum(x**2, axis=0)
M = x.shape[0]   # number of time samples per SK estimate

# Generalized SK estimator (Nita et al. 2016, Eq. 1) with d = 2
SK = (N * M * d + 1) / (M - 1) * (M * (x2_sum / (x_sum * x_sum)) - 1).squeeze()

plt.plot(SK)
plt.show()

print(np.var(SK))
print(2 / M)
print(2 / M * (1 + 1 / d / N))
# Scaling O(1/M^2) term to show it makes a difference
print(2 / M * (1 + 1 / d / N + 1 / N**2) + 1 / M**2 * 2)
```


bbrzycki commented 2 years ago

@telegraphic were you able to look into this in more detail? Can we close this issue?