Noise in Q and U, and MAD vs theoretical

AlecThomson commented 2 years ago

Hey @Cameron-Van-Eck,

I've been looking at how best to compute errors in Q and U after RM-synthesis. I've been digging into the util_RM.measure_FDF_parms function. For this purpose, I've pulled out the guts from the code to directly play with the values, and used the following as an input (as if it were running in do_RMsynth_1D.py:

QArr = np.ones(288)
UArr = np.zeros(288)
IArr = np.ones(288)
dQArr = np.ones(288) * 0.1
dUArr = np.ones(288) * 0.1
dIArr = np.ones(288) * 0.1
QArr += np.random.normal(loc=0, scale=0.1, size=288)
UArr += np.random.normal(loc=0, scale=0.1, size=288)
IArr += np.random.normal(loc=0, scale=0.1, size=288)
fit_function='log'
verbose = True
fitRMSF = True
debug = False
polyOrd = 2
log = print
C = 2.997924538e8 # Speed of light [m/s]
freqArr_Hz = np.linspace(744, 1032, 288) * 1e6
modStokesI = np.ones_like(freqArr_Hz)
nBits = 32
dtFloat = "float" + str(nBits)
dtComplex = "complex" + str(2*nBits)
phiMax_radm2 = 1000
dPhi_radm2 = None
nSamples = 10
weightArr = np.ones_like(freqArr_Hz)

Here's the result of RMsynth:

realFDF = FDF.real
imagFDF = FDF.imag
plt.plot(phiArr, realFDF, label="Stokes Q")
plt.plot(phiArr, imagFDF, label="Stokes U")
plt.plot(phiArr, absFDF, label="PI")
plt.legend()
plt.xlabel("$\phi$ [rad/m/m]")
plt.ylabel("Flux density [fake units]")

Here is what the MAD, rms, and theoretical values for the FDF noise came out to be:

print(f"{dFDFcorMAD=}", f"{dFDFrms=}", f"{dFDF=}")
# dFDFcorMAD=0.014828366 dFDFrms=0.034472786 dFDF=0.005892556509887897

Now I'll subtract the rmsf from the centre - i.e. a gain 1, n 1 RM-CLEAN but dumb. Now we can compare this residual to the noise estimators

rmsf_crop = RMSFArr[len(RMSFArr)//4+1:3*len(RMSFArr)//4]
for noise, name in zip((dFDFcorMAD, dFDFrms, dFDF), ("MAD", "RMS", "theory")):
    plt.figure()
    plt.plot(phiArr, realFDF - rmsf_crop.real, label="Stokes Q")
    plt.plot(phiArr, imagFDF - rmsf_crop.imag, label="Stokes U")
    plt.plot(phiArr, absFDF - np.abs(rmsf_crop), label="PI")
    plt.fill_between(phiArr, -noise, noise, alpha=0.2, label=name, color='k')
    plt.hlines(-0.01, phiArr[0], phiArr[-1], linestyles='dashed', color='k')
    plt.hlines(0.01, phiArr[0], phiArr[-1], linestyles='dashed', color='k', label="Input noise (frequency)")
    plt.ylim(-0.05, 0.05)
    plt.legend()
    plt.xlabel("$\phi$ [rad/m/m]")
    plt.ylabel("Residual flux density [fake units]")

Noting here that I've plotted the ±0.1 levels (the 1sigma noise in frequency), but I don't think we should see that exactly in the FDF, right?

Now as a sanity check, I input a pure noise signal:

QArr = np.random.normal(loc=0, scale=0.1, size=288)
UArr = np.random.normal(loc=0, scale=0.1, size=288)
IArr = np.random.normal(loc=0, scale=0.1, size=288)
dQArr = np.ones(288) * 0.1
dUArr = np.ones(288) * 0.1
dIArr = np.ones(288) * 0.1

Running the same as above, but not doing the dumb CLEAN:

print(f"{dFDFcorMAD=}", f"{dFDFrms=}", f"{dFDF=}")
# dFDFcorMAD=0.003147664 dFDFrms=0.0074483543 dFDF=0.005892556509887897
for noise, name in zip((dFDFcorMAD, dFDFrms, dFDF), ("MAD", "RMS", "theory")):
    plt.figure()
    plt.plot(phiArr, realFDF , label="Stokes Q")
    plt.plot(phiArr, imagFDF , label="Stokes U")
    plt.plot(phiArr, absFDF, label="PI")
    plt.fill_between(phiArr, -noise, noise, alpha=0.2, label=name, color='k')
    plt.hlines(-0.01, phiArr[0], phiArr[-1], linestyles='dashed', color='k')
    plt.hlines(0.01, phiArr[0], phiArr[-1], linestyles='dashed', color='k', label="Input noise (frequency)")
    plt.ylim(-0.05, 0.05)
    plt.legend()
    plt.xlabel("$\phi$ [rad/m/m]")
    plt.ylabel("Noise flux density [fake units]")

My takeaways:

In the presence of a signal, the MAD and RMS estimators seem to overestimate the noise. In my test case here, the was a factor of 3 between the MAD and the theoretical value.
Confusingly, for a pure noise spectrum the three estimators disagreed.
The theoretical noise value seems like a good 1sigma value for Stokes Q and U. Is this expected? If so, I would suggest we use it to output uncertainty in Q and U. i.e. we have peakFDFimagChan, peakFDFimagFit etc., we should also provide dPeakFDFimagChan, dPeakFDFimagFit etc.

P.S. We could also adapt the RMS/MAD estimators to run on the real/imaginary spectra quite easily too, if we trust their outputs.

Cameron-Van-Eck commented 2 years ago

Hi @AlecThomson:

These results are pretty in-line with similar simulations I've done. It's definitely known/expected that uncleaned FDFs will report higher measured noise, as a result of sidelobes of the real signal. After many discussions with Bryan about this, we've essentially concluded that the measured noise in dirty FDFs should never be used. The theoretical noise seems to be well behaved and accurate, assuming the channel uncertainties are accurate (as they are in these sims).

I'm not sure why the estimators differ so much on the pure-noise FDFs. I was under the impression that the MADFM at least was reliable if there was no signal. The RMS I don't particularly expect to be well behaved when computed using the absolute value/PI, because of polarization bias. My policy is to never use/trust the RMS under any circumstance, and I've been debating just flat out removing it from RM-Tools entirely.

I think it is expected that the theoretical noise should be appropriate for Stokes Q and U as well as PI (in the high S:N regime). I've seen/done some derivations that essentially show that the noise in PI is equal to the noise in Q/U once you're at a decent S:N (>~5?) so that polarization bias becomes less of a problem.

Running the noise estimators in Q and U rather than PI is probably a good idea, I think (in my head, I half-thought they already were). The RMS at the very least should be better behaved.

The only really surprising thing for me is the difference between the MAD and the theoretical noise for the pure noise case. Is it perhaps not normalizing the MAD to the Gaussian 1-sigma equivalent? There's a conversion factor of 0.6-something, and so I wonder if that would make the difference.

To summarize my own conclusions on the subject from similar work:

we can't reliably extract noise values from dirty FDFs. Even the windowing done by measure_FDF_parms isn't enough (although it helps).
theoretical noise works pretty well in sims, where we know exactly the actual noise amplitude. Harder to know for sure if we can estimate the channel noise accurately on real data, but Max Fine's work on injecting simulated sources into POSSUM data were pretty encouraging.
Q/U and PI errors are approximately interchangeable, as long as your estimator doesn't assume zero mean (i.e., as RMS does).

Cheers, Cameron

Cameron-Van-Eck commented 1 year ago

Adding a note here after a discussion with Dave McConnell: there's probably an inconsistency in how we're defining the noise: the theoretical noise is, I think, nominally for Q or U (of the FDF), while the empirical noise is calculated as the MADFM of the PI (of the wings of the spectrum, which would nominally be noise-like for Cleaned spectra.

I should probably fix that to be consistent -- it would be easy to modify the MAD and RMS calculations to operate on Q and U. I'll add this to my list of bugs to fix before finishing the paper...

Cameron-Van-Eck commented 10 months ago

This should be fixed now. I removed the RMS noise estimate before, because it was easily biased high by outliers (like actual signal). The corrected-MAD noise estimate was flawed: it calculated the noise of polarized intensity, which has very non-Gaussian statistics, so it ended up biased low. I've changed it to calculate the MADFM of the real and imaginary components together. This reproduces the correct noise amplitude on simulations, and is reflective of the actual error in PI.

This produces the correct results on both signal-free simulations and Faraday-simple RM-cleaned simulations. Dirty FDFs still result in noise estimates that are biased high (albeit not by too much, in most cases), but that's inescapable -- for dirty FDFs, the theoretical noise may generally be more reliable.

CIRADA-Tools / RM-Tools

Noise in Q and U, and MAD vs theoretical #63