MPoL-dev / MPoL

A flexible Python platform for Regularized Maximum Likelihood imaging
https://mpol-dev.github.io/MPoL/
MIT License

Autocorrelation Window Functions and Spectral Covariance #18

Open iancze opened 3 years ago

iancze commented 3 years ago

Is your feature request related to a problem or opportunity? Please describe. For high sensitivity and/or spectral resolution applications, the spectral response function of the correlator system should be taken into account in the calculation of the negative log likelihood loss.

A proper understanding and treatment of this will be important for high sensitivity and/or high spectral resolution applications, e.g., forward modeling spectral line observations.

Relevant background information Spectral resolution is achieved by calculating "lags" in the autocorrelation of the signal. Via the Wiener–Khinchin theorem, the Fourier transform of the autocorrelation function is the power spectrum. For more background, see TMS Ch 8 and Bracewell, Ch 6 "The Autocorrelation Theorem".
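As a quick numerical illustration of the theorem (a NumPy sketch using circular correlation, so the FFT of the autocorrelation recovers the power spectrum exactly):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(512)

# power spectrum of the signal
power = np.abs(np.fft.fft(x)) ** 2

# circular autocorrelation, computed via the inverse FFT of the power spectrum
acf = np.fft.ifft(power).real

# Wiener–Khinchin: the Fourier transform of the autocorrelation
# returns the power spectrum
assert np.allclose(np.fft.fft(acf).real, power)
```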

There are (at least) two types of correlators, "FX" and "XF," which correspond to the order in which the Fourier transform (F) and cross-correlation (X) are applied to the incoming data stream. The correlator for the ALMA main array is technically an "FXF" or "filtering-correlation-Fourier transform" correlator that includes a digital filter and combines the best of both worlds, boosting spectral resolution by up to a factor of 32 over a traditional XF correlator. We'll trust the comments in TMS 8.8.3 that the upgraded VLA correlator as well as the main ALMA correlator are fundamentally XF or "lag" correlators, at least at the fine resolutions corresponding to spectral line observations. In an XF correlator, the cross-correlation is done first (yielding the cross-correlation signal) and then the Fourier transform is performed (yielding the complex-valued cross-power spectrum, i.e., the visibility as a function of frequency). More notes on the benefits of an XF vs. FX design are available in TMS 8.8.5. The correlator for the ACA is an FX design, which operates differently. In practice, however, the spectral resolution of the ACA correlator is much higher than that of the 12m data, so a "profile synthesis" step is performed to match the spectral response function of the ACA data to that of the 12m data.

Because the maximum lag is finite, there will always be a window function applied to the autocorrelation, whose Fourier transform manifests as a spectral response function in the power spectrum. Richard Hills' notes on spectral response contain good illustrations of this effect. The Hanning window is the default choice for ALMA main array observations, though the user can specify different options in their Phase 2 submission.
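To make the window function -> spectral response connection concrete, here is a small NumPy sketch (the window lengths are arbitrary illustration values) comparing the SRF of a uniform lag window against a Hann lag window:

```python
import numpy as np

n_lags = 32   # finite maximum lag of the correlator
n_pad = 4096  # zero-pad so the response is finely sampled in frequency

# The spectral response function is the Fourier transform of the lag window.
# A uniform (boxcar) window yields a sinc response with ~ -13 dB sidelobes;
# a Hann window widens the main lobe (coarser effective resolution) but
# suppresses the sidelobes to ~ -31 dB.
srf_uniform = np.abs(np.fft.fft(np.ones(n_lags), n_pad))
srf_hann = np.abs(np.fft.fft(np.hanning(n_lags), n_pad))

srf_uniform /= srf_uniform.max()
srf_hann /= srf_hann.max()
# plotting 20 * np.log10(srf) shows the sidelobe suppression directly
```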

ALMA also provides an option to bin the spectrum in an on-line fashion by 2, 4, 8, or 16 channels to reduce the data rate and final data volume. While real-time data rate limits were a concern in earlier cycles, they seem to be less of an issue in current (2021) cycles. Because a Hann window applied to a Nyquist sampled data stream results in a spectrum with an effective spectral resolution (defined by FWHM) of two channels, an on-line binning of 2 results in a final set of channels that are (minimally) correlated.
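A quick Monte Carlo check of this claim, assuming the Hann SRF acts on Nyquist-sampled white noise as the 3-point filter [1/4, 1/2, 1/4]:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)  # white noise channels

# Hanning smoothing of a Nyquist-sampled spectrum is a 3-point convolution
w = np.array([0.25, 0.5, 0.25])
y = np.convolve(x, w, mode="valid")

# adjacent Hann-smoothed channels are strongly correlated (rho ~ 2/3)
rho_smoothed = np.corrcoef(y[:-1], y[1:])[0, 1]

# on-line binning by 2 averages channel pairs, dropping the
# adjacent-channel correlation to rho ~ 0.3
z = y[: len(y) // 2 * 2].reshape(-1, 2).mean(axis=1)
rho_binned = np.corrcoef(z[:-1], z[1:])[0, 1]

print(rho_smoothed, rho_binned)  # ~0.67 and ~0.3
```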

Spectral resolution Much of the discussion in the ALMA documentation surrounding these choices focuses on whether the final samples are correlated. Correlation is only one concern related to forward modeling the visibility samples, however. Even if the samples are uncorrelated, we still need to implement the effect of the spectral response when bringing a latent space model forward to the data. That is, if the channels are 244 kHz wide after 2x averaging, that doesn't mean we can sample a latent space model at 244 kHz and call it a day. If the latent space model contains intrinsic spectral features at 32 kHz, for example, we would first need to convolve the latent space model with the effective spectral response (the result of sampling, Hann windowing, and averaging) before making a comparison with the data. This is probably easiest to achieve with a convolution in the spectral/velocity domain, i.e., using the appropriately sized Hann window SRF, or the one formed by channel averaging (see the example in Richard Hills' notes on spectral response). The kernel wouldn't necessarily be 3 points long, because the 32 kHz latent space model is oversampled relative to the Nyquist rate of the 244 kHz channels. This should be straightforward to implement with PyTorch Conv1D with the stride set to the output channel spacing.
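A minimal PyTorch sketch of that last step, assuming a hypothetical 8x oversampling of the latent channels relative to the data channels (the sizes and tensor names here are illustrative, not MPoL API):

```python
import torch
import torch.nn.functional as F

oversample = 8  # hypothetical: latent channels are 8x finer than data channels

# a Hann kernel of length ~4 * oversample has a FWHM of 2 * oversample fine
# channels, i.e., an effective resolution of two data channels
kernel = torch.hann_window(4 * oversample + 1, periodic=False)
kernel = kernel / kernel.sum()  # normalize so total flux is preserved

# latent model spectrum, finely sampled; for complex visibilities, convolve
# the real and imaginary parts separately
model_spectrum = torch.randn(1, 1, 1024)  # (batch, channels_in=1, n_fine)

# stride = oversample evaluates the smoothed spectrum directly at the
# data channel spacing
smoothed = F.conv1d(model_spectrum, kernel.view(1, 1, -1), stride=oversample)
```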

Spectral covariance See TMS v3 Section 8.8.2 and Loomis+18 Appendix C for more details, but the processing of the model cube is likely just a simple 3-point digital filter that we could implement as a convolution. The covariance matrix of the data should reflect that same convolutional processing.
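For concreteness, a sketch of building that covariance matrix with NumPy/SciPy, assuming the 3-point Hann filter [1/4, 1/2, 1/4] acts on channels with independent pre-filter noise of variance sigma2:

```python
import numpy as np
from scipy.linalg import toeplitz

n_chan = 16
w = np.array([0.25, 0.5, 0.25])  # 3-point Hann digital filter

# the autocorrelation of the filter gives the covariance band:
# lag 0: 3/8, lag 1: 1/4, lag 2: 1/16, zero beyond
band = np.correlate(w, w, mode="full")[len(w) - 1 :]
col = np.zeros(n_chan)
col[: len(band)] = band

sigma2 = 1.0                # per-channel noise variance before filtering
C = sigma2 * toeplitz(col)  # banded (pentadiagonal) channel covariance
```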

Describe the solution you'd like The spectral covariance loss might be implemented as a class inheriting from torch.nn.Module in mpol.losses, much like the built-in loss functions in PyTorch. The covariance matrix (and any Cholesky factorization or other pre-caching) would be computed and cached as part of the __init__ process, since the matrix itself is fixed.
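A minimal sketch of what that might look like (SpectralCovarianceLoss and its signature are hypothetical names for this proposal, not existing MPoL or PyTorch API):

```python
import torch

class SpectralCovarianceLoss(torch.nn.Module):
    def __init__(self, cov: torch.Tensor):
        super().__init__()
        # the covariance is fixed by the correlator setup, so factor it once;
        # a buffer moves with .to(device) but is never trained
        self.register_buffer("L", torch.linalg.cholesky(cov))

    def forward(self, model_vis: torch.Tensor, data_vis: torch.Tensor) -> torch.Tensor:
        # whiten the residuals by solving L z = r rather than inverting C;
        # for complex visibilities, apply to the real and imaginary parts
        # separately (each shares the same channel covariance)
        resid = (data_vis - model_vis).unsqueeze(-1)  # (..., n_chan, 1)
        z = torch.linalg.solve_triangular(self.L, resid, upper=False)
        return 0.5 * torch.sum(z**2)
```

The matrix from the sketch above could then be passed in directly, e.g., `loss_fn = SpectralCovarianceLoss(torch.as_tensor(C))`.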

If the covariance is coupled with addressing the Doppler setting, we will also need to consider #4, because the act of "gridding" a dataset will destroy time information.

Describe alternatives you've considered For lower S/N datasets (and continuum datasets), ignoring this effect and using the GriddedDataset is adequate. But for high S/N applications, it'd be nice to get as much information out of the data as possible.

iancze commented 3 years ago

The following is a proposed first-order solution to the spectral response issue.

Assumptions

Procedure The procedure works similarly to a simple RML loop, with the BaseCube configured to have the same channel spacing as the data. Each forward pass then calculates BaseCube -> ImageCube -> FourierCube, followed by the steps below (a pseudocode sketch follows the list):

  1. After the FourierCube step, a convolution with a Hann window is applied across the channel axis to the model visibilities
  2. The model visibilities are selected relative to the gridded dataset, yet retain their channelization (note the edge case where visibilities land one cell over relative to the others)
  3. The negative log likelihood loss is calculated using a covariance matrix accounting for the Hann convolution across channels
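Gluing the steps together, a pseudocode sketch of one training loop (BaseCube, ImageCube, and FourierCube are MPoL components; hann_convolve, index_vis, optimizer, and loss_fn are illustrative stand-ins for this proposal):

```python
for iteration in range(n_iter):
    optimizer.zero_grad()

    # BaseCube -> ImageCube -> FourierCube at the data channel spacing
    vis_model = fourier_cube(image_cube(base_cube()))

    # 1. convolve with the Hann SRF across the channel axis
    vis_model = hann_convolve(vis_model)

    # 2. select model visibilities against the gridded dataset,
    #    retaining their channelization
    vis_sel, data_sel = index_vis(vis_model, dataset)

    # 3. negative log likelihood with the banded channel covariance
    loss = loss_fn(vis_sel, data_sel)
    loss.backward()
    optimizer.step()
```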

Benefits

Concerns/doubts