HERA-Team / hera_qm

HERA Data Quality Metrics
MIT License

Corr metric overhaul #393

Closed dstorer closed 3 years ago

dstorer commented 3 years ago

Add a correlation metric and a cross-polarization metric to use for antenna flagging. The correlation metric measures how well antennas are correlating with each other, and the cross-polarization metric compares this value between the same-pols and different-pols to identify crossed antennas. See the daily notebooks for visuals of both metrics.

The correlation metric is calculated for each polarization as follows: [image: Screen Shot 2021-02-10 at 10 32 17 AM]

When only sum visibilities are provided (e.g. for H1C), then interleaved integrations are used to calculate the evens and odds.
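A minimal sketch of the interleaving idea, assuming the sum visibilities for one baseline are stored as a NumPy array of shape (Ntimes, Nfreqs) and that "interleaved" means alternating integrations. The function name `split_even_odd` is hypothetical and is not taken from hera_qm:

```python
import numpy as np

def split_even_odd(vis):
    """Split a (Ntimes, Nfreqs) array into interleaved 'even' and 'odd' halves.

    Assumption: alternating integrations stand in for the even/odd data
    products. A trailing unpaired integration is dropped so both halves
    have the same shape.
    """
    n_pairs = vis.shape[0] // 2
    evens = vis[0:2 * n_pairs:2]  # integrations 0, 2, 4, ...
    odds = vis[1:2 * n_pairs:2]   # integrations 1, 3, 5, ...
    return evens, odds

# Toy data: 5 integrations, 2 frequency channels
vis = np.arange(10, dtype=complex).reshape(5, 2)
evens, odds = split_even_odd(vis)
```

With five integrations, the last one is discarded and each half keeps two.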

Additionally, we've stripped out the unneeded old functionality for Mean Vij based metrics and the cross-polarization detection based on Mean Vij. The former is superseded by auto_metrics, the latter is superseded by the new cross-correlation based detection of cross-polarized antennas. We've also removed all modified z-score based antenna removal in favor of strict cuts.
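To illustrate what flagging with strict cuts on the two metrics might look like, here is a rough sketch. The function name, the threshold values, and the order of the checks are all assumptions made for illustration, not the actual hera_qm logic:

```python
def classify_antennas(same_pol, cross_pol, corr_cut=0.4, cross_cut=0.0):
    """Label antennas using fixed thresholds rather than z-scores.

    same_pol / cross_pol map antenna number -> mean correlation metric over
    same-polarization and cross-polarization baselines respectively.
    corr_cut and cross_cut are illustrative values only.
    """
    labels = {}
    for ant, same in same_pol.items():
        cross = cross_pol[ant]
        if max(same, cross) < corr_cut:
            labels[ant] = "dead"     # does not correlate in any polarization
        elif same - cross < cross_cut:
            labels[ant] = "crossed"  # correlates better in the "wrong" pols
        else:
            labels[ant] = "good"
    return labels

# Hypothetical per-antenna metric values
same_pol = {0: 0.85, 1: 0.03, 2: 0.10}
cross_pol = {0: 0.05, 1: 0.02, 2: 0.80}
labels = classify_antennas(same_pol, cross_pol)
```

Because the thresholds are fixed, the label for one antenna does not depend on how the rest of the array is behaving.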

codecov[bot] commented 3 years ago

Codecov Report

Merging #393 (4e6df7e) into master (4292807) will decrease coverage by 0.06%. The diff coverage is 96.35%.

@@            Coverage Diff             @@
##           master     #393      +/-   ##
==========================================
- Coverage   97.13%   97.07%   -0.07%     
==========================================
  Files          10       10              
  Lines        3283     3280       -3     
==========================================
- Hits         3189     3184       -5     
- Misses         94       96       +2     
Impacted Files             Coverage Δ
hera_qm/ant_metrics.py     97.91% <96.15%> (-0.80%)
hera_qm/metrics_io.py      92.19% <100.00%> (ø)
hera_qm/utils.py           97.19% <100.00%> (-0.07%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update 4292807...4947045.

dannyjacobs commented 3 years ago

Thanks @dstorer for your explanations; they made things clearer. My revised notes (here, now more readable?) try to expand on and justify the proposed change. Things are close. I see two main questions.

Q1 In your equation 3 you write even_ij / |even_ij|. Do you really mean to divide each visibility spectrum by its absolute value? Is that maybe a typo? That operation would cancel the entire amplitude of the fringe. It’s hard to predict what that would look like. I was expecting you to divide by the autos. This would cancel out the gain terms to get back the “true” visibility (see my notes).

Q2 In your pasted writeup you describe a process for computing a cross-pol metric based on cross-correlations, then below you say that cross-polarization detection has been superseded by auto_metrics. Which is being used to ID cross-pols? The method using cross-pols has many steps and summations, making it difficult to predict what the output ought to be; a method that estimates the polarization fraction in the autos would be far preferable.

dstorer commented 3 years ago

@dannyjacobs in response to your 2 questions:

1: No, that's not a typo. The idea behind that choice is exactly that the amplitude of each baseline is normalized to 1, so if the phases are noise-like this value will average down to zero, but if the antennas are well correlated then the phases should not be noise-like, and this value should average to 1. In the very beginning of developing this plot I was normalizing by the autos, but found that when I normalized the way described above it did a better job at highlighting when nodes weren't correlating and made the metric somewhat less sensitive to baseline length. I agree that this is a fundamentally different metric, but what we're currently doing seems better motivated to me (and we've seen it work for a long time now).

2: I think there was a small typo in that last sentence that made it confusing. auto_metrics is not identifying cross-polarized antennas; that is being done using the cross-pol metric that is based on the correlation metric. The reason for switching away from identifying cross-polarized ants based on relative power in the autos is that we observed that method was not very robust, and was very sensitive to antennas with one polarization that was completely dead. While a dead polarization is still a problem, it is a distinctly different problem than polarization cables being swapped. The new metric described above is much more reliable in catching antennas that are actually crossed, rather than low power or dead in one or both pols.

I'm not sure I 100% followed your write-up, but it seems like aside from these 2 points we are basically on the same page. Let me know if you have more questions.
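For readers without the equation image, the normalization described in point 1 can be sketched roughly as below. This assumes the metric is the magnitude of the time- and frequency-averaged product of the amplitude-normalized even visibility with the conjugate of the amplitude-normalized odd visibility; that combination, the function name `corr_metric`, and the toy data are assumptions for illustration:

```python
import numpy as np

def corr_metric(even, odd):
    """|< (even/|even|) * conj(odd/|odd|) >| over all times and frequencies."""
    even_n = even / np.abs(even)  # unit-amplitude phasors
    odd_n = odd / np.abs(odd)
    return np.abs(np.mean(even_n * np.conj(odd_n)))

rng = np.random.default_rng(0)
shape = (60, 256)  # (Ntimes, Nfreqs), arbitrary toy size

def noise():
    return rng.normal(size=shape) + 1j * rng.normal(size=shape)

# Well-correlated baseline: evens and odds share a strong common signal
signal = 5.0 * np.exp(1j * rng.uniform(0, 2 * np.pi, size=shape))
good = corr_metric(signal + 0.1 * noise(), signal + 0.1 * noise())

# Non-correlating baseline: evens and odds are independent noise
dead = corr_metric(noise(), noise())
```

In this toy case `good` comes out close to 1 and `dead` close to 0, which is the behavior described above.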

jsdillon commented 3 years ago

On 1: Dividing by the amplitude has the benefit of making the RFI largely irrelevant without having to use any medians over frequency or time. Part of what I really like about this metric, now that we have auto_metrics as a complement, is that it's really focused on what information the phases of the visibilities can give us about the health of the array, compared to auto_metrics which (by necessity) only looks at amplitudes.
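A toy comparison of why the amplitude normalization helps with RFI, under stated assumptions: the "RFI" here is an identical bright signal injected into a single channel of otherwise uncorrelated noise, and the amplitude-weighted statistic is a stand-in invented for contrast, not any metric from hera_qm:

```python
import numpy as np

rng = np.random.default_rng(1)
n_times, n_freqs = 60, 256

def noise():
    return (rng.normal(size=(n_times, n_freqs))
            + 1j * rng.normal(size=(n_times, n_freqs)))

# A baseline that is not correlating: evens and odds are independent noise
even, odd = noise(), noise()

# Inject the same bright narrowband signal into channel 100 of both
even[:, 100] += 1e3
odd[:, 100] += 1e3

prod = even * np.conj(odd)

# Amplitude-weighted average: the single bright channel dominates the sum
amp_weighted = np.abs(prod.mean()) / np.abs(prod).mean()

# Phase-only average: every sample contributes a unit phasor, so one
# contaminated channel is only 1/256 of the total
phase_only = np.abs(np.mean(prod / np.abs(prod)))
```

Here `amp_weighted` is pulled close to 1 by the one contaminated channel, while `phase_only` stays close to 0 without any median filtering.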

On 2: Regarding "The reason for switching away from identifying cross-polarized ants based on relative power in the autos", we actually used the cross-correlations, not just the autos, to identify cross-polarized antennas before. But the MeanVij-based metrics were still just looking at visibility amplitudes, including the one for identifying crosses (which looked for larger amplitudes in en/ne than in nn/ee).

jsdillon commented 3 years ago

As a confidence-building measure (and as discussed on today's Analysis/QM telecon) I ran a whole day (2459122) through the new ant_metrics (without any a priori antenna flags). Here's a section of the result from the in-development summary notebook:

[image: section of the in-development summary notebook showing the ant_metrics results for 2459122]

Looks like we're pretty consistently finding dead and/or crossed antennas. There are other pathologies which auto_metrics picks up that ant_metrics doesn't, but this is very promising.

jsdillon commented 3 years ago

We good to go on this @dannyjacobs? I'd prefer to stop running on this branch on site, if possible.