HERA-Team / hera_qm

HERA Data Quality Metrics
MIT License

does red_corr_metrics actually measure what we think? #170

Closed jaguirre closed 4 years ago

jaguirre commented 6 years ago

The question has been raised whether averaging data0 * conj(data1) over time and then taking the median over frequency, i.e.

corr = np.median(np.abs(np.mean(data0 * data1.conj(),axis=0)))
corr /= np.sqrt(autoPower[ant0_i, ant0_j, pol0] * autoPower[ant1_i, ant1_j, pol1])

as implemented in ant_metrics.py, actually detects a failure of redundancy between baselines in a group, since the statistic is not terribly sensitive to frequency structure, and because over a single file the visibilities are not changing quickly, much less the differences between them.

Perhaps the idea was to correlate over frequency and median over time (this is similar to JEA's suggestion that |data0|/|data1| = constant, without having to worry about nulls). Assuming axis=0 is time and axis=1 is frequency, this would look like

corr = np.median(np.mean(np.abs(data0) * np.abs(data1),axis=1))
corr /= np.sqrt(autoPower[ant0_i, ant0_j, pol0] * autoPower[ant1_i, ant1_j, pol1])
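For reference, the two variants can be written side by side as small functions. This is only a sketch: the (Ntimes, Nfreqs) array shapes and the scalar auto_power0/auto_power1 normalization arguments are assumptions, not the exact ant_metrics interface.

import numpy as np

def corr_time_avg(data0, data1, auto_power0, auto_power1):
    # Current metric: average data0 * conj(data1) over time (axis=0),
    # then take the median over frequency.
    corr = np.median(np.abs(np.mean(data0 * data1.conj(), axis=0)))
    return corr / np.sqrt(auto_power0 * auto_power1)

def corr_freq_avg(data0, data1, auto_power0, auto_power1):
    # Proposed metric: average |data0| * |data1| over frequency (axis=1),
    # then take the median over time.
    corr = np.median(np.mean(np.abs(data0) * np.abs(data1), axis=1))
    return corr / np.sqrt(auto_power0 * auto_power1)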

To test the difference, is there an example data set in which a baseline is known not to be redundant that can be used for testing?

jsdillon commented 6 years ago

CCing @AaronParsons

ShaneBechtel commented 6 years ago

I have been working to understand the factors that affect the redundancy correlation for a given antenna. To this end, I took four pairs of redundant baselines, each representing a certain orientation or distance, and compared many of their properties. This led to a somewhat surprising discovery. Consider the following two graphs, showing the visibilities in the complex plane for both a long (84 m), vertical (North-South) pair and a long (84 m), horizontal (East-West) pair of redundant baselines (Note: all the data portrayed below come from the 2458042.12552 observation with 'XX' polarization).

long_vertical_visibilities long_horizontal_visibilities

The long, horizontal baselines exhibit very similar behavior, differing only by a phase offset, whereas very little correlation is apparent between the long, vertical baselines. To show this further, plots of the phases of each pair, including the difference between them, are shown below.

long_vertical_phase long_horizontal_phase

While the latter plot demonstrates a clear correlation, the former appears to have nearly no correlation whatsoever. This is likely because the long, horizontal baselines evolve more over time, whereas the long, vertical baselines suffer from the issue James describes, in which the differences between them do not vary enough over the short time we are observing. This suggests that baselines which are long and run East-West offer a better view of the redundancy correlation of a given antenna. To investigate further, I selected a few different antennas and tried to determine the contribution to their redundancy from every possible baseline. For each chosen antenna, I looped through every other antenna, returning a value of 0 for auto-correlations and non-redundant baselines; otherwise, for the baseline formed by the chosen antenna and the current antenna in the loop, I returned the average redundancy correlation computed with the code from red_corr_metrics in ant_metrics (the core calculation and a sketch of the loop are shown below).

corr = np.median(np.abs(np.mean(data0 * data1.conj(), axis=0)))
corr /= np.sqrt(autoPower[ant0_i, ant0_j, pol0] *
                autoPower[ant1_i, ant1_j, pol1])
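For concreteness, the loop I describe above might be sketched roughly as follows. The container names reds, data, and autoPower, and the keying by (ant_i, ant_j, pol), are assumptions for illustration, not the exact ant_metrics data structures.

import numpy as np

def mean_red_corr_per_antenna(chosen_ant, ants, reds, data, autoPower, pol='xx'):
    # For each antenna, compute the average redundancy correlation of the
    # baseline it forms with chosen_ant against the other baselines in the
    # same redundant group; return 0 for autocorrelations and baselines
    # with no redundant partners.
    results = {}
    for ant in ants:
        if ant == chosen_ant:
            results[ant] = 0.0
            continue
        bl = (chosen_ant, ant, pol)
        group = next((g for g in reds if bl in g), None)  # redundant group, if any
        if group is None:
            results[ant] = 0.0
            continue
        corrs = []
        for other in group:
            if other == bl:
                continue
            data0, data1 = data[bl], data[other]
            corr = np.median(np.abs(np.mean(data0 * data1.conj(), axis=0)))
            corr /= np.sqrt(autoPower[bl] * autoPower[other])
            corrs.append(corr)
        results[ant] = np.mean(corrs) if corrs else 0.0
    return results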

From there, I plotted those values along with the actual positions of the antennas in order to see any trends. (Note: antenna 50 is consistently marked as dead, which is why a lower than expected value is returned for antenna 82.) red_corr_ant1 red_corr_ant53 red_corr_ant82 red_corr_ant143

From these graphs, it seems that the baselines which are the most redundant are ones that are very close to one another, particularly those that run North-South. However, this does not make much sense given the phases and visibilities shown earlier, and is quite possibly an indication that the redundancy correlation metric is not actually measuring the redundancy of these redundant groups.

ShaneBechtel commented 6 years ago

Here is a link to a memo documenting the process I used to generate the above results.

https://github.com/dannyjacobs/asu_hera/blob/master/redundancy_analysis/Redundancy_testing_memo.ipynb

adampbeardsley commented 6 years ago

Adding in a couple more plots from Shane... averaging the above baselines over antenna to get a purely baseline-dependent version of the redundancy metric: 2458042.12552 image 2458098.12552 image

ShaneBechtel commented 5 years ago

I've recently implemented the possible new metric that James describes above, averaging over frequency and taking the median over time. In doing so, the dependence that the redundancy correlation had on the position/orientation of the baseline seems to disappear. Using the same method as above, but substituting in James' suggested metric where applicable, now results in these plots.

james_metric_53 james_metric_84 james_metric_general

As one can see, the position dependence of the redundancy correlation has all but vanished, though the redundancy correlation is now very high for all baselines (much of the lower redundancy remaining is due to the effects of the dead antenna 50). To examine this further, I simulated data to see how changes to it would affect the redundancy correlation. For the simulation, I produce a signal that is the same across all baselines in a redundant group, then add a variable amount of noise. Additionally, I add another signal, generated in the same way but independent for each baseline, as a sort of unique signal for that baseline. This signal generation takes the delay and fringe rate into account to produce realistic data, capturing the strong frequency dependence and slow time evolution (I will include a link to a memo detailing this process shortly).

Using this setup, I simulated 100 datasets with varying strengths for both the noise and the unique signal. For each of these 100 datasets, I averaged the redundancy correlations of each antenna found using James' metric and recorded that value, in order to observe how varying levels of noise and of a unique signal (which should force the redundant baselines to differ from one another) affect the redundancy correlation. In the following plot, the axes are not linear: the 0th value corresponds to none of the specified component being present, and the 4th to that component being at parity with the redundant signal.

mean_raw_red_no_delay

As one can see, there is a drop in redundancy when both components are introduced, though the final correlation is still very large compared to the current metric. From here, I tested the robustness of this metric by introducing an electrical delay component into the simulation, one which differed for every antenna. Doing this, however, had little effect on the correlations, as can be seen in the following plot.

mean_raw_red_delay

All of this suggests that the metric proposed by James has the potential to offer a proper view of how redundant certain antennas are. The most prominent issue seen so far is that large changes to the redundant signal appear to result in only minor drops in redundancy correlation, which might indicate that the new metric needs to be altered so that it responds more strongly. More analysis should be done, but this metric seems to show what we want a redundancy correlation metric to show.
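For reference, a minimal sketch of the kind of simulated data described above might look like the following. The specific amplitudes, array shapes, and delay/fringe-rate ranges here are my own assumptions for illustration; see the forthcoming memo for the actual procedure.

import numpy as np

def simulate_red_group(nbls, ntimes=60, nfreqs=1024, unique_amp=0.5,
                       noise_amp=0.1, rng=None):
    # Build one redundant group: a common signal with frequency structure
    # (a delay term) and slow time evolution (a fringe-rate term), plus a
    # per-baseline "unique" signal of the same form and complex noise.
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(ntimes)[:, None]   # time axis
    f = np.arange(nfreqs)[None, :]   # frequency axis

    def signal():
        delay = rng.uniform(50, 200) / nfreqs   # cycles per channel (assumed range)
        fringe = rng.uniform(0.001, 0.01)       # cycles per integration (assumed range)
        return np.exp(2j * np.pi * (delay * f + fringe * t))

    common = signal()
    vis = []
    for _ in range(nbls):
        unique = unique_amp * signal()
        noise = noise_amp * (rng.standard_normal((ntimes, nfreqs))
                             + 1j * rng.standard_normal((ntimes, nfreqs)))
        vis.append(common + unique + noise)
    return vis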

jsdillon commented 5 years ago

This is super interesting.

Instead of looking at the raw statistic, a much more interesting thing to look at is the modified z-score. It's OK if the metric only changes by a small amount, as long as that change is still relatively big compared to the other antennas in the same redundant group. Shane, have you looked at the z-score?
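For reference, by modified z-score I mean the standard median/MAD-based robust statistic; here is a minimal sketch (whether this exactly matches the ant_metrics implementation is an assumption).

import numpy as np

def modified_z_scores(values):
    # Modified z-score: deviation from the median in units of the median
    # absolute deviation (MAD), scaled by 0.6745 so it matches the usual
    # z-score for Gaussian data.
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))
    return 0.6745 * (values - med) / mad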

ShaneBechtel commented 5 years ago

@jsdillon, sorry for the delayed response. I've now examined how each metric behaves with regard to the z-score. I first looked at how the metrics behave on actual HERA data, specifically the observation at Julian Date 2458042.12552, examining histograms of both the raw redundancy correlation (bottom) and the z-score of that correlation (top). When using the current redundancy correlation metric, we see the following:

norm_metric_12552_data

Here, antenna (50, X) is correctly marked as a dead antenna with a z-score of about -12. When using the metric proposed by James, however, we see a similar result with a few exceptions:

james_metric_12552_data

First, antenna (50,X), while still being correctly marked as a dead antenna, has a z-score of about -45. There is also a notable tendency for the James metric z-scores to be below zero, in contrast to the current metric's distribution around zero. This is seen when the x-range is shortened.

james_metric_12552_data_zoom

What can also be seen in this zoomed-in view is that more antennas are actually below the dead antenna threshold of -5. Antennas (55,Y) and (2,Y) have z-scores of about -7 and -5 respectively (their z-scores are almost 0 in the current metric: -0.45 and -0.14 respectively). From here, I examined this in my simulated data, similar to the plots created above from the 100-simulation run. The process was modified so that instead of varying the noise level, the strength of a "bad" signal for a specific antenna was given multiple input values. This was done by finding all the baselines that involve a specific antenna (in this case antenna 23) and then adding a signal on top of the ones already present (the redundant signal common to all baselines in a group and the unique signal given to each baseline independently). Once again, the power was normalized so that this would not be flagged by a mean visibility metric. The noise was set at a constant level corresponding to an SNR of roughly 10. Finally, once the simulations were run, the z-score of antenna 23 was averaged over both polarizations and plotted against the strength of the "bad" signal. Multiple lines for separate unique signal levels were plotted on the same graph for comparison. This was done for both the current metric and James' metric; the results are below, starting with the current metric.

bad_antenna_test3_2

A closer view allows one to see when the z-score crosses the dead antenna threshold of -5:

100_norm_sim_bant_wnoise

As a reminder, the levels of the redundant signal, the unique/independent signal, and the "bad" signal are all directly comparable, with the redundant signal strength held constant at 1. This means that once the unique signal strength is larger than about half of the redundant signal, antenna 23 can no longer be accurately and consistently flagged as a dead antenna. Now, here are the results for James' metric.

bad_antenna_test2_2 100_james_sim_bant_wnoise

Now it is easy to see that this metric is much more sensitive to the introduction of a "bad" signal, and remains capable of detecting it even when the unique signal strength exceeds that of the redundant one. Earlier, I discussed how the raw redundancy correlation for James' metric changed only very slightly with noise and an independent signal; it now seems that this was because all the antennas were affected equally. When only a few antennas behave poorly, the redundancy correlation for those antennas plummets (as seen in the James metric redundancy correlation histogram), resulting in a very low z-score. Moving forward, it may be worth seeing how multiple antennas with "bad" signals affect the z-scores and redundancy correlations for both metrics, as well as how James' metric behaves for other observing nights and sidereal times.

mkolopanis commented 5 years ago

Let me see if I understand this: with James' suggestion, an antenna is more likely to be flagged if its data are not very redundant with respect to the other antennas in its redundant group, compared to the older metric?

Also, as a side note, with the size of the histogram bins in your plots it is difficult to tell whether the James z-scores are distributed around zero. Would it be possible to use a finer bin size? The fact that antenna 50's z-score is very large compared to every other score throws off the default histogram binning.

ShaneBechtel commented 5 years ago

That appears to be the case. With everything else equal, an antenna with excessively non-redundant data is more likely to be flagged using James' metric than the current metric. As for the histogram, I tried offering a tighter view after the initial histogram, but I'll include a view of the distribution on a range from -5 to 5 with narrower bins here.

james_metric_12552_data_zoom_1

I described the behavior of the distribution poorly in my post. A more accurate description is that the probability is spread more widely on the negative side than the positive side; that is, a large negative value is more likely to occur than a large positive one. I find this potentially significant, since it has been questioned what a very large positive z-score would mean (an antenna that is too redundant?). The fact that this metric is biased towards larger negative numbers may be an indication that it is measuring what we would like it to.

ShaneBechtel commented 5 years ago

Looking further into the z-scores of bad antennas in both metrics, I examined how they respond to various numbers of antennas all performing badly at once. To do this, I slightly modified the code from above. Now, instead of the unique signal strength and bad signal strength being the inputs, I gave a range of bad signal strengths along with a number specifying how many antennas would be given a "bad" signal. The unique signal was kept constant at a value of 0.5, or half the strength of the underlying redundant signal. I also supplied an array of antennas from which the bad antennas were drawn for the simulation. Additionally, instead of reporting the z-score of a single antenna, I took all the z-scores that were less than -5, the dead antenna threshold, and counted them. To make sure that all of the flagged antennas were in fact the "bad" ones, I compared the antennas flagged by the metric to the input xants and only incremented the count if there was a match (if there wasn't, I incremented a bad-flag count). As stated, this was done for both the current metric and the one proposed by James; the results can be seen in the following plots, respectively.

nm_xants_flag_count_0

Current Metric Results seen Above; James' Metric Results seen Below

jm_xants_flag_count_0

For context, the "bad" signal strength is equivalent to that of the unique signal in row 2 (the third from the top), which is why there is almost no flagging before that row. That said, there is a clear distinction between the two metrics in their flagging. While James' metric reliably flags nearly every antenna correctly once the strength of the bad signal reaches that of the unique/independent signal, the current metric fails to do so. James' metric behaves as we would expect a redundancy correlation to: it fails to accurately flag all the antennas only when the signal is weak, or when a large number of antennas are set as "bad", which makes the z-score less reliable. The current metric also drops off in the number of antennas it flags, although at a much lower number of badly behaving antennas. Finally, in contrast to the consistency of James' metric once the "bad" signal grows larger than both the unique and redundant signals, the current metric actually does worse at flagging antennas when the bad signal gets much larger than the redundant signal strength. I am not sure why this occurs, but I have reproduced the trend by running individual simulations with the same parameters. The last point of note is that the current metric did not flag any antennas that were not listed in the "bad" antennas, whereas James' metric inaccurately flagged a single antenna in the simulation with a "bad" signal strength of 4 and 4 antennas given as the input "bad" antennas.
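For reference, the bookkeeping described above can be sketched as follows; the (antenna, polarization) keying of the z-score dictionary is an assumption for illustration.

def count_flags(z_scores, bad_ants, threshold=-5.0):
    # Count antennas flagged below the dead-antenna threshold, separating
    # correct flags (antennas we actually corrupted) from spurious ones.
    correct, spurious = 0, 0
    for (ant, pol), z in z_scores.items():
        if z < threshold:
            if ant in bad_ants:
                correct += 1
            else:
                spurious += 1
    return correct, spurious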

ShaneBechtel commented 5 years ago

To show more clearly how the metrics performed in the previous post, I'm including plots here showing the difference between the number of flagged antennas and the expected result (the number of input xants times the 2 polarizations for each). Where either metric performed perfectly, the plot shows a value of 0, as there is no difference from the expected result. In every other situation, the metrics missed some number of antennas that should have been flagged.

nm_xants_flag_count_1

Once again, Current Metric's Results seen Above; James' Metric's Results seen Below

jm_xants_flag_count_1

This makes it a bit easier to see that, while James' metric returns the expected value the vast majority of the time and the current metric underperformed, the current metric did still return the expected results for certain simulations in which the "bad" signal strength was near the strength of the redundant signal and the number of bad antennas was low.

ShaneBechtel commented 5 years ago

While working to compare the metrics, I noticed that James' metric was taking significantly longer to run. For one of the 100-simulation runs, where the current metric would take about 4 hours, the run using James' metric would take 6 hours. I have recently spent some time looking into this and found that it is indeed a result of the way the proposed new metric calculates the redundancy correlation. In comparing the visibility datasets of two baselines, the first step in either metric is to multiply them together. In the current metric, this is done by multiplying the data from one baseline by the conjugate of the other:

corr = np.median(np.abs(np.mean(data0 * data1.conj(),axis=0)))

In James' proposed metric, this is done by multiplying the absolute values of the two datasets:

corr = np.median(np.mean(np.abs(data0) * np.abs(data1),axis=1))

It turns out that the latter of the two options takes about twice as long as the former, due to the additional steps required for the full calculation (I'll include figures showing the exact time difference at the bottom of this comment). A possible way to shorten this time is to take the abs after multiplying the datasets, changing the code to the following:

corr = np.median(np.mean(np.abs(data0 * data1),axis=1))

This does shorten the calculation by about 20%, though it still takes longer than the current metric. Since |data0 * data1| = |data0| * |data1|, the result is unchanged (up to floating-point rounding). While the most accurate metric should be the goal for determining the effectiveness of the antennas, this additional time should be addressed as well, lest it become an issue later on.

Here are two images showing the result of simply timing each portion of the metric: the average time for each of the calculations required to multiply the visibilities, average those products, take the median of those averages, and divide that median by the square root of the autoPowers. This first image compares the Current Metric (CM) to the James Metric (JM) exactly as it was written.

time_comp_0

This next image compares the Current Metric (CM) once again to the James Metric (JM) with the change noted above (taking the abs after the multiplication of the two datasets rather than before).

time_comp_1
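For anyone who wants to reproduce a rough version of this comparison, here is a sketch using timeit on random data; the array shape is an assumption.

import timeit
import numpy as np

rng = np.random.default_rng(0)
shape = (60, 1024)  # assumed (Ntimes, Nfreqs)
data0 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
data1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

stmts = {
    'current (data0 * conj(data1))':
        'np.median(np.abs(np.mean(data0 * data1.conj(), axis=0)))',
    'proposed (abs before multiply)':
        'np.median(np.mean(np.abs(data0) * np.abs(data1), axis=1))',
    'proposed (abs after multiply)':
        'np.median(np.mean(np.abs(data0 * data1), axis=1))',
}
for label, stmt in stmts.items():
    # Time each variant on the same arrays for a fair comparison.
    t = timeit.timeit(stmt, number=1000,
                      globals={'np': np, 'data0': data0, 'data1': data1})
    print(f'{label}: {t:.3f} s per 1000 calls')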

adampbeardsley commented 4 years ago

Closing in favor of #333