Cal fix and speedup - Githubissues

isullivan commented 3 years ago

Combining commits from three branches. From @bhazelton: One of them addresses Mike’s issue above (edge_source_fix), one fixes a cable reflection fitting bug (cable_reflection_fix) and one implements Ian’s new calibration speedup (calibration_speedup).

mwilensky768 commented 3 years ago

I confirm that the bug I mentioned in #243 does not happen on this branch.

mwilensky768 commented 3 years ago

Here is the difference power spectrum for master at a2dec7 and the PR at 4c5f58. The sense of subtraction is master minus cal_fix_and_speedup.

_noimgclip_dft_averemove_swbh_dencorr__master_1094485088_minus_PR_1094485088_2dkpower

nicholebarry commented 3 years ago

Does this include the "Revert to orig sign convention in exp and match" commit from 3 days ago?

mwilensky768 commented 3 years ago

Yes.

nicholebarry commented 3 years ago

Okay, cool. So I assume you ran with some form of pre-Wenyang calibration, in which case the cable reflection was better accounted for in this PR. Would it be possible for you to run with auto-ratio calibration for both as well (just to remove the contribution of the cable reflection)?

Also, can you post cal structures and/or plots on the differences of the convergence levels and overall "raw" gains (i.e. per-freq gains directly from the subroutine prior to fitting). Also, what was the time/mem results for each test?

mwilensky768 commented 3 years ago

Yes, pre-Wenyang cal. Will get to work on the other plots/metrics.

mwilensky768 commented 3 years ago

The timing/memory for 10 cores of an r.8xlarge instance on AWS is as follows

master: 63G RAM, 223 min PR: 63G RAM, 180 min

I think there is an underflow issue cropping up with the cal convergences which is making it seem like 3/4 of them are identically 0. It does not match the flags read in by a UVCal object, and the cal plots for the obs look too sane for on average 3/4 of the band per antenna being flagged. Will post plots representing this thinking on request.

nicholebarry commented 3 years ago

I've tested this is in both simulation and data. They both tell a very similar story, so I'll show just simulation for simplicity.

Simulation specifics: Input visibilities have ~50,000 sources and a smooth beam, and the model has far less sources and the fft of the instrumental beam. The simulation is allowed to fit a calibration amp and phase per frequency (the classic overfitting scenario).

There are three types of tests here:

Original linear least squares fitting
This updated branch, but still using the original linear least squares fitting.
This updated branch with the new adaptive gain fitting technique.

In order to test the original least squares fitting (1), I've just reverted the subroutine. Thus, I really am only comparing the fitting technique. The updated branch with original linear least squares fitting (2) is not quite the same as (1) because there is a smarter calculation of the threshold criteria.

First, a test on how many iterations till the threshold criteria is reached for all tiles and for one frequency channel. Black lines are each tile's convergence in that iteration. Green is the mean over all tiles. The first 4 iterations fit for just the phase, hence the discontinuity.

Original linear least squares fitting
This updated branch, but still using the original linear least squares fitting.
This updated branch with the new adaptive gain fitting technique. Takeaways: The smarter calculation of the threshold criteria (2) is better than user-input threshold criteria (1). Oddly enough, the adaptive gain (3) converged slower.

Second, a test on what the converged result was for both amp and phase. The changes are too small to see by eye, so I am comparing the differences. Black is each tile's difference, and green is the standard deviation over all tiles.

Amplitude: This updated branch, but still using the original linear least squares fitting (2) MINUS Original linear least squares fitting (1)
Amplitude: This updated branch with the new adaptive gain fitting technique (3) MINUS This updated branch, but still using the original linear least squares fitting (2)
Phase: This updated branch, but still using the original linear least squares fitting (2) MINUS Original linear least squares fitting (1)
Phase: This updated branch with the new adaptive gain fitting technique (3) MINUS This updated branch, but still using the original linear least squares fitting (2) Takeaways: The differences are close to floating point, but there seem to be a few outliers, particularly when comparing against the adaptive gain technique (3).

Third, the power spectra difference.

Original linear least squares fitting (1) MINUS This updated branch, but still using the original linear least squares fitting (2)
This updated branch, but still using the original linear least squares fitting (2) MINUS This updated branch with the new adaptive gain fitting technique (3) Takeaways: the power spectrum difference is very small (EoR level) and seems to get better with each test in the window: (2) is better than (1), (3) is better than (2). It's unclear to me why it is better, perhaps @isullivan will have a theory.

One final note: The convergence vs iteration test I showed above is just for one frequency. In fact, not all frequencies converge. The number of converged frequencies is different between each test. Again, this is simulation with no flagged frequencies.

Original linear least squares fitting: 0 (xx) and 0 (yy) because I set the convergence threshold too low.
This updated branch, but still using the original linear least squares fitting: 384 (xx) and 383 (yy)
This updated branch with the new adaptive gain fitting technique: 378 (xx) and 377 (yy)

Therefore, it seems as though the new convergence threshold calculation in (2) is a win, and that the adaptive gain fitting technique (3) is promising but may need tweaking. I will look into the failed frequencies to see if I can't pinpoint why.

bhazelton commented 3 years ago

This is fascinating! Thanks for looking into it so thoroughly @nicholebarry!

isullivan commented 3 years ago

This is a great analysis @nicholebarry! A few comments: The initial blip around iteration 5 is when we switch from only fitting the phase to the full gain calculation. I'm very happy to see how the new threshold is working, that convergence plot looks great! Since these are simulations, I assume the large differences in the final amplitude and phase plots for (3) around 185 and 195 MHz are unphysical, right? I am confused about the power spectrum improvement in this case. Are these the frequencies that failed to converge, maybe? I suspect the oscillations in the convergence plot are prematurely triggering the end of calibration, which suggests that the maximum gain is allowed to be too aggressive. I also suspect that these oscillations may be the reason (3) has fewer converging frequencies, since there may be a jump up that exits calibration just above the threshold used to indicate a converging or diverging solution.

nicholebarry commented 3 years ago

Three issues:

The base gain is defaulted to 1 in the calibration init procedure. The adaptive gain procedure then checks to see if base gain is set, and if not, it is set to 0.5. This defaulting in the adaptive gain procedure will never happen due to the prior default in the init. Rerunning tests with the base gain set to 0.5 increases speed, but also increases the number of diverged frequency channels. Here is an example frequency convergence using a base gain of 0.5. And of an example of frequency convergence using a base gain of 0.75 (in other words, slower but steadier convergence). For this frequency, it actually converged faster than the base gain of 0.5, but if you look closely, it has a slightly flatter slope.
Convergence checking begins where IF i GT phase_fit_iter THEN BEGIN where i is the iteration and phase_fit_iter is the number of iterations of fitting just the phase. I suggest changing this to IF i GT phase_fit_iter+divergence_history THEN BEGIN where divergence_history is number of prior iterations to compare to in order to see if the current iteration is diverging. Since there is an expected discontinuity in the convergence once amplitudes are included, some of the divergence calculations are being skewed. This solves approx 2 to 4 diverged frequency channels which failed converging at 1e-2 when it should have reached 1e-7.
Adding an additional logic condition reduces a lot of the "diverged channels". In reality, hitting a threshold of 1e-7 (default) is extremely hard, as showcased is the original convergence plot (first plot above) which never truly converged. We are converging (for the most part) with the update because there is a hard condition (max deviation) and a soft condition (mean deviation) and if we start to diverge, we check to see if the soft condition has been met and call it good. However, we don't check to see if the previous iteration, the one before the start of divergence, met the soft condition if the current iteration has not. I've added logic to check that and back up the iteration if it has.

Incorporating these changes, here are the number of converged channels and mean iterations till convergence for:

0.5 base gain: 381 (xx), 377 (yy), 32.6 mean iterations till convergence
0.75 base gain: 384 (xx) 384 (yy), 33.6 mean iterations till convergence

Given this, I'm pushing these changes in addition to defaulting the base gain to 0.75.

EoRImaging / FHD

Cal fix and speedup #245