Unique time-keys in `UVPSpec` objects based on `time_1_array` and `time_2_array` instead of the current `time_avg_array`

aewallwi commented 1 year ago

This PR addresses a current limitation in UVPspec class in that it addresses unique data samples with baseline pair and the average time of the two samples being crossed for a power spectrum sample. A common use case that comes up is when we cross two different times for the same baseline pair. For a concrete example, consider the baseline pair (bl_0, bl_1) where each baseline is sampled at two close-together times t1, t2. It is common practice to form interleaves between the times so that we form the baseline-time pairs, (bl_0, t1) x conj((bl_1, t2)) and (bl_0, t2) x conj((bl_1, t1)) Both of these power spectra have the same average-time (t1/2 + t2/2) and as a result, they cannot actually be stored in the same UVPSpec object. Currently pspec_run is forced to compute these interleaves from separate files and store them as different UVPSpec objects in a PSPecContainer. Unfortunately, none of the data-averaging tools in hera_pspec that rigorously track error statistics, etc, can operate on multiple UVPSpec objects.

I saw two potential solutions to this problem.

(1) Change the definition of UVPSpec to allow for time-permuted blt-pairs to be stored in the same object. (2) Change the averaging functions to operate on multiple UVPSpec objects.

This PR addresses the problem through the first option. I decided to take the first approach because

1) Option(1) will result in a simpler interfaces than option (2). interleaving in time is done routinely in power spectrum estimation and I think that supporting it at the level of the UVPSPec class will simplify our interfaces for this common task by allowing us to calculate arbitrary time-interleaves on arbitrary numbers of input data files, avoiding additional up-stream pipeline tailoring. This PR does not add such functionality to the pspecdata.pspec_run routines changing the UVPSPec definition is necessary to support this.

2) Option (1) requires less additional code then option (2). My initial hunch is that option 2 would result in more additional code since it would mean writing entire new versions of all of our averaging functions to operate over multiple objects (rather then axes in the data arrays of one of these objects). Option 1 (chosen here) focuses more on making point changes to existing code which in most cases involve swapping variable names.

We want to key our datapoints with baseline and time pairs instead of baseline pairs and a single average time. Thus, this PR changes the parameter Nblpairts to Nbltpairs.
We introduce a new variable, Ntpairs to represent the total number of unique time_1, time_2 pairs.
We change the meaning of Ntimes in UVPspec objects to reference the total number of unique times in time_1_array union time_2_array. Before, it only referenced the total number of unique times in lst_average_array.

With these changes, we can apply all of the averaging infrastructure and error bar propagation that exists for individual UVPspec objects to arbitrary time interleaves.

This PR also manages backwards compatibility with UVPSPec files created with previous versions of UVPSpec by detecting the deprecated Nblpairts variable when reading files and computing the new required variables, even if they are not present in these files from outdated versions.

codecov[bot] commented 1 year ago

Codecov Report

Patch coverage: 97.34% and project coverage change: -0.04 :warning:

Comparison is base (c34b82e) 95.98% compared to head (fef5dd8) 95.95%.

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #379 +/- ## ========================================== - Coverage 95.98% 95.95% -0.04% ========================================== Files 17 17 Lines 6053 6099 +46 ========================================== + Hits 5810 5852 +42 - Misses 243 247 +4 ``` | Flag | Coverage Δ | | |---|---|---| | unittests | `95.95% <97.34%> (-0.04%)` | :arrow_down: | Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#carryforward-flags-in-the-pull-request-comment) to find out more. | [Impacted Files](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team) | Coverage Δ | | |---|---|---| | [hera\_pspec/plot.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy9wbG90LnB5) | `92.90% <66.66%> (-0.21%)` | :arrow_down: | | [hera\_pspec/uvpspec.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy91dnBzcGVjLnB5) | `97.38% <97.33%> (-0.16%)` | :arrow_down: | | [hera\_pspec/grouping.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy9ncm91cGluZy5weQ==) | `97.53% <100.00%> (+<0.01%)` | :arrow_up: | | [hera\_pspec/pspecdata.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy9wc3BlY2RhdGEucHk=) | `97.04% <100.00%> (-0.07%)` | :arrow_down: | | [hera\_pspec/testing.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy90ZXN0aW5nLnB5) | `97.50% <100.00%> (+0.02%)` | :arrow_up: | | [hera\_pspec/utils.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy91dGlscy5weQ==) | `91.18% <100.00%> (+0.42%)` | :arrow_up: | | [hera\_pspec/uvpspec\_utils.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy91dnBzcGVjX3V0aWxzLnB5) | `95.66% <100.00%> (-0.04%)` | :arrow_down: | | [hera\_pspec/uvwindow.py](https://app.codecov.io/gh/HERA-Team/hera_pspec/pull/379?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#diff-aGVyYV9wc3BlYy91dndpbmRvdy5weQ==) | `98.16% <100.00%> (+<0.01%)` | :arrow_up: |

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.

aewallwi commented 1 year ago

Thanks for the comments @philbull. I think I've addressed everything so this is ready for another look (pending the tests running) :)

aewallwi commented 1 year ago

Hi @philbull . I believe @JianrongTan will be taking over this PR. Hopefully we can get it out soon.

JianrongTan commented 1 year ago

Hi @philbull and @aewallwi! I have made some minor changes:

Delete some redundant lines.
Add more tests on backward-compatibility.
Fix the conflicts with new versions of pyuvdata.

I think the PR now is fine to be merged into the main channel.

HERA-Team / hera_pspec

Unique time-keys in `UVPSpec` objects based on `time_1_array` and `time_2_array` instead of the current `time_avg_array` #379

Codecov Report