HERA-Team / hera_cal

Library for HERA data reduction, including redundant calibration, absolute calibration, and LST-binning.
MIT License

LST-bin configuration overhaul #941

Closed: steven-murray closed this PR 3 months ago

steven-murray commented 5 months ago

This PR does a few things (sorry) in support of defining a nice notebook-based interface for LST-binning.

Bug Fixes

  1. There was a bug in how files were aligned when building the configuration file (an offset of dlst/2), which is fixed here.
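The PR does not say where the dlst/2 offset came from. A common source of exactly this kind of error is treating LST bin centers as bin edges. The sketch below uses plain NumPy with hypothetical helper names (lst_bin_edges, assign_lst_bins), not the hera_cal code, to show how the two conventions can put the same LST into different bins.

```python
import numpy as np


def lst_bin_edges(lst_centers):
    """Edges of a regular LST grid, given its bin centers (radians).

    Each edge sits half a bin width (dlst / 2) below its center, plus one
    final edge half a bin above the last center.
    """
    lst_centers = np.asarray(lst_centers, dtype=float)
    dlst = lst_centers[1] - lst_centers[0]
    return np.append(lst_centers - dlst / 2, lst_centers[-1] + dlst / 2)


def assign_lst_bins(lsts, lst_centers):
    """0-based bin index for each LST; -1 or len(lst_centers) if off the grid.

    Ignores the 2*pi wrap-around for simplicity.
    """
    edges = lst_bin_edges(lst_centers)
    # np.digitize returns i with edges[i-1] <= x < edges[i]; shift to 0-based bins
    return np.digitize(lsts, edges) - 1


if __name__ == "__main__":
    centers = 0.05 + 0.1 * np.arange(4)  # bins [0, 0.1), [0.1, 0.2), ...
    lsts = np.array([0.12, 0.28])
    print(assign_lst_bins(lsts, centers))   # [1 2]  (edges used correctly)
    print(np.digitize(lsts, centers) - 1)   # [0 2]  (centers mistaken for edges)
```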

LST-Bin Configuration File

  1. The file is now in HDF5 format (faster to read and write).
  2. It contains more information: the antpairs and polarizations from across all nights, plus the matching of each data file to its calibration file and where-inpainted file.
  3. The config file is now backed by a combination of two Python classes: LSTBinConfiguration and LSTConfig. The first natively holds the configuration options and can be created directly from a simple TOML file. It also has methods for obtaining the full file configuration (i.e., finding matched files, the universal list of antpairs, etc.). The second class, LSTConfig, can be created from the first and is more like a static container holding all the configuration (the matched files, the LST grid, etc.). It doesn't do any calculations itself, but is meant to be read directly from an HDF5 file.
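The division of labor between the two classes (one that holds options and does the work, one that only stores results) can be sketched with standard-library dataclasses. Everything here is hypothetical: the class names BinConfigSketch and StaticConfigSketch, the TOML keys (datafiles, nlsts, nlsts_per_file), and the grid logic are stand-ins rather than the hera_cal API. The real classes also match files and collect antpairs, which this sketch omits.

```python
from dataclasses import dataclass

import numpy as np

try:
    import tomllib  # standard library in Python >= 3.11
except ModuleNotFoundError:  # pragma: no cover
    import tomli as tomllib  # third-party backport for older Pythons

# Hypothetical option names; the real LSTBinConfiguration schema may differ.
EXAMPLE_TOML = """
datafiles = ["night1.uvh5", "night2.uvh5"]
nlsts = 8
nlsts_per_file = 2
"""


@dataclass(frozen=True)
class StaticConfigSketch:
    """Plays the LSTConfig role: stores results, computes nothing."""
    datafiles: tuple
    lst_grid_edges: np.ndarray
    nlsts_per_file: int


@dataclass
class BinConfigSketch:
    """Plays the LSTBinConfiguration role: holds options and does the work."""
    datafiles: list
    nlsts: int
    nlsts_per_file: int = 2

    @classmethod
    def from_toml(cls, text: str) -> "BinConfigSketch":
        # Each top-level TOML key maps directly onto a dataclass field.
        return cls(**tomllib.loads(text))

    def lst_grid_edges(self) -> np.ndarray:
        # Regular grid covering one full day of LST (radians).
        return np.linspace(0.0, 2 * np.pi, self.nlsts + 1)

    def to_static(self) -> StaticConfigSketch:
        # The real class would also match cal/inpainting files and gather
        # antpairs here; this sketch only computes the LST grid.
        return StaticConfigSketch(
            datafiles=tuple(self.datafiles),
            lst_grid_edges=self.lst_grid_edges(),
            nlsts_per_file=self.nlsts_per_file,
        )


if __name__ == "__main__":
    static = BinConfigSketch.from_toml(EXAMPLE_TOML).to_static()
    print(len(static.lst_grid_edges))  # 9 edges for 8 bins
```

Keeping the computed container frozen mirrors the PR's intent that the second class does no calculations of its own.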

Binning Updates

  1. Previously, the entry point for binning a given set of files was the lst_bin_files_for_baselines function (which enables chunking over baselines, controlled by a higher-level function). This function is nice, but most of its inputs must be constructed outside it by reading metadata files and doing appropriate checks (this was done in the lst_bin_files_single_outfile function). Now that we have a proper config object, it was easy to add a simpler wrapper around lst_bin_files_for_baselines, called lst_bin_files_from_config. With it, you only need to pass the config itself and a few options to do the relevant binning (not averaging!).
  2. The output of lst_bin_files_from_config is a list of UVData objects (one per LST bin in the output file). This turns out to be a convenient output format, since the data for each bin is packaged in a single object.
  3. So as not to carry around extra baggage, rather than passing a "where_inpainted" flag array out of the binner, we simply negate the nsamples of all inpainted data. Since nsamples can't otherwise be negative, this sign change acts as an extra flag that we can use when averaging.
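A minimal NumPy sketch of the sign convention just described. The helper names mark_inpainted and split_inpainted are hypothetical, and the sketch assumes nsamples are stored as floats, so that a zero-weight inpainted sample can still carry the mark as -0.0.

```python
import numpy as np


def mark_inpainted(nsamples, where_inpainted):
    """Fold an inpainting mask into nsamples by negating inpainted entries."""
    out = np.abs(np.asarray(nsamples, dtype=float))  # new array; input untouched
    out[where_inpainted] = -out[where_inpainted]
    return out


def split_inpainted(nsamples):
    """Recover (non-negative nsamples, inpainting mask) from signed nsamples."""
    # np.signbit is also True for -0.0, so a zero-weight inpainted sample
    # keeps its mark; a plain `nsamples < 0` test would miss it.
    return np.abs(nsamples), np.signbit(nsamples)
```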

Averaging Updates

  1. The biggest update here is that averaging now expects incoming nsamples to be negative where data were inpainted, which lets us remove references to the extra "where_inpainted" parameters.
  2. I also removed sigma-clipping from the averaging. The idea is that sigma-clipping (and other flagging and metrics) will be done as a separate step before the final averaging.
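To illustrate how an averager can read the sign of nsamples in place of a separate where_inpainted argument, here is a hypothetical sketch (average_nights is not a hera_cal function). Whether inpainted data enter the mean (the include_inpainted switch), and counting only measured samples toward the output nsamples, are design choices made for this example, not necessarily what hera_cal does. As in the PR, no sigma-clipping happens here, and flagged samples are assumed to already carry zero weight.

```python
import numpy as np


def average_nights(data, nsamples, include_inpainted=True):
    """nsamples-weighted mean over axis 0 (the night/time axis).

    Inpainted samples are identified by the sign bit of ``nsamples``.
    Returns (mean, nsamples_out); bins with zero total weight give NaN.
    """
    data = np.asarray(data)
    nsamples = np.asarray(nsamples, dtype=float)
    inpainted = np.signbit(nsamples)
    weights = np.abs(nsamples)
    if not include_inpainted:
        weights = np.where(inpainted, 0.0, weights)

    wsum = weights.sum(axis=0)
    safe = np.where(wsum > 0, wsum, 1.0)  # avoid 0/0; those bins become NaN below
    mean = np.where(wsum > 0, (weights * data).sum(axis=0) / safe, np.nan)

    # Only measured (non-inpainted) samples count toward the output nsamples.
    nsamples_out = np.where(inpainted, 0.0, np.abs(nsamples)).sum(axis=0)
    return mean, nsamples_out
```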

Metrics Module

A new metrics module was added to calculate metrics and statistics from the LST-binning outputs.
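The PR doesn't list which metrics the module computes. As one hypothetical example of a statistic on LST-binned data, the sketch below computes a robust per-night z-score, which could help spot outlier nights before averaging (the kind of separate flagging step the averaging notes mention). The function name robust_zscores is illustrative only.

```python
import numpy as np


def robust_zscores(x, axis=0):
    """Robust z-score of each entry relative to the others along ``axis``.

    Uses the median and the median absolute deviation (MAD), scaled by
    1.4826 so that it estimates the standard deviation for Gaussian data.
    Expects real values (e.g. the real part or amplitude of visibilities).
    Entries where the MAD is zero come out as NaN or +/-inf.
    """
    x = np.asarray(x, dtype=float)
    med = np.median(x, axis=axis, keepdims=True)
    mad = np.median(np.abs(x - med), axis=axis, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        return (x - med) / (1.4826 * mad)
```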

Examples

A full example of using the new code can be found in the lststack notebook in hera_notebook_templates. To give a flavor, assume fileconf is the LST-configuration file that holds all the alignment info. Then we do:

from hera_cal.lst_stack.config import LSTConfig

stackconf = LSTConfig.from_file(fileconf)

This object holds the information from all the files, as well as metadata about how the binning was done. Most importantly, you can subselect bins by either an LST or an output-file index:

stackconf = stackconf.at_single_outfile(fileidx)
...or...
stackconf = stackconf.at_single_outfile(lst=0.5)

This returns a configuration restricted to the bins of a single output file (2 bins by default). You can also select a single bin:

stackconf = stackconf.at_single_bin(lst=0.5)
print(stackconf.matched_files)
print(stackconf.lst_grid_edges)
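For intuition about what selecting a single outfile by LST involves, here is a hypothetical sketch, not the at_single_outfile implementation: find the bin containing the LST, then the group of consecutive bins that make up its output file. The function outfile_bin_slice, the parameter nlsts_per_file, and the fixed-size grouping rule are assumptions; LST wrap-around is ignored.

```python
import numpy as np


def outfile_bin_slice(lst, lst_grid_edges, nlsts_per_file=2):
    """Slice of bin indices belonging to the output file that contains ``lst``.

    Assumes output files hold consecutive, equal-sized groups of LST bins
    (the last file may be shorter).
    """
    edges = np.asarray(lst_grid_edges)
    nbins = len(edges) - 1
    # Index of the bin with edges[i] <= lst < edges[i + 1]
    ibin = int(np.searchsorted(edges, lst, side="right")) - 1
    if not 0 <= ibin < nbins:
        raise ValueError(f"LST {lst} is outside the grid")
    start = (ibin // nlsts_per_file) * nlsts_per_file
    return slice(start, min(start + nlsts_per_file, nbins))
```

For an 8-bin grid over a full day, `outfile_bin_slice(0.5, np.linspace(0, 2 * np.pi, 9))` selects bins 0 and 1.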

To get LST-aligned data, use the simple function:

from hera_cal.lst_stack.binning import lst_bin_files_from_config

lst_aligned_uvds = lst_bin_files_from_config(config=stackconf)

lst_aligned_uvds is a list of UVData objects, one per LST bin, in which Ntimes equals Nnights (or, more generally, the number of integrations that fall into that LST bin across all nights).

codecov[bot] commented 3 months ago

Codecov Report

Attention: Patch coverage is 98.89841% with 9 lines in your changes missing coverage. Please review.

Project coverage is 97.24%. Comparing base (f0cfd8d) to head (bacc1d0).

| Files | Patch % | Lines |
|---|---|---|
| hera_cal/lst_stack/config.py | 99.16% | 3 Missing |
| hera_cal/lst_stack/wrappers.py | 95.65% | 2 Missing |
| hera_cal/datacontainer.py | 94.44% | 1 Missing |
| hera_cal/lst_stack/binning.py | 99.11% | 1 Missing |
| hera_cal/lst_stack/metrics.py | 99.25% | 1 Missing |
| hera_cal/lst_stack/stats.py | 98.18% | 1 Missing |

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main     #941      +/-   ##
==========================================
+ Coverage   97.17%   97.24%   +0.07%
==========================================
  Files          28       30       +2
  Lines       10250    10669     +419
==========================================
+ Hits         9960    10375     +415
- Misses        290      294       +4
```

| [Flag](https://app.codecov.io/gh/HERA-Team/hera_cal/pull/941/flags?src=pr&el=flags&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team) | Coverage Δ | |
|---|---|---|
| [unittests](https://app.codecov.io/gh/HERA-Team/hera_cal/pull/941/flags?src=pr&el=flag&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team) | `97.24% <98.89%> (+0.07%)` | :arrow_up: |

Flags with carried forward coverage won't be shown. [Click here](https://docs.codecov.io/docs/carryforward-flags?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=HERA-Team#carryforward-flags-in-the-pull-request-comment) to find out more.
