Archive of formal software pipeline validation tests
This repository contains code and documentation for formal tests of the HERA software pipeline. Tests are typically performed and documented as Jupyter notebooks and are archived to provide a long-standing account of the accuracy of the pipeline as it evolves. Directory structures define the broad kinds of tests performed.
The validation group seeks to validate the HERA data pipeline software and algorithms by testing the software against simulations whose expected output is well understood theoretically. The group also helps to develop and define increasingly sophisticated simulations on which to build an end-to-end test and validation of the HERA pipeline.
The validation effort seeks to verify the HERA software pipeline
through a number of well-defined steps of increasing complexity.
Each of these steps (called major steps or just steps in this
repository) reflects a broad validation concern or a specific
element of the pipeline. For example, step 0 seeks to validate
just the hera_pspec
software when given a well-known white-noise
P(k)-generated sky.
Each step may have a set of variations (called minor variations, or just variations, in this repo). For example, the variations for step 0 may be to generate a flat-spectrum P(k) and a non-flat P(k).
Finally, each step-variation combination may involve several staged tests, which we call trials in this repo.
Importantly, failing trials will not be removed/overwritten in this repo. Each formally-run trial is archived here for posterity.
The structure of this repo is thus as follows. The test-series directory contains one directory per step, labelled simply with the step number. Within each of these directories, each trial is presented as a notebook named test-<step>.<variation>.<trial>.ipynb.
Steps, variations and trials are all numbered with integers starting from 0, generally increasing in order of time and complexity.
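To illustrate the naming scheme, the hypothetical helpers below (trial_notebook_name, parse_trial_notebook_name and TrialID are not part of this repo) build and parse notebook names of the form test-<step>.<variation>.<trial>.ipynb. This is a minimal sketch assuming every field is a plain non-negative integer.

```python
import re
from typing import NamedTuple

# Matches test-<step>.<variation>.<trial>.ipynb, each field a non-negative integer.
_TRIAL_RE = re.compile(r"^test-(\d+)\.(\d+)\.(\d+)\.ipynb$")


class TrialID(NamedTuple):
    step: int
    variation: int
    trial: int


def trial_notebook_name(step: int, variation: int, trial: int) -> str:
    """Build the notebook filename for a given step, variation and trial."""
    for value in (step, variation, trial):
        if value < 0:
            raise ValueError("steps, variations and trials are numbered from 0")
    return f"test-{step}.{variation}.{trial}.ipynb"


def parse_trial_notebook_name(name: str) -> TrialID:
    """Recover (step, variation, trial) from a notebook filename."""
    match = _TRIAL_RE.match(name)
    if match is None:
        raise ValueError(f"not a trial notebook name: {name!r}")
    return TrialID(*(int(g) for g in match.groups()))


# Sorting on the parsed IDs orders trials numerically, so 0.0.10 comes after 0.0.2.
names = ["test-0.0.10.ipynb", "test-0.0.2.ipynb", "test-0.1.0.ipynb"]
print(sorted(names, key=parse_trial_notebook_name))
# → ['test-0.0.2.ipynb', 'test-0.0.10.ipynb', 'test-0.1.0.ipynb']
```

Sorting on the parsed tuple rather than the raw string avoids the lexicographic trap where "10" sorts before "2".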
In addition to the trial notebooks in these directories, each directory will contain a README.md which lists the formal goals and conditions of each of its variations.
Finally, each variation will be represented as a specific GitHub project, in which its progress can be defined and tracked. Each project should receive a title containing the step.variation identifier as well as a brief description.
We have provided a template notebook which should serve as the starting point for creating a validation notebook. The template is self-describing and has no intrinsic dependencies. All text in the notebook surrounded by curly braces is meant to be replaced.
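Before committing a notebook derived from the template, it can help to check that no {placeholder} text is left. The sketch below is a hypothetical helper (unreplaced_placeholders is not part of this repo). It reads the .ipynb file as JSON, looks only at markdown cells (code cells legitimately contain braces), and skips {{ ... }} expressions such as those used by the python-markdown extension. It is a rough check: LaTeX in markdown, such as \frac{a}{b}, would also be flagged.

```python
import json
import re
from pathlib import Path

# A {...} span with no nested braces or newlines, not part of a {{ ... }} expression.
_PLACEHOLDER_RE = re.compile(r"(?<!\{)\{[^{}\n]+\}(?!\})")


def unreplaced_placeholders(notebook_path):
    """Return (cell_index, placeholder) pairs for {...} text left in markdown cells."""
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    found = []
    for index, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "markdown":
            continue  # code cells contain dicts, f-strings, etc.
        source = cell.get("source", "")
        # The notebook format stores source either as one string or as a list of lines.
        text = "".join(source) if isinstance(source, list) else source
        found.extend((index, m.group(0)) for m in _PLACEHOLDER_RE.finditer(text))
    return found
```

For example, looping over unreplaced_placeholders("my-test.ipynb") (a hypothetical filename) and printing each pair lists the cells that still need editing.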
The template can be slightly improved and cleaned up by using Jupyter notebook extensions, in particular the ToC and python-markdown extensions. The former provides a cleaner way to build a table of contents (though one is already included), and the latter allows Python variables to be used in markdown cells. This simplifies recording git hashes and package versions, and means, for example, that the execution time and date can be written directly into a markdown cell.
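As a sketch of that workflow, a code cell near the top of the notebook could collect provenance values into plain variables. Assuming the python-markdown extension's {{ }} expression syntax, a markdown cell could then contain {{ git_hash }} or {{ run_date }}. The provenance helper and the package list are hypothetical; the git hash is that of whatever repository the notebook's working directory sits in.

```python
import datetime
import subprocess
from importlib import metadata


def provenance(packages=("numpy",)):
    """Collect the run date, the current git commit and installed package versions."""
    try:
        git_hash = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        git_hash = "unknown"  # git not installed, or not inside a git checkout
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "date": datetime.datetime.now().isoformat(timespec="seconds"),
        "git_hash": git_hash,
        "versions": versions,
    }


# Plain top-level variables are the simplest thing to reference from markdown.
info = provenance(["hera_pspec", "pyuvdata"])
git_hash = info["git_hash"]
run_date = info["date"]
```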
To create a simple tabulated version of the Project Plan, download the repo, save a GitHub personal access token to a file called .pesonal-github-token (ensure there is no trailing "\n" in the file), and run make_project_table.py from the root directory. Note that you will need Python 3.4+ and the PyGithub package to run this script (pip install pygithub).
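Because many editors and echo append a newline on save, the hypothetical helpers below write the token file without one and verify it afterwards. The filename is copied exactly from the instructions above; save_token and check_token_file are illustrative names, not part of the repo.

```python
from pathlib import Path

# Filename exactly as given in this README.
TOKEN_FILE = Path(".pesonal-github-token")


def save_token(token: str, path: Path = TOKEN_FILE) -> None:
    """Write the token with surrounding whitespace (including any newline) stripped."""
    path.write_text(token.strip(), encoding="utf-8")


def check_token_file(path: Path = TOKEN_FILE) -> str:
    """Read the token back, failing loudly if the file ends in whitespace."""
    raw = path.read_text(encoding="utf-8")
    if raw != raw.rstrip():
        raise ValueError(f"{path} has trailing whitespace; remove the final newline")
    return raw
```

From a shell, printf '%s' "$TOKEN" > .pesonal-github-token achieves the same thing, whereas plain echo (without -n) appends a newline.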
A semi-up-to-date version of this table is found at project_table.md.
The data for the H1C sims reported in the H1C IDR2 Validation (Aguirre et al., 2021) are available upon reasonable request. For collaboration members, the paths on the NRAO machines are listed below.
There are three main versions of the simulated data, for different levels of processing:
The simulated data have the following properties (see the paper cited above for details):
RIMEz
Below, we specify where to find the data for each of the three versions.
The mock-observed visibilities, prior to any pipeline processing, reside in /lustre/aoc/projects/hera/Validation/test-4.0.0/data/visibilities/245*/ (one folder per day of mock observation).
The files in each of these directories follow the naming convention zen.{jd_major}.{jd_minor}.{sky_component}.{state}.uvh5.
The {sky_component} indicates which sky models are present in the data, and can be one of:
eor: the EoR only.
foregrounds: the foregrounds (both GLEAM and eGSM).
sum: both EoR and foregrounds.
The {state} indicates the state of the data in terms of systematics, and can be one of:
true: no systematics included (not even noise).
corrupt: all systematics included (see the list above).
uncal: only bandpass gains included.
For example, the command ls zen.2458098.32685.* yields:
zen.2458098.32685.eor.true.uvh5
zen.2458098.32685.foregrounds.corrupt.uvh5
zen.2458098.32685.foregrounds.true.uvh5
zen.2458098.32685.sum.corrupt.uvh5
zen.2458098.32685.sum.true.uvh5
zen.2458098.32685.sum.uncal.ref_uncal.uvh5
zen.2458098.32685.sum.uncal.uvh5
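The naming convention above can be parsed mechanically. The sketch below is hypothetical (parse_raw_vis_name and RawVisFile are not part of the repo) and accepts only the sky components and states listed above. Files outside the stated convention, such as the ref_uncal file in the listing, return None rather than being guessed at.

```python
import re
from typing import NamedTuple, Optional

# zen.{jd_major}.{jd_minor}.{sky_component}.{state}.uvh5
_RAW_RE = re.compile(
    r"^zen\.(?P<jd_major>\d+)\.(?P<jd_minor>\d+)"
    r"\.(?P<sky_component>eor|foregrounds|sum)"
    r"\.(?P<state>true|corrupt|uncal)\.uvh5$"
)


class RawVisFile(NamedTuple):
    jd_major: int
    jd_minor: str  # kept as a string so leading zeros are not lost
    sky_component: str
    state: str

    @property
    def jd(self) -> float:
        """Julian date reconstructed from the two filename fields."""
        return float(f"{self.jd_major}.{self.jd_minor}")


def parse_raw_vis_name(name: str) -> Optional[RawVisFile]:
    """Return the parsed fields, or None for files outside the convention."""
    m = _RAW_RE.match(name)
    if m is None:
        return None
    return RawVisFile(int(m["jd_major"]), m["jd_minor"], m["sky_component"], m["state"])
```

For example, parse_raw_vis_name("zen.2458098.32685.sum.corrupt.uvh5") gives sky_component "sum" and state "corrupt", with a jd of 2458098.32685.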
We assume most users are not interested in the partially-processed data, but it is included for completeness.
These partially-processed visibilities are in /lustre/aoc/projects/hera/Validation/test-4.0.0/pipeline/245{jd}_{kind}/, where {kind} represents which data were input to the processing, and can be one of:
foregrounds: just foregrounds (no EoR), with all systematics.
sum: FG+EoR, with all systematics.
sum_uncal: FG+EoR, with just bandpass gains but no other systematics.
Other combinations of mock observations (e.g. eor.corrupt) were not processed.
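A small hypothetical helper (pipeline_dir, not part of the repo) can build these directory paths and reject inputs that were never processed. It is a sketch that takes the full integer Julian date (e.g. 2458098), which yields directory names such as 2458098_foregrounds, as in the example below.

```python
from pathlib import Path

PIPELINE_ROOT = Path("/lustre/aoc/projects/hera/Validation/test-4.0.0/pipeline")

# Only these inputs were run through the pipeline (see the list above).
PROCESSED_KINDS = {"foregrounds", "sum", "sum_uncal"}


def pipeline_dir(jd: int, kind: str) -> Path:
    """Directory of partially-processed files for one night (full integer JD) and input kind."""
    if kind not in PROCESSED_KINDS:
        raise ValueError(f"{kind!r} was not processed; choose from {sorted(PROCESSED_KINDS)}")
    return PIPELINE_ROOT / f"{jd}_{kind}"
```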
Each of these folders contains a number of files, each corresponding to a stage or product of processing, such as firstcal, abscal, smoothcal, etc. For example, the files in /lustre/aoc/projects/hera/Validation/test-4.0.0/pipeline/2458098_foregrounds/ are:
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.abs.calfits
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.autos.uvh5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.first.calfits
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.firstcal_metrics.hdf5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.flagged_abs.calfits
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.flagged_abs_vis.uvh5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.noise_std.uvh5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.omni.calfits
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.omni_vis.uvh5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.smooth_abs.calfits
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.smooth_abs_vis.uvh5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.uvh5
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.calibrated.ms
2458098_foregrounds/zen.2458098.49089.foregrounds.corrupt.calibrated.uvh5_image
The fully-processed files end with .smooth_abs_vis.uvh5.
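To pick out only the fully-processed files from such a folder, a suffix glob is enough. The sketch below uses a hypothetical helper name (fully_processed_files); note that other products such as .flagged_abs_vis.uvh5 or .smooth_abs.calfits are deliberately not matched.

```python
from pathlib import Path

FINAL_SUFFIX = ".smooth_abs_vis.uvh5"


def fully_processed_files(night_dir):
    """Sorted fully-processed visibility files in one pipeline night directory."""
    return sorted(Path(night_dir).glob(f"*{FINAL_SUFFIX}"))
```

The returned paths could then be passed, as strings, to a visibility reader such as pyuvdata's UVData.read.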
The final LST-binned data suitable for power spectrum analysis are in /lustre/aoc/projects/hera/Validation/test-4.0.0/pipeline/LSTBIN/.
Within this directory, the LST-binned data for different combinations of input models and processing are kept in different directories, namely:
sum/: FG+EoR, systematics applied, then calibrated out.
foregrounds/: FG-only (no EoR), systematics applied, then calibrated out.
true_eor/: EoR-only, no systematics applied.
true_foregrounds/: FG-only, no systematics applied.
true_sum/: FG+EoR, no systematics applied.
In particular, no EoR-only data with systematics were produced.
Each directory contains a great number of files. The most important files (i.e. the LST-binned and pre-processed visibilities) follow the filename convention zen.grp1.of1.LST.{lst}.HH.{processing_tags}.uvh5.
Here {lst} is a floating-point number giving the LST in radians, and {processing_tags} is a group of single upper-case characters indicating which processing steps have been applied. They are:
For example, running ls LSTBIN/foregrounds/zen.grp1.of1.LST.*.HH.O*.uvh5 gives:
LSTBIN/foregrounds/zen.grp1.of1.LST.0.28190.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.03362.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.28190.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.03441.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.28268.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.12759.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.28973.HH.OCRSLPXTK.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.12759.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.28973.HH.OCRSLPXT.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.12837.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.37586.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.22156.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.37586.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.22156.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.37665.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.22234.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.46983.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.22939.HH.OCRSLPXTK.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.46983.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.22939.HH.OCRSLPXT.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.47061.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.31552.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.56380.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.31552.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.56380.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.31631.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.56458.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.40949.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.60295.HH.OCRSLPXTK.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.40949.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.60295.HH.OCRSLPXT.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.41027.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.65776.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.50345.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.65776.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.50345.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.65854.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.50424.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.75173.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.54261.HH.OCRSLPXTK.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.75173.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.54261.HH.OCRSLPXT.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.75251.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.59742.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.84569.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.59742.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.84569.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.59820.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.84648.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.69139.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.91617.HH.OCRSLPXTK.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.69139.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.91617.HH.OCRSLPXT.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.69217.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.93966.HH.OCRSLP.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.78535.HH.OCRSLP.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.93966.HH.OCRSL.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.78535.HH.OCRSL.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.0.94044.HH.OCRSLPX.uvh5 LSTBIN/foregrounds/zen.grp1.of1.LST.1.78613.HH.OCRSLPX.uvh5
LSTBIN/foregrounds/zen.grp1.of1.LST.1.03362.HH.OCRSLP.uvh5
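In the listing above, the LST values differ slightly between tag sets (e.g. 0.28190 for OCRSL and OCRSLP, 0.28268 for OCRSLPX, 0.28973 for OCRSLPXT and OCRSLPXTK), so it can be convenient to group files by their tag string first. The sketch below uses hypothetical helpers (parse_lst_name, group_by_tags) and makes no assumption about what each tag letter means.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# zen.grp1.of1.LST.{lst}.HH.{processing_tags}.uvh5 (LST in radians,
# one upper-case character per processing step).
_LST_RE = re.compile(
    r"^zen\.grp1\.of1\.LST\.(?P<lst>\d+\.\d+)\.HH\.(?P<tags>[A-Z]+)\.uvh5$"
)


def parse_lst_name(name: str) -> Optional[Tuple[float, str]]:
    """Return (lst_radians, processing_tags) for a path or filename, or None."""
    m = _LST_RE.match(Path(name).name)
    return None if m is None else (float(m["lst"]), m["tags"])


def group_by_tags(names: List[str]) -> Dict[str, List[float]]:
    """Map each processing-tag string to the sorted LSTs (radians) available with it."""
    groups = defaultdict(list)
    for name in names:
        parsed = parse_lst_name(name)
        if parsed is not None:
            lst, tags = parsed
            groups[tags].append(lst)
    return {tags: sorted(lsts) for tags, lsts in groups.items()}
```

The input could be, for example, the output of Path("LSTBIN/foregrounds").glob("zen.grp1.of1.LST.*.uvh5").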
The data for the tests are not stored in this repo, but principally on the NRAO machine, at /lustre/aoc/projects/hera/Validation/.
Each specific test has its own directory with its associated data (e.g. Validation/test-1.0.0/).
The paths to the data within this directory for each test are explicitly referred to in the corresponding test notebook included in this repo. In general, visibilities required for the test will be in the visibilities/ directory, in *.uvh5 files, while power spectra will be in the spectra/ directory, in *.psc files. However, different tests require different hierarchies of data, so always refer to the notebook for details.
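The general layout just described can be sketched as follows. Both helpers (validation_data_dir, typical_inputs) are hypothetical and only check the common visibilities/ and spectra/ locations; since individual tests deviate (test-4.0.0, for instance, keeps its visibilities under data/visibilities/), the notebook remains the authority on where a test's data live.

```python
from pathlib import Path

VALIDATION_ROOT = Path("/lustre/aoc/projects/hera/Validation")


def validation_data_dir(step: int, variation: int, trial: int) -> Path:
    """Data directory for one test, e.g. Validation/test-1.0.0/."""
    return VALIDATION_ROOT / f"test-{step}.{variation}.{trial}"


def typical_inputs(test_dir):
    """Visibility (*.uvh5) and power-spectrum (*.psc) files in the usual subdirectories.

    Missing subdirectories simply yield empty lists.
    """
    test_dir = Path(test_dir)
    return {
        "visibilities": sorted((test_dir / "visibilities").glob("*.uvh5")),
        "spectra": sorted((test_dir / "spectra").glob("*.psc")),
    }
```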
Also, be aware that some of the files within these directories will be symbolic links to data stored for other tests. This eases the burden of storage while maintaining a logical file layout for each test.
Given the size of the data and the infrequency of its use, we maintain a backup of the larger data products on the librarian storage system, and purge the files from lustre except when they are needed. The backups are at /home/herastore02-1/Validation/. Even so, they are symlinked to their standard places on lustre, so they can be accessed in the usual manner described above.
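Because files may be symlinks, either to another test's data or to the backup on /home/herastore02-1/Validation/, it can be useful to check where a given file really lives. The sketch below is a hypothetical helper (describe_link); it resolves the full symlink chain with os.path.realpath and reports whether the final target exists and whether it sits under the backup root.

```python
import os
from pathlib import Path

BACKUP_ROOT = Path("/home/herastore02-1/Validation")


def describe_link(path, backup_root=BACKUP_ROOT):
    """Report where a (possibly symlinked) data file really lives."""
    path = Path(path)
    target = Path(os.path.realpath(path))  # follows every symlink in the chain
    root = Path(os.path.realpath(backup_root))  # resolve the root the same way
    return {
        "is_symlink": path.is_symlink(),
        "target": target,
        "target_exists": target.exists(),  # False for a dangling link
        "in_backup": target == root or root in target.parents,
    }
```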