HERA-Team / hera-validation

Archive of formal software pipeline validation tests
http://hera.pbworks.com/w/page/130621356/Validation

Step 4.1: End-to-End for H1C IDR3 #75

Open steven-murray opened 3 years ago

steven-murray commented 3 years ago

Step 4.1: End-to-End for H1C IDR3

This will be the full end-to-end test for H1C IDR3.

With respect to Step 4.0 (H1C IDR2), this has several updated components:

The biggest difference, logistically, is that we want to simulate all days/epochs in the IDR3 dataset. This will be difficult in terms of compute/storage. Here's a plan:

Basic Flow:

  1. Ideal Vis Simulation (Diff FG, PS FG, EoR) + Fagnoni Beam + 5sec cadence + Ideal (redundant) IDR3 layout (from Josh's memo, excluding antennas always flagged, see also the a priori YAMLs)
  2. Produce Daily Datafiles for ALL days in a single EPOCH (epochs defined in Table 1 of this memo):
     a. Inflate ideal by non-redundancy
     b. Interpolate to times of single day/file
     c. Add noise, reflections, cross-talk
     d. Chunk sim and save
  3. Apply real flags to files
  4. Calibrate
  5. LST-bin the EPOCH
  6. Remove all but 1 of the daily files (probably best to keep the last day in the epoch).
  7. Pre-processing + Pspec etc. on LST-binned EPOCH
  8. Rinse and repeat for all four EPOCHS

Note that after doing a single epoch, we can fine-tune for the remaining epochs. One idea would be to produce not ALL the days for each remaining epoch, but only roughly half of them (about 10). Note that throughout, we have N_EPOCHS=4 and N_COMBINATIONS=5.
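The per-epoch flow above can be sketched as a driver loop. This is only a toy outline: the stage operations are represented by string tags rather than calls into the real simulation/calibration tools, and the constants are taken straight from the plan above.

```python
# Toy sketch of the per-epoch driver (steps 2-7 above). Stage operations
# are stand-in string tags, NOT calls into the actual pipeline tools.

N_EPOCHS = 4         # epochs, per Table 1 of the IDR3 memo
N_COMBINATIONS = 5   # sky-model combinations
DAYS_PER_EPOCH = 32  # upper bound on days in an epoch

def run_epoch(epoch, combo):
    daily = []
    for day in range(DAYS_PER_EPOCH):
        vis = f"ideal_e{epoch}_c{combo}"   # step 1 output (precomputed ideal sim)
        vis += "+nonredundant"             # 2a: inflate ideal by non-redundancy
        vis += f"+day{day}"                # 2b: interpolate to this day's times
        vis += "+noise+refl+xtalk"         # 2c: add noise, reflections, cross-talk
        daily.append(vis)                  # 2d: chunk sim and save
    daily = [f"flagged({v})" for v in daily]   # 3: apply real flags
    daily = [f"cal({v})" for v in daily]       # 4: calibrate
    binned = f"lstbin({len(daily)} days)"      # 5: LST-bin the epoch
    kept = daily[-1]                           # 6: keep only the last daily file
    return binned, kept                        # 7: pre-processing + pspec run on `binned`

binned, kept = run_epoch(epoch=0, combo=0)
```

Step 8 would then wrap this in a loop over all N_EPOCHS epochs and N_COMBINATIONS combinations.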

CPU Time Estimates

  1. We don't have a good estimate of the ideal vis sim time yet (@Hugh Garsden and @jburba are working on it). Relevant notes: vis_cpu can use MPI to distribute across a lot of nodes/processors. No shared memory as yet, so that limits how many processors per node. Can easily chunk simulation on frequency axis to reduce working memory.
  2. Systematic Sims: @Bobby Pascua can provide more details, but it seems likely that I/O is the biggest bottleneck here. We can reduce that by reading in the ideal sims once and generating all days from them. Each day takes ~10 min of sim time. Thus, for an EPOCH (max 32 days), that is about 5 hours of CPU time (in serial), plus the I/O overhead (1 hour?) = 6 hrs per EPOCH per COMBINATION.
  3. Apply flags: ?? probably negligible.
  4. Calibrate: in serial will take ~70hrs per day of observation. However, can be multitasked easily. Conservatively 2 hours of wall time per day = 60 hours per EPOCH per COMBINATION.
  5. LST-bin: ??
  6. Removing: probably negligible
  7. Pre-processing + Pspec: ??

Total wall-time estimate = IDEAL + N_EPOCH*N_COMBINATION*(6 + 60 + PREPROCESS + PSPEC). If the latter two are negligible compared to the 60 hours for calibration, then we're looking at something like 100 hours per epoch and combination.
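As a back-of-envelope check of that formula (treating PREPROCESS, PSPEC, and the as-yet-unknown ideal-sim time as zero; the ~30-day calibration count is inferred from the 60 hr/epoch figure above):

```python
# Back-of-envelope wall-time check for the estimates above (hours).
N_EPOCHS, N_COMBINATIONS = 4, 5

sys_hr = 10 * 32 / 60 + 1        # systematics: ~10 min/day x 32 days + ~1 hr I/O
cal_hr = 2 * 30                  # calibration: ~2 hr wall time x ~30 days
preprocess_hr = pspec_hr = 0     # unknown, assumed negligible here
ideal_hr = 0                     # ideal vis sim: no estimate yet

per_epoch_combo = sys_hr + cal_hr + preprocess_hr + pspec_hr   # ~66 hr
total = ideal_hr + N_EPOCHS * N_COMBINATIONS * per_epoch_combo # ~1300 hr
```

So the known terms give ~66 hr per epoch per combination; the "something like 100 hours" figure leaves headroom for the unknown pre-processing and pspec stages.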

MAX Storage estimates

  1. 80 GB per ideal sim x 3 sims (Diff FG, PS FG, EoR) = 240 GB
  2. ~0.25TB/night for each night in a single epoch = 8 TB.
  3. Not sure if flagging takes any more space.
  4. Calibration files should be negligible? @joshdillon?
  5. 0.125 TB (half of the daily requirement) for each epoch and combination, as well as the LST-bin of all the epochs.
  6. Removing files obviously reduces requirements, but not the max. Since we keep one for each epoch (but NOT each combination), we should add (N_EPOCH-1)*0.25TB here.
  7. Pre-processing negligible? @joshdillon? @nkern?

So total MAX storage is 0.24 + 8 + (N_EPOCH+1)*N_COMBINATION*0.125 + (N_EPOCH-1)*0.25 ≈ 9 + 3.5 = 12.5 TB
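Evaluating the MAX-storage formula with N_EPOCH=4 and N_COMBINATION=5 gives a quick consistency check (the exact sum lands just over 12 TB, in line with the rounded figure above):

```python
# Numeric check of the MAX-storage formula above (all values in TB).
N_EPOCHS, N_COMBINATIONS = 4, 5

ideal = 0.24                                      # 3 ideal sims at 80 GB each
epoch_days = 8.0                                  # ~0.25 TB/night x ~32 nights
lst_binned = (N_EPOCHS + 1) * N_COMBINATIONS * 0.125   # per-epoch + all-epoch LST bins
kept_days = (N_EPOCHS - 1) * 0.25                 # one retained raw day per earlier epoch

total = ideal + epoch_days + lst_binned + kept_days
```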

LONG TERM STORAGE REQUIREMENTS

Long term, we'll keep all the LST-binned datasets for all combinations, plus a single day for each epoch (in one combination) and the ideal data. This should be 0.675*N_COMBINATION + 0.25*4 + 0.25 = 1.25 + 0.675*5 ≈ 4.5 TB
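The long-term estimate can be checked the same way, just evaluating the arithmetic stated above:

```python
# Numeric check of the long-term storage estimate above (TB).
N_EPOCHS, N_COMBINATIONS = 4, 5

lst_bins = 0.675 * N_COMBINATIONS   # LST-binned datasets, all combinations
kept_days = 0.25 * N_EPOCHS         # one raw day per epoch (one combination only)
ideal = 0.25                        # the ideal data

total = lst_bins + kept_days + ideal
```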

Why this test is required

This is the final big test to make sure everything fits together well.

Simulation Details

Criteria for Success

jaguirre commented 3 years ago

For reference, the EPOCH defining memo is http://reionization.org/manual_uploads/HERA097_H1C_IDR3_2_Memo.pdf

What is the motivation for saving one non-LST-binned file per epoch?

steven-murray commented 3 years ago

@jaguirre I think the motivation is that we want something at the raw level for each epoch (each epoch having slightly different systematic parameters). This gives us one file to look back to if problems arise that we can't figure out at the LST-binned level.