Closed by dwfncar 7 years ago
Some notes on this issue:
(1) I'm running a single case of ensemble-stat with all the variables/levels on yellowstone.
(2) The HRRR grid is very dense: 1799 x 1059 (roughly 1.9 million points).
(3) There's a lot of "wasted" time reading redundant vector fields.
- Read U... and then read V to rotate the winds from grid-relative to earth-relative.
- Read V... and then read U to rotate the winds from grid-relative to earth-relative.
- Derive Wind... Read U and then read V to rotate, then read V and then read U to rotate again.
- Suggest simplifying this logic, perhaps by using a "uv_index" to refer to each U/V pair (see the first sketch after these notes).
(4) We spend a lot of time reading data for "ens" and then do it all over again for "fcst". When processing "fcst", check whether we've already read the field for "ens" and, if so, copy it (see the second sketch after these notes).
(5) It really grinds to a halt here:
DEBUG 2: Processing ensemble field: TMP/P925
DEBUG 3: Switching the GRIB2 radius of the earth value of 6371.23 km to 6371.2 km for internal consistency.
DEBUG 3: MetGrib2DataFile::data_plane() - Found exact match for 'TMP/P925' in GRIB2 record 492 field 1 of GRIB2 file 'members/mem0.grib2'
Suggest running a single variable/level and stepping through it in the debugger.
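For note (3), here's a minimal sketch of what a "uv_index"-based cache might look like. All of the names (UVCache, read_raw_plane, rotate_uv_to_earth, DataPlane as a plain vector) are hypothetical placeholders rather than MET's actual classes; the point is only that each U/V pair is read and rotated to earth-relative coordinates once.

```cpp
#include <map>
#include <utility>
#include <vector>

using DataPlane = std::vector<double>;   // stand-in for MET's DataPlane

// Stand-in for the expensive GRIB2 record read.
DataPlane read_raw_plane(int record) {
   return DataPlane(1799 * 1059, 0.0);
}

// Stand-in for the grid-relative to earth-relative wind rotation.
void rotate_uv_to_earth(DataPlane &u, DataPlane &v) {
   (void) u;
   (void) v;
}

class UVCache {
  public:
   // Return the rotated (U, V) pair for this uv_index, reading and
   // rotating only on the first request for that pair.
   const std::pair<DataPlane, DataPlane> &
   get(int uv_index, int u_record, int v_record) {
      auto it = cache_.find(uv_index);
      if(it == cache_.end()) {
         DataPlane u = read_raw_plane(u_record);
         DataPlane v = read_raw_plane(v_record);
         rotate_uv_to_earth(u, v);              // rotate once per pair
         it = cache_.emplace(uv_index,
                             std::make_pair(std::move(u), std::move(v))).first;
      }
      return it->second;
   }

  private:
   std::map<int, std::pair<DataPlane, DataPlane>> cache_;
};
```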
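For note (4), a similarly hedged sketch of reusing the "ens" reads for "fcst". The key format and function names are made up; the real change would live wherever ensemble-stat loops over the ensemble and forecast fields.

```cpp
#include <map>
#include <string>
#include <vector>

using DataPlane = std::vector<double>;   // stand-in for MET's DataPlane

// Stand-in for the real GRIB2 read.
DataPlane read_plane_from_file(const std::string &key) {
   return DataPlane(1799 * 1059, 0.0);
}

// Cache filled while processing the "ens" fields, keyed by something
// like "TMP/P925/mem0" (made-up key format).
static std::map<std::string, DataPlane> ens_cache;

// Called while processing the "ens" fields: read and remember.
DataPlane get_ens_plane(const std::string &key) {
   DataPlane dp = read_plane_from_file(key);
   ens_cache[key] = dp;
   return dp;
}

// Called while processing the "fcst" fields: copy if already read.
DataPlane get_fcst_plane(const std::string &key) {
   auto it = ens_cache.find(key);
   if(it != ens_cache.end()) return it->second;   // reuse the "ens" read
   return read_plane_from_file(key);              // otherwise read for real
}
```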
by johnhg
With help from Randy, we reduced the runtime of Michelle and Jamie's test case from 6 hours (on Yellowstone) to about 12 minutes (on my desktop). There are some small differences in the computed statistics... but I'll address that as part of reconciling the nightly build differences.
We sped it up in two ways (a rough sketch follows below):
(1) Rather than calling NumArray::add() repeatedly to grow up to the grid dimension, call NumArray::extend() up front to allocate the required size. Replace calls to NumArray::clear() with NumArray::erase().
(2) Updated the logic in track_counts(). Rather than using helper functions to access the data arrays, operate on the data buffers directly. This is a bit dangerous since we skip the bounds checking, but it sped things up considerably.
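A rough sketch of both changes, using std::vector as a stand-in since NumArray's exact interface isn't reproduced here: size the output buffer once up front, then walk the raw buffers without per-element accessor calls.

```cpp
#include <cstddef>
#include <vector>

// (1) Size the buffer once (analogous to NumArray::extend()) rather than
//     growing it element-by-element (analogous to repeated NumArray::add()).
// (2) Walk the raw buffers directly instead of calling a bounds-checked
//     accessor for every grid point; faster, but the caller must guarantee
//     that the sizes match.
void accumulate_counts(const std::vector<double> &plane,
                       std::vector<double> &sums) {
   if(sums.size() < plane.size()) sums.resize(plane.size(), 0.0);

   const double      *src = plane.data();
   double            *dst = sums.data();
   const std::size_t  n   = plane.size();
   for(std::size_t i = 0; i < n; ++i) dst[i] += src[i];
}
```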
by johnhg
From Michelle Harrold...
Jamie and I have packaged up an example for running ensemble-stat under my scratch space on Yellowstone. We ran a brief test today on one forecast hour to make sure the scripts, data, and config files are all in place. Given the example we ran, here are some of the details:
-- The timing issue manifests itself when running ensemble-stat for upper-air variables.
-- Our setup verifies 6 variables (TMP, DPT, HGT, UGRD, VGRD, WIND) at 5 vertical levels with 5-6 thresholds, depending on the variable.
-- Running the 5 levels and multiple thresholds takes ~45 minutes to 1 hour per variable, so all 6 variables take about 5-6 hours for one forecast lead time.
Our example work space is here:
/glade/scratch/harrold/re_task/stoch
The Korn shell script to run all variables in one task is here (what is ideal for our workflow):
/glade/scratch/harrold/re_task/stoch/script/met_ensemble_verf_point_upa_manual.ksh
The Korn shell script to run one variable at a time is here (similar to how we need to run in our current workflow):
/glade/scratch/harrold/re_task/stoch/script/met_ensemble_verf_point_upa_byVAR_manual.ksh
A simple bsub script to submit met_ensemble_verf_point_upa_manual.ksh to the queue:
/glade/scratch/harrold/re_task/stoch/script/run_met.ksh
Output from the workflow (ensemble-stat output and obs needed for vx):
/glade/scratch/harrold/re_task/stoch/2016051800/production_run
Individual member post data:
/glade/scratch/harrold/re_task/stoch/memX/run/2016051800/postprd
I am sure I left something out, so just let us know if you have any follow-up questions! [MET-771] created by johnhg