Closed: RussTreadon-NOAA closed this issue 1 year ago
Optimization changes will be committed to branch feature/optimize in RussTreadon-NOAA/GSI
scripts/exgdas_atmos_chgres_forenkf.sh
Tests on WCOSS_D find that ObsProc wall times for prepobs_prepdata and syndat_syndata increase when reading GFS v16 atmf006.nc. Part of the wall time increase is due to the increase from 64 to 127 layers. Part of the wall time increase is due to uncompressing the atmf006.nc file during the read. It was suggested to run ObsProc using an uncompressed atmf006.nc.
GFS v16 ObsProc scripts were modified toward this end. In doing so it was noted that "cp" of large atmfXXX.nc files could be replaced with "ln". These two changes were tested for 8 gfs and gdas cycles covering the period 2021012200 through 2021012318. Below are the average prep step run times for GFS v16 (para v16), and the test (test v16):
job | para v16 | test v16 |
---|---|---|
gfs prep | 03:47 | 02:45 |
gdas prep | 03:49 | 02:52 |
About 1 minute is saved for both the gfs and gdas prep steps. The prepbufr, prepbufr.acft_profiles, and nsstbufr files created by test v16 are bit-for-bit (b4b) identical with their control (para v16) counterparts.
For these gains to be realized ObsProc prep scripts need to be updated to use the uncompressed atmf006.nc and "cp" needs to be replaced by "ln". These changes fall outside the scope of the NOAA-EMC/GSI repository.
Creation of the uncompressed atmf006.nc file can be done via a NOAA-EMC/GSI job. Upon examination of the workflow and job dependencies JGDAS_ATMOS_CHGRES_FORENKF was identified as a good location in which to uncompress atmf006.nc. This job runs chgres on atmfXXX.nc for XXX=003, 006, and 009. Script exgdas_atmos_chgres_forenkf.sh runs three realizations of chgres in parallel using CFP. A fourth command was added to the CFP command file. The fourth command uses NetCDF utility nccopy to uncompress atmf006.nc.
The option to uncompress atmf006.nc is controlled by script variable UNCOMPRESS_ATMF. By default UNCOMPRESS_ATMF is "NO", meaning no uncompression. If UNCOMPRESS_ATMF="YES", nccopy is executed. The uncompressed output file is suffixed ".uncompress". The locally modified ObsProc scripts pick up the ".uncompress" version of atmf006.nc.
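The logic described above can be sketched as follows. This is an illustrative reconstruction, not the actual script: the command-file name, the placeholder chgres command, and the "YES" default used here for demonstration (the script defaults to "NO") are assumptions; the `nccopy -d 0` invocation rewrites a NetCDF-4 file with deflate level 0, i.e. uncompressed.

```shell
#!/bin/sh
# Sketch of how exgdas_atmos_chgres_forenkf.sh builds its CFP command
# file: three chgres commands plus an optional fourth nccopy command.
UNCOMPRESS_ATMF=${UNCOMPRESS_ATMF:-"YES"}   # demo default; script default is "NO"
rm -f cmdfile
for fhr in 003 006 009; do
  # placeholder for the real chgres invocation on each forecast hour
  echo "chgres atmf${fhr}.nc" >> cmdfile
done
if [ "$UNCOMPRESS_ATMF" = "YES" ]; then
  # nccopy -d 0 writes a copy of the file with deflation disabled
  echo "nccopy -d 0 atmf006.nc atmf006.nc.uncompress" >> cmdfile
fi
cat cmdfile
```

Because the nccopy command is simply a fourth entry in the CFP command file, it runs in parallel with the three chgres realizations, which is why total job run time does not increase.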
The average run time for job JGDAS_ATMOS_CHGRES_FORENKF is 04:21. NetCDF utility nccopy takes 01:48 to uncompress atmf006.nc. Thus, adding nccopy to JGDAS_ATMOS_CHGRES_FORENKF does not increase total job run time. However, adding nccopy to JGDAS_ATMOS_CHGRES_FORENKF does increase the job node count from 3 to 4 nodes. If using an additional node is deemed unacceptable, the details of how nccopy is implemented in exgdas_atmos_chgres_forenkf.sh can be reviewed.
The modified exgdas_atmos_chgres_forenkf.sh was committed to feature/optimize at cd49b72.
SATWND optimization
The addition of timers into GFS v16 source code file read_obs.F90 revealed that processing the satwnd dump file can take up to three minutes. This is significantly higher than the processing time for much larger radiance dump files. This finding is not surprising when one recalls that radiance dump files are processed in parallel whereas satwnd processing is serial.
The parallel paradigm used in radiance readers could be added to read_satwnd.f90. Doing so would likely require a major rewrite of read_satwnd.f90. This is not a bad thing but given the transition to JEDI, especially JEDI UFO, refactoring read_satwnd.f90 may not be the best use of DA staff time. Given this, an alternative option has been explored.
The satwnd bufr file is a collection of atmospheric motion vectors (AMVs) from various satellites and tracking algorithms - each of which is identified by a unique bufr subset. Both ObsProc and NCEPLIBS have very efficient utilities to split a bufr file into subset specific files. The ObsProc executable is named "gsb". The NCEPLIBS executable, split_by_subset.x, is described in NOAA-EMC / NCEPLIBS-bufr issue #89.
Script exglobal_atmos_analysis.sh was modified to execute split_by_subset.x on the satwnd dump file in the run time directory. The single satwnd entry in GSI namelist OBS_INPUT was replaced with multiple satwnd_NC005XXX. Thus, instead of one task reading the entire satwnd dump file, N tasks read N satwnd subset files in parallel.
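For illustration, the OBS_INPUT change amounts to something like the following (the subset numbers shown and the elided columns are examples only; the actual entries depend on the GSI configuration):

```
OBS_INPUT::
! before: a single satwnd entry; one task reads the entire dump file
   satwnd            uv   ...
! after: one entry per BUFR subset; N tasks read N subset files in parallel
   satwnd_NC005030   uv   ...
   satwnd_NC005031   uv   ...
::
```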
The modified exglobal_atmos_analysis.sh was exercised in 8 gfs and gdas cycles covering the period 2021012200 through 2021012318. Below are the average gfs and gdas atmos_analysis run times for the control (NCO's v16 parallel) and the test:
job | para v16 | test v16 |
---|---|---|
gfs anal | 28:47 | 27:28 |
gdas anal | 38:02 | 36:27 |
Processing satwnd subsets reduced the average gfs analysis run time by 01:19. The decrease was a bit larger for the gdas analysis, 01:35.
Examination of the test and control analysis increment files found them to NOT be b4b identical. Differences were found in the initial satwnd penalties between the two sets of runs. These differences were traced to application of the duplicate check in setupw.f90.
The duplicate check adjusts the observation error for duplicate observations. Duplicate observations are those observations with the same {x, y, p}. Note: if logical twodvar_regional is .true., p (pressure) is not part of the duplicate check.
When a single satwnd dump file is processed, satwnd uv is listed once in OBS_INPUT. Thus, setupw is called once for all satwnd observations. All satwnd subsets pass through the duplicate check together. Those observations with the same {x, y, p} are flagged as duplicates. Some of these {x, y, p} duplicates are for different satwnd subsets. The observation errors for all obs flagged as a duplicate are adjusted.
When satwnd subsets are processed, satwnd uv is listed once for each subset. Subroutine setupw is called once for each subset. Since satwnd subsets are processed separately, cross-subset duplicates are not found. As a result, not all the satwnd observations flagged as duplicates in the control run are flagged as duplicates in the test. Different observation errors in the test yield different penalties, different minimizations, and ultimately different analyses with respect to the control.
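The effect can be illustrated with a toy example using made-up data (the subset names and {x, y, p} values below are purely illustrative): keyed on {x, y, p} alone, observations from different subsets at the same location are duplicates; keyed per subset, they are not.

```shell
#!/bin/sh
# Toy data: each line is "subset x y p" (values are invented).
cat > obs.txt <<'EOF'
NC005030 10.0 20.0 500.0
NC005031 10.0 20.0 500.0
NC005030 11.0 21.0 700.0
EOF
# All subsets checked together (key = x y p): one duplicate pair found.
awk '{print $2, $3, $4}' obs.txt | sort | uniq -d
# Each subset checked separately (key = subset x y p): no duplicates.
sort obs.txt | uniq -d
```

The first pipeline prints the shared {x, y, p} triple; the second prints nothing, mirroring why the per-subset runs adjust fewer observation errors than the control.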
As a test the duplicate check in the control setupw was modified to include the AMV observation type. Those AMV observations for which {x, y, p} are the same were NOT flagged as a duplicate unless the AMV observation type was the same. With this modification the control identifies the same satwnd duplicates as the test. This was confirmed by running this test for 2021012512 gdas.
Iliana pointed out that some satwnd subsets are not processed even though they are in the satwnd dump file. She suggested another test. The split_by_subset.x executable was used to split satwnd into subsets. The subsets not processed by the GSI were removed. The remaining subsets were concatenated in the same order as found in the original satwnd dump file. With this change the resulting analysis increments were b4b identical with the control. The GSI wall time was slightly reduced with respect to the control. Much of the wall time gain found in the satwnd subset run was erased.
job | para v16 | test v16 (remove unread) |
---|---|---|
gfs anal | 28:47 | 28:31 |
gdas anal | 38:02 | 37:44 |
The changes to scripts/exglobal_atmos_analysis.sh, src/gsi/read_obs.F90, and src/gsi/setupw.f90 to run these various tests were committed to feature/optimize at cd49b72.
Note that setupw.f90 at cd49b72 has lines used to run various tests commented out. This subroutine will be cleaned up and unnecessary code removed when a final approach is decided upon.
Use the Fetch upstream button on the RussTreadon-NOAA/GSI GitHub page to bring in recent commits from the authoritative NOAA-EMC/GSI repo.
Merge RussTreadon-NOAA/GSI master at 911a6a3 into feature/optimize. Done at 5f00d2a.
I ran eight cycles, assimilating once the single satwnd dump files, and once the subsets of the original satwnd files. Here are the wall time results (in seconds) showing the time saved when assimilating subsets in parallel:

date | AN hour | one file | split file | saved time
---|---|---|---|---
20210503 | 18z | 1858 | 1763 | 95
20210504 | 00z | 1789 | 1729 | 60
20210504 | 06z | 1793 | 1737 | 56
20210504 | 12z | 1774 | 1729 | 45
20210510 | 18z | 1854 | 1731 | 123
20210511 | 00z | 1814 | 1722 | 92
20210511 | 06z | 1883 | 1800 | 83
20210511 | 12z | 1876 | 1819 | 57
Additional ObsProc Tests
NCO contacted EMC regarding variability in global prep step job run times. Setting `SYNDATA=NO` or `DO_BOGUS=NO` is not an acceptable solution to reduce job run time. Setting either variable to `NO` alters `prepbufr` and `prepbufr.acft_profiles` which, in turn, alters the analysis and the subsequent forecast.
While enhancement of program `SYNDAT_SYNDATA` is a worthwhile task, along with optimization of this and other ObsProc codes, another option was described at the start of this issue. Much of the increased wall time for ObsProc executables when moving from L64 nemsio to L127 compressed netcdf files is due to file processing. As documented above, reading uncompressed netcdf files decreases ObsProc executable wall time. Replacing `cp` with `ln` for `atmfXXX.nc` files provides additional savings.
The following test was run on the production WCOSS_D: gdas prep cases were rerun, each with its operational `syndata.tcvitals` file. The 2021072606 case was also run with a zero-length `syndata.tcvitals` to simulate a cycle with no storms. Each run reproduced its operational counterpart, except (obviously) the zero-storm 2021072606 test. Wall times for `PREPOBS_PREPDATA` and `SYNDAT_SYNDATA` were recorded for each run. Both of these programs read `atmf006.nc`. The `gdas.t00z.atmf006.nc` file used for each gdas prep cycle was manually uncompressed using NetCDF utility `nccopy`.
. The following changes were made to a working copy of obsproc_prep_RB-5.4.0
:
cp
with ln -fs
cp
cat sgesprep...with
ln -fs cat sgesprep...
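The substitution can be sketched as below. The `$COMIN` path, file name, and `sgesprep` target here are illustrative stand-ins, not the actual obsproc_prep paths; the point is that a symlink replaces a copy of a multi-GB file.

```shell
#!/bin/sh
# Sketch of the obsproc_prep change: symlink large guess files into the
# run directory instead of copying them.
COMIN=$(mktemp -d)                       # stand-in for the real com directory
: > "$COMIN/gdas.t00z.atmf006.nc"        # stand-in for the real ~GB guess file
# before: cp "$COMIN/gdas.t00z.atmf006.nc" sgesprep
ln -fs "$COMIN/gdas.t00z.atmf006.nc" sgesprep   # -f replaces an existing link
ls -l sgesprep
```

Since downstream executables only read the guess file, a symlink is functionally equivalent to a copy while avoiding the I/O cost of duplicating the file.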
The v16ops `config.base` was updated to point at the modified obsproc_prep and the gdas prep cases rerun. Each run generated `prepbufr` and `prepbufr.acft_profiles` files which were bit-for-bit identical with their operational counterparts.
Tabulated below are the prep job run times (minutes:seconds) for the control (operations) and the test, which processes uncompressed netcdf files with `ln`.
cycle / storms | control | test |
---|---|---|
2021072406, 3 storms | 06:04 | 04:12 |
2021072606, 2 storms | 05:52 | 03:59 |
2021072706, 1 storm | 06:04 | 04:05 |
2021072606, 0 storm | 03:34 | 02:28 |
Below are similar tables with wall times for executables `PREPOBS_PREPDATA` and `SYNDAT_SYNDATA`. First, the wall times (seconds) for `PREPOBS_PREPDATA`:

cycle / storms | control | test |
---|---|---|
2021072406, 3 storms | 141.816695 | 94.583356 |
2021072606, 2 storms | 143.482867 | 93.776878 |
2021072706, 1 storm | 144.274038 | 94.779594 |
2021072606, 0 storm | 142.240700 | 93.808766 |
Second, the wall times (seconds) for `SYNDAT_SYNDATA`:

cycle / storms | control | test |
---|---|---|
2021072406, 3 storms | 143.622996 | 96.613691 |
2021072606, 2 storms | 136.511219 | 88.841422 |
2021072706, 1 storm | 132.424489 | 84.072876 |
2021072606, 0 storm | 0.0 | 0.0 |
Since the GFS no longer runs vortex relocation, the gdas `atm[gm3, ges, gp3].nc` files are direct copies of the previous cycle gdas `atmf[003, 006, 009].nc` files, respectively. A check of operational job log files found that neither the gfs nor gdas cycle of the GFS uses the gdas `atm[gm3, ges, gp3].nc` files. Thus, the sections of `scripts/exglobal_makeprepbufr.sh.ecf` in `obsproc_global_RB-3.4.0` which copy the previous cycle `sg*prep` files to `$COMOUT` could be removed. That said, downstream applications or external users may use the `atm[gm3, ges, gp3].nc` files. These users should be informed to use the previous cycle `atmf[003, 006, 009].nc` files since this, in fact, is what they are currently doing.
Removing the `sg*prep` copies from `scripts/exglobal_makeprepbufr.sh.ecf` simplifies the script and may yield a small reduction in run time.
Here is @ShelleyMelchior-NOAA's contribution to this work, comparing the time needed to prepare one satwnd dump file vs the time for its component bufr_d files: "...
Examination of GFS v16 DA job run times identified potential areas of optimization. This issue is opened to document these areas and the changes made to reduce run times.