DavidHuber-NOAA opened this issue 1 year ago
@DavidHuber-NOAA I want to compile the GSI using bufr/12.0.0. There's a module for that in /apps/ops/para/libs/intel/19.1.3.304/bufr. The GSI CMake build on WCOSS2 uses /apps/ops/prod/libs/intel/19.1.3.304/bufr. How can I tell it to use the para tree?
@jack-woollen The GSI looks for the following environment variables for BUFR. I would suggest adding the following to gsi_common.lua in place of the load(...bufr...) command:
prepend_path("PATH", "</path/to/bufr12>/bin", ":")
prepend_path("LD_LIBRARY_PATH", "</path/to/bufr12>/lib", ":")
prepend_path("DYLD_LIBRARY_PATH", "</path/to/bufr12>/lib", ":")
prepend_path("LD_LIBRARY_PATH", "</path/to/bufr12>/lib64", ":")
prepend_path("DYLD_LIBRARY_PATH", "</path/to/bufr12>/lib64", ":")
prepend_path("CPATH", "</path/to/bufr12>/include", ":")
prepend_path("CMAKE_PREFIX_PATH", "</path/to/bufr12>/.", ":")
prepend_path("PATH", "</path/to/bufr12>/bin", ":")
prepend_path("CMAKE_PREFIX_PATH", "</path/to/bufr12>/.", ":")
setenv("BUFR_LIB4", "</path/to/bufr12>/lib64/libbufr_4.so")
You will also need to change a few lines within the source code and one of the cmake files. Feel free to copy what I have in:
/scratch1/NCEPDEV/nems/David.Huber/GSI/gsi_spackstack_b12_n492/src/gsi/read_prepbufr.f90 -- look for vtcd and glcd
/scratch1/NCEPDEV/nems/David.Huber/GSI/gsi_spackstack_b12_n492/src/gsi/CMakeLists.txt -- change bufr_d to bufr_4
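For anyone without access to those directories, here is a minimal sketch of the kind of change involved in read_prepbufr.f90, assuming the prepbufr unit variable is lunin and that BUFR 12's ufbqcd returns an integer program code. The ivtcd/iglcd names and the GLERL mnemonic are illustrative only; the authoritative edits are in the files above.

```fortran
! Sketch only, not the actual GSI diff: under BUFR 12, ufbqcd returns an
! integer program code, so the value is received as an integer and then
! converted to the real(r_kind) form the existing read_prepbufr.f90 logic
! compares against. i_kind and r_kind come from the GSI kinds module.
  integer(i_kind) :: ivtcd, iglcd    ! illustrative names for the integer codes
  real(r_kind)    :: vtcd, glcd      ! values used by the existing comparisons

  call ufbqcd(lunin,'VIRTMP',ivtcd)  ! virtual temperature program code
  call ufbqcd(lunin,'GLERL',iglcd)   ! GLERL wind adjustment code (mnemonic assumed)
  vtcd = real(ivtcd,r_kind)
  glcd = real(iglcd,r_kind)
```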
Thanks @DavidHuber-NOAA - works well. Good to know.
@jack-woollen I've moved this conversation over to this issue dealing with BUFR 12.
@RussTreadon-NOAA @aerorahul
I copied /lfs/h2/emc/global/noscrub/Jack.Woollen/bufrtime/bufr_v12.0.0/NCEPLIBS-bufr over to Hera:/scratch1/NCEPDEV/nems/David.Huber/LIBS/BUFR/bufr-bufr_v12.0.0_fast and built it with Intel 2022 and installed it here: /scratch1/NCEPDEV/nems/David.Huber/LIBS/BUFR/bufr/12.0.0_fast. Next, I compiled the GSI with the spack-stack/1.4.1 libraries with the exception of bufr, where v12.0.0_fast was used instead (located here: /scratch1/NCEPDEV/nems/David.Huber/GSI/gsi_spackstack_b12_fast). I then built a copy of the GSI with spack-stack/1.4.1, including bufr/11.7.0 (located here: /scratch1/NCEPDEV/nems/David.Huber/GSI/gsi_spackstack). Finally, I ran regression tests between the two cases. All tests passed.
The regression tests have been updated and global_3dvar is no longer included in the test suite. Two new tests have been added for HAFS, both of which are impacted by the BUFR slowdown: hafs_3denvar_hybens and hafs_4denvar_glbens. I also included the rrfs_3denvar_glbens test, which also runs the observer. The regression test results show improvements for the global_4denvar test case, especially at the higher PE count (hiproc). Averaging the differences across all cases shows a 4.4s increase for hiproc tests and a 22.6s increase for loproc tests.
My opinion is that, since this seems to only affect the observer and thus won't scale up with more iterations, this is an acceptable increase in runtime. Thoughts, Russ et al.?
Test | 11.7.0 loproc time (s) | 12.0.0_fast loproc time (s) | 11.7.0 hiproc time (s) | 12.0.0_fast hiproc time (s) |
---|---|---|---|---|
global_4denvar | 381.6 | 409.6 | 325.3 | 318.0 |
hafs_3denvar_hybens | 318.2 | 330.3 | 235.3 | 252.1 |
hafs_4denvar_glbens | 362.6 | 382.6 | 266.0 | 275.7 |
rrfs_3denvar_glbens | 79.5 | 109.7 | 59.1 | 57.8 |
The wall time increases with bufr/12.0.0_fast are not trivial, especially for rrfs_3denvar_glbens. The hafs hiproc runs also show increased run time when using bufr/12.0.0_fast. Adding @hu5970 and @ShunLiu-NOAA for awareness.
gsi.x is not the only application built with bufr modules. How do other bufr-dependent applications perform when moving to bufr/12.0.0?
I agree with Russ about the timing. There is more to it than we have found yet. Another half to find in fact. I have a few ideas to try next.
@DavidHuber-NOAA @jbathegit Well, I found about a half of the half of the difference left with an update in rdcmps. With this change added, the difference from the gsi observer control is reduced to +12-13s. Same comparison for running prepobs shows a difference of +7-8s. Maybe we're getting down to manageable territory. The updated code is in the same place on dogwood in /lfs/h2/emc/global/noscrub/Jack.Woollen/bufrtime/bufr_v12.0.0.
@jack-woollen I ran the regression tests between spack-stack/1.5.1 (bufr/11.7.0) and spack-stack/1.5.1 with the bufr 12 library installed here: /scratch1/NCEPDEV/global/Jack.Woollen/bufrtime/bufr_v12.0.0/build/path1. The global_4denvar timings have improved and in fact are a little faster than 11.7.0. The HAFS timings improved significantly for the 3denvar/hiproc case, but stayed about the same for all other cases. And the RRFS timings stayed about the same. This is definitely progress, though.
Test | 11.7.0 loproc time (s) | 12.0.0_fast loproc time (s) | 11.7.0 hiproc time (s) | 12.0.0_fast hiproc time (s) |
---|---|---|---|---|
global_4denvar | 374.5 | 371.9 | 296.3 | 294.3 |
hafs_3denvar_hybens | 289.0 | 308.0 | 237.8 | 213.6 |
hafs_4denvar_glbens | 351.6 | 358.2 | 261.3 | 277.5 |
rrfs_3denvar_glbens | 77.7 | 107.0 | 55.4 | 54.4 |
@jack-woollen It's become apparent that the HAFS tests have a lot of variability in runtimes, so perhaps we should not include them in the timing tests. I am going to run the global_4denvar and rrfs_3denvar_glbens tests with bufr/11.7.0 and 12.0.0_fast a few times and compare mean runtimes at low/high PE counts.
@jack-woollen I (finally) ran the tests I mentioned above, which revealed that global_4denvar, when run with your BUFR optimizations, has nearly the same runtime as with version 11.7.0, which is great!
The rrfs_3denvar_glbens test is still showing a slowdown with bufr/12, however. It is interesting, though, that there is a lot of variation in the RRFS runtimes, which suggests a bug in the RRFS DA (it reminds me of the MPI bug within the RRFS code found during the Intel 2022 upgrade).
The runtimes for the tests are attached. BUFR_Runtimes.xlsx
@jack-woollen I see your update to rdcmps(). Thanks for coming up with that fix, and I can work to pull it over to the develop baseline. But is there anything else you want to include as well, or anything else you're still working on with @DavidHuber-NOAA that's related to these timing problems?
Sorry if I missed something, but there's been a lot of traffic and discussion on this thread and I'm just trying to understand any net changes that we need to bring over now to the library baseline. Note that I've already pulled over and merged your previous upb8() fix.
@jbathegit There is one more older change that didn't get into 12.0.0 which is in test/test_ufbrw.F90. It is in the working set on hera in /scratch1/NCEPDEV/global/Jack.Woollen/bufrtime/bufr_v12.0.0/NCEPLIBS-bufr, with the other changes. Thanks.
Thanks @jack-woollen but is there any way you could copy that test/test_ufbrw.F90 change over to somewhere on dogwood (or cactus or acorn)? I don't have an account on hera.
/lfs/h2/emc/global/noscrub/Jack.Woollen/bufrtime/bufr_v12.0.0/NCEPLIBS-bufr
Any updates on this issue?
It looks like there is a PR to make the optimizations to the BUFR library (NOAA-EMC/NCEPLIBS-bufr#543). Once it is merged and a new tag is created, we can request that version be installed in a future spack-stack release, and then we can go about implementing this in the GSI. I would guess this would be near the beginning of the second quarter of next year.
Thank you @DavidHuber-NOAA for the update. It's good to see that we may be able to move forward with this issue in the coming year.
@DavidHuber-NOAA I see that https://github.com/NOAA-EMC/NCEPLIBS-bufr/pull/543 has been merged but I don't see a new tag yet. Do you know of any plans for a new tagged version?
@jbathegit @AlexanderRichert-NOAA Do you know when a new tagged version for BUFR is expected?
My plan is to release a new version 12.1.0 in late May or early June.
Thanks @jbathegit!
@DavidHuber-NOAA: what is the status of this issue? Are we waiting for bufr/12.1.0?
@RussTreadon-NOAA Yes, we are waiting on that version. When it is released, I will test it then ask for it to be installed into spack-stack 1.6.0. This will hopefully be done next month.
Thank you @DavidHuber-NOAA for the update. Good to hear that this issue is still on track. We just need to wait for bufr/12.1.0.
@CatherineThomas-NOAA : I will add this issue to the GFS v17 milestone for tracking purposes.
BUFR 12.1.0 was released yesterday. A request to include it in the next version of spack-stack and have it installed on WCOSS2 is open: https://github.com/JCSDA/spack-stack/issues/1194.
@DavidHuber-NOAA , what is the status of this issue?
BUFR 12.1.0 is being rolled out with spack-stack 1.8.0. The release candidate has been installed on Hera and official installations will be rolled out soon. I will be performing the upgrade over the next 4 weeks (hopefully sooner).
Excellent! Thank you @DavidHuber-NOAA for your diligence in upgrading GSI to bufr/12.
A new major version of BUFR is available (version 12) and will be the default version available in spack-stack. Though version 11.7.0 can be installed on top of existing stacks, an upgrade to version 12 should be pursued. BUFR 12 installs just the BUFR_4 library, so switching to this version requires updating src/gsi/CMakeLists.txt. Additionally, the ufbqcd subroutine, which is called in two places in read_prepbufr.f90, takes an integer argument for the virtual temperature flag (vtcd) in BUFR 12, as opposed to a floating point argument in previous versions, and thus requires updates to these parameters in read_prepbufr.f90. Some work was performed in #589 to test BUFR 12, which showed that there is some slowdown using version 12.0.0 (a 20-40s (~5-10%) increase for the global_3dvar and global_4denvar test cases). This should be investigated to determine if the time difference is acceptable and, if not, work with @jbathegit and @jack-woollen to see if optimizations can be made to the library.
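As a rough, hypothetical illustration of the interface change described above (the program name, file name, and unit number below are placeholders, not part of the GSI or the library's test suite), a small standalone check built against the BUFR 12 libbufr_4 could look like this:

```fortran
! Hypothetical standalone check of the BUFR 12 ufbqcd interface: open a
! prepbufr file (which carries its own embedded DX tables) and print the
! VIRTMP program code, which BUFR 12 returns as an integer rather than a real.
program check_vtcd
  implicit none
  integer, parameter :: lunin = 11                  ! placeholder Fortran unit
  integer :: ivtcd                                  ! integer code under BUFR 12

  open(lunin, file='prepbufr', form='unformatted')  ! placeholder file name
  call openbf(lunin, 'IN', lunin)                   ! DX tables read from the file itself
  call ufbqcd(lunin, 'VIRTMP', ivtcd)
  print '(a,i0)', 'VIRTMP program code = ', ivtcd
  call closbf(lunin)
end program check_vtcd
```

Building such a check against the libbufr_4 from a candidate bufr 12 install is one quick way to confirm which ufbqcd interface a given stack provides before touching the GSI source.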