Closed junwang-noaa closed 11 months ago
@Hang-Lei-NOAA @AlexanderRichert-NOAA, would you please provide the module files for the HDF5 1.14.0-related libraries? Thanks
On Acorn, to use HDF5 1.12.2:
`/lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.3.0/envs/unified-env-compute-hdf5-1.12.2/install/modulefiles/Core`
and to use HDF5 1.14.0:
`/lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.3.0/envs/unified-env-compute/install/modulefiles/Core`
Add those to `$MODULEPATH` (`module use ...`) and load the `stack-intel` and `stack-cray-mpich` modules as needed.
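The steps above can be sketched as a short shell session (the HDF5 1.14.0 environment path is shown; swap in the `-hdf5-1.12.2` environment path to build against the older library):

```shell
# Make the spack-stack environment's modulefiles visible to the module system
module use /lfs/h1/emc/nceplibs/noscrub/spack-stack/spack-stack-1.3.0/envs/unified-env-compute/install/modulefiles/Core

# Load the compiler and MPI metamodules as needed
module load stack-intel
module load stack-cray-mpich

# Verify what was loaded
module list
```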
On Acorn, we do not have fix files for GSI; someone needs to update the link to the fix files. Please use the following modulefiles to build GSI:
/lfs/h1/emc/eib/noscrub/Hang.Lei/GSI/modulefiles/gsi_wcoss2.lua
/lfs/h1/emc/eib/noscrub/Hang.Lei/GSI/modulefiles/gsi_common.lua
@arunchawla-NOAA -- I believe you have someone to assign to this issue, correct?
Yes. Let me get back on this
@natalie-perlin and I have made some progress on this. Starting with the branch RussTreadon-NOAA:intel2022, I updated the hpc-stack location and the hdf5/netcdf versions, then ran regression tests, comparing against @RussTreadon-NOAA's branch as a baseline. All hdf5/1.14.0 tests completed, but some of the hdf5/1.10.6 tests stalled and/or ran into time limits (global_3dvar, global_4dvar, and global_4denvar). Also, multiple tests produced different analysis results, which I have not analyzed in detail but which are concerning because they differ between the loproc and hiproc runs of the same hdf5/1.14.0 executable (hwrf_nmm_d2 and d3, netcdf_fv3_regional, rrfs_3denvar_glbens, and rtma).
I ran similar tests on Hera and @natalie-perlin ran them on Gaea. Hera ran to completion (I no longer have the test results, but I will rerun them now that Hera is back up from maintenance), while Gaea crashed with hdf5/1.14.0 for the global_3dvar and global_4denvar tests.
Note that to run the tests with different modulefiles, I used a method described by @RussTreadon-NOAA to load the appropriate modulefiles at run time by modifying sub_jet as follows:
```diff
 myuser=$LOGNAME
 myhost=$(hostname)
+exp=${jobname}
+if [[ ${exp} == *"updat"* ]]; then
+   modulefiles=/mnt/lfs1/NAGAPE/epic/David.Huber/GSI/gsi_hdf5.14/modulefiles
+elif [[ ${exp} == *"contrl"* ]]; then
+   modulefiles=/mnt/lfs1/NAGAPE/epic/David.Huber/GSI/gsi_22/modulefiles
+fi
+
 DATA=${DATA:-$ptmp/tmp}
 mkdir -p $DATA
@@ -126,7 +135,7 @@ echo "" >>$cfile
 echo ". /apps/lmod/lmod/init/sh" >> $cfile
 echo "module purge" >> $cfile
-echo "module use $gsisrc/modulefiles" >> $cfile
+echo "module use $modulefiles" >> $cfile
 echo "module load gsi_jet" >> $cfile
 echo "module list" >> $cfile
```
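The updat/contrl branching added to sub_jet can be exercised in isolation; here is a minimal sketch of that selection logic (the two paths are hypothetical placeholders, not the actual Jet locations):

```shell
# Pick a modulefile directory based on the experiment (job) name,
# mirroring the updat/contrl branching added to sub_jet.
# Paths below are placeholders for illustration only.
pick_modulefiles() {
  case "$1" in
    *updat*)  echo "/path/to/gsi_hdf5.14/modulefiles" ;;  # hdf5-1.14 build
    *contrl*) echo "/path/to/gsi_22/modulefiles" ;;        # baseline build
    *)        echo "unknown" ;;
  esac
}

pick_modulefiles "global_3dvar_loproc_updat"
pick_modulefiles "global_3dvar_hiproc_contrl"
```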
On Hera, all tests pass except global_4dvar, global_4denvar, and global_3dvar:

- global_4dvar fails due to different siginc files between loproc_updat and loproc_contrl, which I will investigate further.
- global_4denvar and global_3dvar fail due to exceeding the maximum memory threshold, which is non-critical.
On further investigation, the loproc_contrl and loproc_updat siginc files generated in the global_4dvar step have slightly different sizes (39487168 vs 39483763 bytes) and appear to contain different header information, but when compared with nccmp, the data, metadata, and encoding are identical, so I believe this is a false positive.
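For reference, a comparison along those lines might look like the following (flags per the nccmp documentation; the filenames are placeholders for the two siginc files):

```shell
# Compare data (-d) and metadata (-m), force the comparison to
# continue past differences (-f), and report identical files (-s).
nccmp -d -m -f -s siginc.loproc_updat.nc siginc.loproc_contrl.nc
```

An exit status of 0 with a "files are identical" report would support the false-positive conclusion, since nccmp ignores byte-level layout differences that do not affect the stored data or metadata.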
I found an issue in gsi-ncdiag where setting the HDF5 chunk size to 16 GB when opening a netCDF file in append mode causes maxmem failures. This is a problem with HDF5 1.14.0 but not with 1.10.6. A new version of gsi-ncdiag will need to be installed under spack-stack on all platforms to resolve this issue. See NOAA-EMC/gsi-ncdiag#7.
The library team is trying to update HDF5 from the current 1.10.6 to version 1.14.0, which contains the parallel netCDF bug fixes. However, the initial test of GSI built with HDF5 1.14.0 failed (please see comments from George V. in https://github.com/ufs-community/ufs-weather-model/issues/1621). Could someone from the GSI group take a look at this?