Open RussTreadon-NOAA opened 1 month ago
This issue is an EE2 compliance issue being tracked on GDASApp issue #1254. As a test, make the following modification to sorc/build_gdas.sh in a working copy of g-w branch feature/radbcor (see g-w PR #2875):
@@ -24,6 +24,6 @@ shift $((OPTIND-1))
# shellcheck disable=SC2086
BUILD_JOBS="${BUILD_JOBS:-8}" \
WORKFLOW_BUILD="ON" \
-./gdas.cd/build.sh ${_opts} -f
+./gdas.cd/build.sh ${_opts} -f -p /work/noaa/da/rtreadon/git/global-workflow/radbcor/bin
exit
Currently executing build_gdas.sh on Hercules. Install is proceeding very slowly. Is this expected @danholdaway ? Did you simply add -p $INSTALL_PATH to build_gdas.sh in your test?
@RussTreadon-NOAA in my test I modified the build script as I believe the logic is flawed when you choose install. Here is the logic:
# Build
echo "Building ..."
set -x
if [[ $BUILD_JCSDA == 'YES' ]]; then
make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
for b in $builddirs; do
cd $b
make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
cd ../
done
fi
# Install
if [[ -n ${INSTALL_PREFIX:-} ]]; then
echo "Installing ..."
set -x
make install
set +x
fi
Note that the script loops over the builddirs and makes all the code associated with these packages. But once it's done with that, there is still a very large amount that remains unbuilt, e.g. ufo tests, saber tests, the oops toy model, etc. That final make install will build all that remaining code, but it will only do it with one processor. If you want to do install you have to make absolutely everything; there are no two ways about it. Otherwise you won't copy the libraries to the install path.
For quicker build with install it needs to be something like:
# Build
echo "Building ..."
set -x
if [[ -z ${INSTALL_PREFIX:-} ]]; then
if [[ $BUILD_JCSDA == 'YES' ]]; then
make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
for b in $builddirs; do
cd $b
make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
cd ../
done
fi
fi
set +x
# Install
if [[ -n ${INSTALL_PREFIX:-} ]]; then
echo "Installing ..."
set -x
make -j ${BUILD_JOBS:-6} install
set +x
fi
I.e. skip the builddirs since they will be built anyway and let CMake figure out an optimal parallel strategy for building them. However, note that it will always be a lot slower than the normal way of building because thousands of tests will be built.
Relooking at it you could combine BUILD_JCSDA with INSTALL_PREFIX. Perhaps that was the original intention?
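A minimal sketch of that combination, assuming the existing build.sh variables: a requested install would simply imply a full-bundle build.
# Sketch: an install request implies building the whole bundle
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  BUILD_JCSDA="YES"
fi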
Thank you @danholdaway for explaining why install is taking so much time. I'm executing faulty logic. Let me try your suggestions. Another item on our TODO list is turning off all the JEDI tests (GDASApp issue #1269).
Tested the suggested change to build.sh. The build with INSTALL_PREFIX set is progressing very slowly (now approaching 3 hours). Will revisit next week.
In that case it might be a prerequisite that the building of the JEDI tests can be turned off. We could proceed with this relatively easily, but it's a feature that has met resistance in the past. I think some of that resistance comes from a difference in working styles. Those opposed argue that the cost of building JEDI is in the noise of the time it takes to run an experiment, so why introduce an extra flag and further complicate the build infrastructure, which is already complicated.
One idea I was playing around with before I went on leave is whether we could try to eliminate the need to build JEDI for most people. Then the issue might go away. I think it would have to be done in tandem with implementing more of a process for updating the JEDI hashes. Perhaps we could brainstorm this week?
The Hercules build with INSTALL_PREFIX set eventually failed with
-- Installing: /work/noaa/da/rtreadon/git/global-workflow/radbcor/bin/share/soca/testdata/72x35x25/INPUT/ocean_hgrid.nc
-- Installing: /work/noaa/da/rtreadon/git/global-workflow/radbcor/bin/share/soca/testdata/72x35x25/INPUT/ocean_topog.nc
CMake Error at soca/test/cmake_install.cmake:65 (file):
file INSTALL cannot find
"/work/noaa/da/rtreadon/git/global-workflow/radbcor/sorc/gdas.cd/bundle/soca/test/Data/rossrad.dat":
No such file or directory.
Call Stack (most recent call first):
soca/cmake_install.cmake:84 (include)
cmake_install.cmake:122 (include)
make: *** [Makefile:130: install] Error 1
File rossrad.nc replaced rossrad.dat. However, rossrad.dat is still referenced in three soca files.
First, gdas.cd/sorc/soca/test/CMakeLists.txt has
set( soca_install_data
Data/rossrad.dat
Data/godas_sst_bgerr.nc )
install(FILES ${soca_install_data}
DESTINATION ${INSTALL_DATA_DIR}/testdata/ )
The two other files that reference rossrad.dat are
gdas.cd/sorc/soca/tutorial/tutorial_tools.sh: ln -sf $datadir/Data/rossrad.dat .
gdas.cd/sorc/soca/.gitattributes:test/Data/rossrad.dat filter=lfs diff=lfs merge=lfs -text
If the install option for the GDASApp build satisfies the EE2 executable requirement, then we should try to find a way to speed up the build with install.
We don't need JEDI ctests when building and installing GDASApp for use in operations. If turning off JEDI ctests speeds up the build and install for operations, we should figure out a way to make this happen.
@danholdaway , I agree with your brainstorming idea. The core GDASApp infrastructure team needs to develop a plan to work through the various items from the bi-weekly JEDI workflow sync meeting with EIB.
Work for this issue will be done in feature/install.
@danholdaway recommended making the following change in the CMakeLists.txt for the various JEDI submodules in sorc/:
option( ENABLE_JEDI_CTESTS "Build JEDI ctests" ON )
if( ENABLE_JEDI_CTESTS )
add_subdirectory( test )
endif()
The default behavior is for JEDI ctests to be active (ON). Users can add -DENABLE_JEDI_CTESTS=OFF to the cmake configure to turn off JEDI ctests.
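For example, in a manual configure (the build and bundle paths here are illustrative), the ctests could be switched off with:
cd build
cmake -DENABLE_JEDI_CTESTS=OFF ../bundle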
feature/install was cloned on Hercules in /work/noaa/da/rtreadon/git/GDASApp/install/. The above scripting was added to CMakeLists.txt in the following JEDI submodules:
modified: sorc/bufr-query (modified content)
modified: sorc/crtm (modified content)
modified: sorc/fv3-jedi (modified content)
modified: sorc/gsibec (modified content)
modified: sorc/ioda (modified content)
modified: sorc/iodaconv (modified content)
modified: sorc/saber (modified content)
modified: sorc/soca (modified content)
modified: sorc/ufo (modified content)
modified: sorc/vader (modified content)
build.sh was modified as follows:
@@ -24,6 +24,7 @@ usage() {
echo " -f force a clean build DEFAULT: NO"
echo " -d include JCSDA ctest data DEFAULT: NO"
echo " -a build everything in bundle DEFAULT: NO"
+ echo " -j build with JEDI ctests DEFAULT: OFF"
echo " -h display this message and quit"
echo
exit 1
@@ -39,6 +40,7 @@ BUILD_VERBOSE="NO"
CLONE_JCSDADATA="NO"
CLEAN_BUILD="NO"
BUILD_JCSDA="NO"
+ENABLE_JEDI_CTESTS="OFF"
COMPILER="${COMPILER:-intel}"
while getopts "p:t:c:hvdfa" opt; do
@@ -64,6 +66,9 @@ while getopts "p:t:c:hvdfa" opt; do
a)
BUILD_JCSDA=YES
;;
+ j)
+ ENABLE_JEDI_CTESTS=ON
+ ;;
h|\?|:)
usage
;;
@@ -98,6 +103,10 @@ mkdir -p ${BUILD_DIR} && cd ${BUILD_DIR}
# If INSTALL_PREFIX is not empty; install at INSTALL_PREFIX
[[ -n "${INSTALL_PREFIX:-}" ]] && CMAKE_OPTS+=" -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}"
+# Activate JEDI ctests if requested
+ENABLE_JEDI_CTESTS=${ENABLE_JEDI_CTESTS:-"OFF"}
+CMAKE_OPTS+=" -DENABLE_JEDI_CTESTS=${ENABLE_JEDI_CTESTS}"
+
# activate tests based on if this is cloned within the global-workflow
WORKFLOW_BUILD=${WORKFLOW_BUILD:-"OFF"}
CMAKE_OPTS+=" -DWORKFLOW_TESTS=${WORKFLOW_BUILD}"
./build.sh was then executed. ctest -N was executed in build/ upon completion. 238 tests remain. Prior to this change there were 1952 ctests. Below is the list of remaining ctests:
Test project /work/noaa/da/rtreadon/git/GDASApp/install/build
Test #1: gsw_poly_check
Test #2: gsw_check_functions
Test #3: bufr_query_coding_norms
Test #4: oops_coding_norms
Test #5: test_oops_base_dummy_run_one
Test #6: test_oops_base_dummy_run_no_validate
Test #7: test_oops_base_dummy_run_validate_zero
Test #8: test_oops_base_dummy_run_bad_arg_zero
Test #9: test_oops_base_dummy_run_bad_arg_one
Test #10: test_oops_base_dummy_run_bad_arg_two
Test #11: test_oops_base_dummy_run_bad_arg_three
Test #12: test_oops_base_dummy_run_help
Test #13: test_oops_base_dummy_run_h
Test #14: test_oops_base_variables
Test #15: test_oops_base_obsvariables
Test #16: test_base_posttimer
Test #17: test_util_signal_trap_fpe_div_by_zero
Test #18: test_util_signal_trap_fpe_invalid_op
Test #19: test_util_signal_trap_fpe_valid_op
Test #20: test_util_stacktrace
Test #21: test_util_random
Test #22: test_util_pushstringvector
Test #23: test_util_parameters
Test #24: test_generic_atlas_interpolator
Test #25: test_generic_unstructured_interpolator
Test #26: test_generic_atlas_global_interpolator
Test #27: test_generic_unstructured_global_interpolator
Test #28: test_generic_unstructured_global_interpolator_parallel
Test #29: test_generic_gc99
Test #30: test_generic_soar
Test #31: test_coupled_splitvariables
Test #32: test_util_isanypointinvolumeinterior
Test #33: test_util_partialdatetime
Test #34: test_util_datetime
Test #35: test_util_duration
Test #36: test_util_intset_parser
Test #37: test_util_scalarormap
Test #38: test_util_floatcompare
Test #39: test_util_compositepath
Test #40: test_util_stringfunctions
Test #41: test_util_testreference
Test #42: test_util_range
Test #43: test_mpi_mpi
Test #44: test_fft_multiple
Test #45: test_util_algorithms
Test #46: test_util_comparenvectors
Test #47: test_util_missingvalues
Test #48: test_util_associativecontainers
Test #49: test_util_propertiesofnvectors
Test #50: test_util_localenvironment
Test #51: test_util_typetraits
Test #52: test_util_wildcard
Test #53: test_util_configfunctions
Test #54: test_util_confighelpers
Test #55: test_util_timewindow
Test #56: test_util_arrayutil
Test #57: test_base_fieldsets
Test #58: test_util_fieldset_helpers_and_operations
Test #59: test_util_fieldset_subcommunicators
Test #60: test_util_functionspace_helpers
Test #61: test_util_functionspace_helpers_p2
Test #62: test_util_functionspace_helpers_p4
Test #63: test_assimilation_fullgmres
Test #64: test_assimilation_rotmat
Test #65: test_assimilation_solvematrixequation
Test #66: test_assimilation_spectrallmp
Test #67: test_assimilation_testvector3d
Test #68: test_assimilation_tridiagsolve
Test #69: vader_coding_norms
Test #70: saber_coding_norms_src
Test #71: saber_coding_norms_quench
Test #72: ioda_coding_norms
Test #73: test_ioda-collective-functions-h5file
Test #74: test_ioda-collective-functions-h5mem
Test #75: test_ioda-engines_complex_objects_strings-default
Test #76: test_ioda-engines_complex_objects_strings-h5file
Test #77: test_ioda-engines_complex_objects_strings-h5mem
Test #78: test_ioda-engines_complex_objects_strings-ObsStore
Test #79: test_ioda-chunks_and_filters-default
Test #80: test_ioda-chunks_and_filters-h5file
Test #81: test_ioda-chunks_and_filters-h5mem
Test #82: test_ioda-chunks_and_filters-ObsStore
Test #83: test_ioda-engines_data-selections-default
Test #84: test_ioda-engines_data-selections-h5file
Test #85: test_ioda-engines_data-selections-h5mem
Test #86: test_ioda-engines_data-selections-ObsStore
Test #87: test_ioda-engines_dim-selectors-default
Test #88: test_ioda-engines_dim-selectors-h5file
Test #89: test_ioda-engines_dim-selectors-h5mem
Test #90: test_ioda-engines_dim-selectors-ObsStore
Test #91: test_ioda-engines_exception
Test #92: test_ioda-fillvalues-default
Test #93: test_ioda-fillvalues-h5file
Test #94: test_ioda-fillvalues-h5mem
Test #95: test_ioda-fillvalues-ObsStore
Test #96: test_ioda-engines_io_templated_tests-default
Test #97: test_ioda-engines_io_templated_tests-h5file
Test #98: test_ioda-engines_io_templated_tests-h5mem
Test #99: test_ioda-engines_io_templated_tests-ObsStore
Test #100: test_ioda-engines_hier_paths-default
Test #101: test_ioda-engines_hier_paths-h5file
Test #102: test_ioda-engines_hier_paths-h5mem
Test #103: test_ioda-engines_hier_paths-ObsStore
Test #104: test_ioda-engines_layouts_layoutobsgroupodb
Test #105: test_ioda-engines_layouts_layoutobsgroup
Test #106: test_ioda-engines_obsgroup-default
Test #107: test_ioda-engines_obsgroup-h5file
Test #108: test_ioda-engines_obsgroup-h5mem
Test #109: test_ioda-engines_obsgroup-ObsStore
Test #110: test_ioda-engines_obsgroup_append
Test #111: test_ioda-engines_obsgroup_append_function
Test #112: test_ioda-engines_sfuncs_concatstringvectors
Test #113: test_ioda-engines_sfuncs_convertv1pathtov2path
Test #114: test_ioda-engines_persist-default
Test #115: test_ioda-engines_persist-h5file
Test #116: test_ioda-engines_persist-h5mem
Test #117: test_ioda-engines_persist-ObsStore
Test #118: test_ioda-engines_list_objects-default
Test #119: test_ioda-engines_list_objects-h5file
Test #120: test_ioda-engines_list_objects-h5mem
Test #121: test_ioda-engines_list_objects-ObsStore
Test #122: test_ioda-engines_hasvariables_stitchcomplementaryvars
Test #123: test_ioda-engines_hasvariables_convertvariableunits
Test #124: ioda-python
Test #125: ioda-obsspace-python
Test #126: test_ioda-engines_examples_prep_data
Test #127: test_ioda-engines-01-default
Test #128: test_ioda-engines-01-h5file
Test #129: test_ioda-engines-01-h5mem
Test #130: test_ioda-engines-01-obsstore
Test #131: test_ioda-engines-02-default
Test #132: test_ioda-engines-02-h5file
Test #133: test_ioda-engines-02-h5mem
Test #134: test_ioda-engines-02-obsstore
Test #135: test_ioda-engines-03-default
Test #136: test_ioda-engines-03-h5file
Test #137: test_ioda-engines-03-h5mem
Test #138: test_ioda-engines-03-obsstore
Test #139: test_ioda-engines-04-default
Test #140: test_ioda-engines-04-h5file
Test #141: test_ioda-engines-04-h5mem
Test #142: test_ioda-engines-04-obsstore
Test #143: test_ioda-engines-05a-default
Test #144: test_ioda-engines-05a-h5file
Test #145: test_ioda-engines-05a-h5mem
Test #146: test_ioda-engines-05a-obsstore
Test #147: test_ioda-engines-05b-default
Test #148: test_ioda-engines-05b-h5file
Test #149: test_ioda-engines-05b-h5mem
Test #150: test_ioda-engines-05b-obsstore
Test #151: test_ioda-engines-00-Strings-F
Test #152: test_ioda-engines-00-VecStrings-F
Test #153: test_ioda-engines-01-GroupsAndObsSpaces-F
Test #154: test_ioda-engines-02-Attributes-F
Test #155: test_ioda-engines-03-Variables-F
Test #156: test_ioda-engines-01-Py
Test #157: test_ioda-engines-02-Py
Test #158: test_ioda-engines-03-Py
Test #159: test_ioda-engines-04-Py
Test #160: test_ioda-engines-05-Py
Test #161: test_ioda-engines-06-Py
Test #162: test_ioda-engines-07a-Py-ObsSpaceClass
Test #163: test_ioda-engines-07b-Py-ObsSpaceClassDataTypes
Test #164: test_ioda-engines-chrono-Py
Test #165: test_ioda-engines_chrono-default
Test #166: test_ioda-engines_chrono-h5file
Test #167: test_ioda-engines_chrono-h5mem
Test #168: test_ioda-engines_complex_objects_array_from_struct-default
Test #169: test_ioda-engines_complex_objects_array_from_struct-h5file
Test #170: test_ioda-engines_complex_objects_array_from_struct-h5mem
Test #171: test_ioda-engines_complex_objects_array_from_struct-ObsStore
Test #172: test_ioda-engines_fixed_length_strings-default
Test #173: test_ioda-engines_fixed_length_strings-h5file
Test #174: test_ioda-engines_fixed_length_strings-h5mem
Test #175: test_ioda-engines_fixed_length_strings_client-default
Test #176: test_ioda-engines_fixed_length_strings_client-h5file
Test #177: test_ioda-engines_fixed_length_strings_client-h5mem
Test #178: test_ioda-engines_named_types-default
Test #179: test_ioda-engines_named_types-h5file
Test #180: test_ioda-engines_named_types-h5mem
Test #181: test_ioda-engines_units
Test #182: test_ioda-engines_basic_math
Test #183: test_ioda-engines_variables_math
Test #184: ioda_pyiodautils_coding_norms
Test #185: ufo_coding_norms
Test #186: test_ufo_opr_autogenerated
Test #187: test_autogeneratedfilter
Test #188: test_femps_csgrid
Test #189: fv3jedi_test_tier1_coding_norms
Test #190: soca_coding_norms
Test #191: test_gdasapp_util_coding_norms
Test #192: test_gdasapp_util_ioda_example
Test #193: test_gdasapp_util_prepdata
Test #194: test_gdasapp_util_rads2ioda
Test #195: test_gdasapp_util_ghrsst2ioda
Test #196: test_gdasapp_util_rtofstmp
Test #197: test_gdasapp_util_rtofssal
Test #198: test_gdasapp_util_smap2ioda
Test #199: test_gdasapp_util_smos2ioda
Test #200: test_gdasapp_util_viirsaod2ioda
Test #201: test_gdasapp_util_icecamsr2ioda
Test #202: test_gdasapp_util_icecmirs2ioda
Test #203: test_gdasapp_util_icecjpssrr2ioda
Test #204: test_dautils_ioda_example
Test #205: iodaconv_compo_coding_norms
Test #206: iodaconv_gsi_ncdiag_coding_norms
Test #207: iodaconv_goes_coding_norms
Test #208: iodaconv_hdf5_coding_norms
Test #209: iodaconv_land_coding_norms
Test #210: iodaconv_lib-python_coding_norms
Test #211: iodaconv_marine_coding_norms
Test #212: iodaconv_conventional_coding_norms
Test #213: iodaconv_ncep_coding_norms
Test #214: iodaconv_ssec_coding_norms
Test #215: iodaconv_wrfda_ncdiag_coding_norms
Test #216: iodaconv_singleob_coding_norms
Test #217: iodaconv_mrms_coding_norms
Test #218: iodaconv_gnssro_coding_norms
Test #219: iodaconv_bufr_coding_norms
Test #220: iodaconv_satbias_py_coding_norms
Test #221: iodaconv_gsi_varbc_coding_norms
Test #222: test_gdasapp_check_python_norms
Test #223: test_gdasapp_check_yaml_keys
Test #224: test_gdasapp_jedi_increment_to_fv3
Test #225: test_gdasapp_fv3jedi_fv3inc
Test #226: test_gdasapp_snow_create_ens
Test #227: test_gdasapp_snow_imsproc
Test #228: test_gdasapp_snow_apply_jediincr
Test #229: test_gdasapp_snow_letkfoi_snowda
Test #230: test_gdasapp_convert_bufr_adpsfc_snow
Test #231: test_gdasapp_convert_bufr_adpsfc
Test #232: test_gdasapp_convert_gsi_satbias
Test #233: test_bufr2ioda_insitu_profile_argo
Test #234: test_bufr2ioda_insitu_profile_bathy
Test #235: test_bufr2ioda_insitu_profile_glider
Test #236: test_bufr2ioda_insitu_profile_tesac
Test #237: test_bufr2ioda_insitu_profile_xbtctd
Test #238: test_bufr2ioda_insitu_surface_trkob
Total Tests: 238
Just chiming in here since I saw this thread: I think we should basically require JCSDA core to accept PRs that make building the ctests optional. There is no reason why the default can't be the current behavior, and we can add some extra CMake logic to skip these thousands of tests. If JEDI is supposed to be flexible, and used by all, then this is something that needs to be added as an option.
Added ENABLE_JEDI_CTESTS to a few more CMakeLists.txt files. Down to 168 tests:
(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/GDASApp/install/build$ ctest -N
Test project /work/noaa/da/rtreadon/git/GDASApp/install/build
Test #1: bufr_query_coding_norms
Test #2: oops_coding_norms
Test #3: test_oops_base_dummy_run_one
Test #4: test_oops_base_dummy_run_no_validate
Test #5: test_oops_base_dummy_run_validate_zero
Test #6: test_oops_base_dummy_run_bad_arg_zero
Test #7: test_oops_base_dummy_run_bad_arg_one
Test #8: test_oops_base_dummy_run_bad_arg_two
Test #9: test_oops_base_dummy_run_bad_arg_three
Test #10: test_oops_base_dummy_run_help
Test #11: test_oops_base_dummy_run_h
Test #12: test_oops_base_variables
Test #13: test_oops_base_obsvariables
Test #14: test_base_posttimer
Test #15: test_util_signal_trap_fpe_div_by_zero
Test #16: test_util_signal_trap_fpe_invalid_op
Test #17: test_util_signal_trap_fpe_valid_op
Test #18: test_util_stacktrace
Test #19: test_util_random
Test #20: test_util_pushstringvector
Test #21: test_util_parameters
Test #22: test_generic_atlas_interpolator
Test #23: test_generic_unstructured_interpolator
Test #24: test_generic_atlas_global_interpolator
Test #25: test_generic_unstructured_global_interpolator
Test #26: test_generic_unstructured_global_interpolator_parallel
Test #27: test_generic_gc99
Test #28: test_generic_soar
Test #29: test_coupled_splitvariables
Test #30: test_util_isanypointinvolumeinterior
Test #31: test_util_partialdatetime
Test #32: test_util_datetime
Test #33: test_util_duration
Test #34: test_util_intset_parser
Test #35: test_util_scalarormap
Test #36: test_util_floatcompare
Test #37: test_util_compositepath
Test #38: test_util_stringfunctions
Test #39: test_util_testreference
Test #40: test_util_range
Test #41: test_mpi_mpi
Test #42: test_fft_multiple
Test #43: test_util_algorithms
Test #44: test_util_comparenvectors
Test #45: test_util_missingvalues
Test #46: test_util_associativecontainers
Test #47: test_util_propertiesofnvectors
Test #48: test_util_localenvironment
Test #49: test_util_typetraits
Test #50: test_util_wildcard
Test #51: test_util_configfunctions
Test #52: test_util_confighelpers
Test #53: test_util_timewindow
Test #54: test_util_arrayutil
Test #55: test_base_fieldsets
Test #56: test_util_fieldset_helpers_and_operations
Test #57: test_util_fieldset_subcommunicators
Test #58: test_util_functionspace_helpers
Test #59: test_util_functionspace_helpers_p2
Test #60: test_util_functionspace_helpers_p4
Test #61: test_assimilation_fullgmres
Test #62: test_assimilation_rotmat
Test #63: test_assimilation_solvematrixequation
Test #64: test_assimilation_spectrallmp
Test #65: test_assimilation_testvector3d
Test #66: test_assimilation_tridiagsolve
Test #67: vader_coding_norms
Test #68: saber_coding_norms_src
Test #69: saber_coding_norms_quench
Test #70: ioda_coding_norms
Test #71: test_ioda-engines_examples_prep_data
Test #72: test_ioda-engines-01-default
Test #73: test_ioda-engines-01-h5file
Test #74: test_ioda-engines-01-h5mem
Test #75: test_ioda-engines-01-obsstore
Test #76: test_ioda-engines-02-default
Test #77: test_ioda-engines-02-h5file
Test #78: test_ioda-engines-02-h5mem
Test #79: test_ioda-engines-02-obsstore
Test #80: test_ioda-engines-03-default
Test #81: test_ioda-engines-03-h5file
Test #82: test_ioda-engines-03-h5mem
Test #83: test_ioda-engines-03-obsstore
Test #84: test_ioda-engines-04-default
Test #85: test_ioda-engines-04-h5file
Test #86: test_ioda-engines-04-h5mem
Test #87: test_ioda-engines-04-obsstore
Test #88: test_ioda-engines-05a-default
Test #89: test_ioda-engines-05a-h5file
Test #90: test_ioda-engines-05a-h5mem
Test #91: test_ioda-engines-05a-obsstore
Test #92: test_ioda-engines-05b-default
Test #93: test_ioda-engines-05b-h5file
Test #94: test_ioda-engines-05b-h5mem
Test #95: test_ioda-engines-05b-obsstore
Test #96: test_ioda-engines-00-Strings-F
Test #97: test_ioda-engines-00-VecStrings-F
Test #98: test_ioda-engines-01-GroupsAndObsSpaces-F
Test #99: test_ioda-engines-02-Attributes-F
Test #100: test_ioda-engines-03-Variables-F
Test #101: test_ioda-engines-01-Py
Test #102: test_ioda-engines-02-Py
Test #103: test_ioda-engines-03-Py
Test #104: test_ioda-engines-04-Py
Test #105: test_ioda-engines-05-Py
Test #106: test_ioda-engines-06-Py
Test #107: test_ioda-engines-07a-Py-ObsSpaceClass
Test #108: test_ioda-engines-07b-Py-ObsSpaceClassDataTypes
Test #109: test_ioda-engines-chrono-Py
Test #110: test_ioda-engines_chrono-default
Test #111: test_ioda-engines_chrono-h5file
Test #112: test_ioda-engines_chrono-h5mem
Test #113: test_ioda-engines_complex_objects_array_from_struct-default
Test #114: test_ioda-engines_complex_objects_array_from_struct-h5file
Test #115: test_ioda-engines_complex_objects_array_from_struct-h5mem
Test #116: test_ioda-engines_complex_objects_array_from_struct-ObsStore
Test #117: test_ioda-engines_fixed_length_strings-default
Test #118: test_ioda-engines_fixed_length_strings-h5file
Test #119: test_ioda-engines_fixed_length_strings-h5mem
Test #120: test_ioda-engines_fixed_length_strings_client-default
Test #121: test_ioda-engines_fixed_length_strings_client-h5file
Test #122: test_ioda-engines_fixed_length_strings_client-h5mem
Test #123: test_ioda-engines_named_types-default
Test #124: test_ioda-engines_named_types-h5file
Test #125: test_ioda-engines_named_types-h5mem
Test #126: test_ioda-engines_units
Test #127: test_ioda-engines_basic_math
Test #128: test_ioda-engines_variables_math
Test #129: ioda_pyiodautils_coding_norms
Test #130: ufo_coding_norms
Test #131: test_femps_csgrid
Test #132: fv3jedi_test_tier1_coding_norms
Test #133: soca_coding_norms
Test #134: test_dautils_ioda_example
Test #135: iodaconv_compo_coding_norms
Test #136: iodaconv_gsi_ncdiag_coding_norms
Test #137: iodaconv_goes_coding_norms
Test #138: iodaconv_hdf5_coding_norms
Test #139: iodaconv_land_coding_norms
Test #140: iodaconv_lib-python_coding_norms
Test #141: iodaconv_marine_coding_norms
Test #142: iodaconv_conventional_coding_norms
Test #143: iodaconv_ncep_coding_norms
Test #144: iodaconv_ssec_coding_norms
Test #145: iodaconv_wrfda_ncdiag_coding_norms
Test #146: iodaconv_singleob_coding_norms
Test #147: iodaconv_mrms_coding_norms
Test #148: iodaconv_gnssro_coding_norms
Test #149: iodaconv_bufr_coding_norms
Test #150: iodaconv_satbias_py_coding_norms
Test #151: iodaconv_gsi_varbc_coding_norms
Test #152: test_gdasapp_check_python_norms
Test #153: test_gdasapp_check_yaml_keys
Test #154: test_gdasapp_jedi_increment_to_fv3
Test #155: test_gdasapp_fv3jedi_fv3inc
Test #156: test_gdasapp_snow_create_ens
Test #157: test_gdasapp_snow_imsproc
Test #158: test_gdasapp_snow_apply_jediincr
Test #159: test_gdasapp_snow_letkfoi_snowda
Test #160: test_gdasapp_convert_bufr_adpsfc_snow
Test #161: test_gdasapp_convert_bufr_adpsfc
Test #162: test_gdasapp_convert_gsi_satbias
Test #163: test_bufr2ioda_insitu_profile_argo
Test #164: test_bufr2ioda_insitu_profile_bathy
Test #165: test_bufr2ioda_insitu_profile_glider
Test #166: test_bufr2ioda_insitu_profile_tesac
Test #167: test_bufr2ioda_insitu_profile_xbtctd
Test #168: test_bufr2ioda_insitu_surface_trkob
Total Tests: 168
Trying to figure out the source for the following tests:
- test_util_
- test_ioda-engines_
- iodaconv_
- test_bufr2ioda_
The usual command for adding a test is ecbuild_add_test
so you could try to grep for that in every CMakeLists.txt across the source code directories.
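For example, a search along those lines run from the GDASApp root (the sorc/ path follows the layout above) could be:
# Find every ctest registration in the bundle source
grep -rn --include=CMakeLists.txt "ecbuild_add_test" sorc/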
Tedious process, but down to 104 ctests returned by ctest -N.
What's the make time at this point? Perhaps we can have a few tests being built. If the changes become convoluted tests are likely to creep back in with future code changes anyway.
The most recent build (configure & compile) on Hercules with 104 ctests took 36:50 (minutes:seconds). develop with 1899 ctests took 37:39 to build on Hercules. It seems wrong that the two timings are basically the same.
All the executables for the tests can be built in parallel (any many tests rely on executables built anyway) so it's possible that what you're seeing is correct. Yes there's a lot of tests but possibly dwarfed by the number of source files at this point.
From what I can remember, the tests themselves are usually trivial to build; what can take some time are the executables that are only used for testing (mostly in UFO), but I think by just building gdas.x we can avoid this.
Yes, even though the number of ctests has been drastically reduced, the build/bin/ directory still contains a lot of *.x and *.py files. We don't need most of these for g-w cycling. What change(s) are needed to only build the executables we need?
@RussTreadon-NOAA perhaps we try this a slightly different way. Inverting and renaming the flag we would have:
option( LIBRARY_ONLY_BUILD "Only build JEDI libraries and skip tests and executables" OFF )
Then switch what you've already done to be instead:
if(NOT LIBRARY_ONLY_BUILD)
add_subdirectory( test )
endif()
Then (for example) the following file: https://github.com/JCSDA-internal/fv3-jedi/blob/develop/src/CMakeLists.txt could be:
add_subdirectory( fv3jedi )
if( NOT LIBRARY_ONLY_BUILD )
add_subdirectory( mains )
ecbuild_add_test( TARGET fv3jedi_test_tier1_coding_norms
TYPE SCRIPT
COMMAND ${CMAKE_BINARY_DIR}/bin/cpplint.py
ARGS --quiet --recursive ${CMAKE_CURRENT_SOURCE_DIR}
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/bin )
endif()
You can grep for 'ecbuild_add_executable' or just 'add_executable' (if the user chooses vanilla CMake) in all CMakeLists.txt files to find all the places you would wrap things; many of them will already be wrapped by what you've done. Usually there's a 'mains' directory that just needs to be wrapped in the logic.
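A similar hypothetical search for the executable registrations:
# Find executable targets that would need to be fenced by LIBRARY_ONLY_BUILD
grep -rnE --include=CMakeLists.txt "ecbuild_add_executable|add_executable" sorc/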
Sorry to ask for additional work but this might improve build time and should drain out the bin directory. One caveat is that I would expect JCSDA to be more resistant to this approach since it may have limited use outside our group. We have the special case of building gdas.x whereas everyone else relies on the executables that we would be turning off.
Thank you @danholdaway for the suggestion. I'll make a new clone of feature/install and give this a try.
Completed the following in a clone of feature/install at f49e2e6:
- Add LIBRARY_ONLY_BUILD logic to CMakeLists.txt in the following sorc/ submodules:
modified: sorc/bufr-query (modified content)
modified: sorc/crtm (modified content)
modified: sorc/fv3-jedi (modified content)
modified: sorc/gsibec (modified content)
modified: sorc/ioda (modified content)
modified: sorc/iodaconv (modified content)
modified: sorc/saber (modified content)
modified: sorc/soca (modified content)
modified: sorc/ufo (modified content)
modified: sorc/vader (modified content)
- Add date to build.sh to log the time for each section of the build
- Run ./build.sh with -DCMAKE_INSTALL_PREFIX specified
The build completed with the following timestamps:
Thu Oct 3 18:57:06 UTC 2024
Building GDASApp on hercules
...
Configuring ...
Thu Oct 3 18:57:20 UTC 2024
...
Building ...
Thu Oct 3 19:01:16 UTC 2024
...
Installing ...
Thu Oct 3 19:37:03 UTC 2024
...
CMake Error at gdas/test/cmake_install.cmake:69 (file):
file INSTALL cannot find
"/work/noaa/da/rtreadon/git/GDASApp/install_lib/bundle/gdas/test/testinput/amsua_n19_ewok.yaml":
No such file or directory.
The timestamp on the log file is Thu Oct 3 20:51.
Configuring took about 4 minutes. Building took around 36 minutes. Installing ran 74 minutes before hitting an error.
An install directory was created. It contains the following:
(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/GDASApp/install_lib/install$ ls
'$(PYIODA_INSTALL_LIBDIR)' bin include lib64 MOM6 test
'$(PYIOODACONV_INSTALL_LIBDIR)' doc lib module share ush
The directories have executables, libraries, module files, etc.
I need to
- address the install error for the missing amsua_n19_ewok.yaml
- go through more CMakeLists.txt files and screen more stuff out using the LIBRARY_ONLY_BUILD flag
@RussTreadon-NOAA this might help for the ioda-converter issue: JCSDA-internal/ioda-converters#1549
Thanks @CoryMartin-NOAA
Manually added the path changes in JCSDA-internal/ioda-converters#1549 into the working copy of feature/install. '$(PYIODA_INSTALL_LIBDIR)' and '$(PYIOODACONV_INSTALL_LIBDIR)' are no longer present in the install directory.
hercules-login-3:/work/noaa/da/rtreadon/git/GDASApp/install_lib/install$ ls
MOM6 bin doc include lib lib64 module share test ush
Install shouldn't take 74 minutes as it's usually just copying all the files from build to install path. It makes it sound like more code is being built at that time. Can build and install be one step?
cd build
ecbuild ../
make -j6 install
@danholdaway , I didn't know we could specify parallel streams on the install. build.sh just has make install. Let me add -j 6.
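That is, a one-line change in the existing install block of build.sh (sketch):
# Parallel install instead of a serial make install
make -j ${BUILD_JOBS:-6} install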
The key also is to not be issuing make more than once. If doing install it should only be done once, from the top level.
Timings with make -j 6 for install are improved:
Sat Oct 5 11:55:56 UTC 2024
Building GDASApp on hercules
...
Configuring ...
Sat Oct 5 11:56:21 UTC 2024
...
Building ...
Sat Oct 5 11:58:38 UTC 2024
...
Installing ...
Sat Oct 5 12:31:31 UTC 2024
...
Sat Oct 5 12:44:27 UTC 2024
This translates to roughly 2 minutes for configure, 33 minutes for build, and 13 minutes for install. It may be possible to reduce the build time by being more aggressive with the LIBRARY_ONLY_BUILD flag.
The above work is being done in /work/noaa/da/rtreadon/git/GDASApp/install_lib/
In your build.sh I don't think you need to be running the code in this block:
# Build
echo "Building ... `date`"
set -x
if [[ $BUILD_JCSDA == 'YES' ]]; then
make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query da-utils"
for b in $builddirs; do
cd $b
make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
cd ../
done
fi
set +x
This will make all the packages iodaconv land-imsproc land-jediincr gdas-utils bufr-query da-utils sequentially, whereas the make install -j ${BUILD_JOBS:-6} done below will build everything. Building them sequentially can be helpful if you don't intend to install the entire JEDI package. But since you do, because you want to run install, it doesn't actually help and could hinder. If you just run make install -j ${BUILD_JOBS:-6} then it lets CMake make maximum use of all the processors available to it. If you build just gdas and iodaconv sequentially, then you can't start making the ioda executables until you're all done with gdas, and gdas has one giant executable that takes a couple of minutes to build. Let the other 5 processors be working on something else while you do that.
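A minimal sketch of that simplification, assuming the existing build.sh variables: run the selective per-directory builds only when no install prefix is given, and otherwise let the single parallel make install below build everything.
# Build selectively only when not installing; the parallel make install below handles the rest
if [[ -z ${INSTALL_PREFIX:-} ]]; then
  echo "Building ... `date`"
  set -x
  for b in gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query da-utils; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
  set +x
fi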
Refactored as @danholdaway suggested. Rebuilt and reinstalled feature/install on Hercules with the following timing:
hercules-login-3:/work/noaa/da/rtreadon/git/GDASApp/install_lib$ grep "Mon Oct" build_install.log
Begin ... Mon Oct 7 15:53:31 UTC 2024
Configuring ... Mon Oct 7 15:53:41 UTC 2024
Building ... Mon Oct 7 15:56:11 UTC 2024
Installing ... Mon Oct 7 16:36:36 UTC 2024
Complete .. Mon Oct 7 16:37:53 UTC 2024
The configure took 3 minutes. The build took 40 minutes. The install took 1 minute, 17 seconds.
ctest -N returns 127 tests. install/bin contains a total of 175 files: 78 *.x, 87 *.py, and 11 miscellaneous files. I think we should see fewer files in install/bin.
That sounds reasonable Russ. If you have time it would be good to know if the library only build makes a difference in this mode of running. And perhaps whether increasing the number of cores makes much difference.
Below are timings for develop at 9d95c9d and feature/install at 9d95c9d with modifications to CMakeLists.txt as noted above.
Notes:
- Timings are minutes:seconds.
- total includes start up time before configure. Thus, total exceeds the sum of configure, build, and, optionally, install.
- develop at 9d95c9d was built without install. Thus, no install times for develop.
| make -j 6 | develop | feature/install |
| --- | --- | --- |
| configure | 03:29 | 02:23 |
| build | 35:37 | 39:59 |
| install | | 00:24 |
| total | 39:21 | 43:10 |
| make -j 8 | develop | feature/install |
| --- | --- | --- |
| configure | 06:37 | 02:14 |
| build | 29:00 | 31:16 |
| install | | 00:27 |
| total | 35:41 | 33:59 |
| make -j 12 | develop | feature/install |
| --- | --- | --- |
| configure | 07:08 | 01:59 |
| build | 26:31 | 25:46 |
| install | | 00:48 |
| total | 33:50 | 28:38 |
| make -j 16 | develop | feature/install |
| --- | --- | --- |
| configure | 04:38 | 02:40 |
| build | 24:59 | 27:35 |
| install | | 00:32 |
| total | 29:50 | 30:56 |
| make -j 20 | develop | feature/install |
| --- | --- | --- |
| configure | 03:20 | 02:07 |
| build | 24:18 | 28:09 |
| install | | 01:47 |
| total | 27:50 | 32:07 |
Attempts using make -j 24 failed with
icpc: error #10106: Fatal error in /apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.1.0-sb753366rvywq75zeg4ml5k5c72xgj72/comp\
iler/2023.1.0/linux/bin/intel64/../../bin/intel64/mcpcom, terminated by kill signal
compilation aborted for /work/noaa/da/rtreadon/git/GDASApp/install_lib/bundle/fv3-jedi/src/mains/fv3jediControlPert.cc (code 1)
make[2]: *** [fv3-jedi/src/mains/CMakeFiles/fv3jedi_controlpert.x.dir/build.make:76: fv3-jedi/src/mains/CMakeFiles/fv3jedi_controlpert.\
x.dir/fv3jediControlPert.cc.o] Error 1
make[2]: Leaving directory '/work/noaa/da/rtreadon/git/GDASApp/install_lib/build'
make[1]: *** [CMakeFiles/Makefile2:8129: fv3-jedi/src/mains/CMakeFiles/fv3jedi_controlpert.x.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
Why are the build times always longer in the feature branch than in develop? Is it because it is building everything vs. just some things?
@CoryMartin-NOAA , I think you are right. The flip side of the faster selective build with develop is that the install time is much longer.
I replaced three references to rossrad.dat in sorc/soca with rossrad.nc. After this, install works with develop. The install takes a long time because jedi component ctests remain active in develop. install builds the scripts and executables needed to run these tests. ctest -N in the develop build directory returns 2003 tests with WORKFLOW_BUILD=OFF. The feature branch turns off most tests. Only 127 remain with WORKFLOW_BUILD=OFF.
We're caught between
1. develop - a fast selective build but a slow install
2. feature/install - a fast install but a slower full build
Option 2 comes at the cost of modifying CMakeLists.txt for most jedi components in sorc/. I can see that even if we can commit these CMakeLists.txt changes to various jedi repos, we will need to keep an eye on future jedi hashes to ensure new or existing tests don't wind up outside the _notest flag regions.
I could test option 3 - a fast selective build using feature/install along with a faster install.
My hunch is that the slow install is because of the selective build. Can you do a ./build.sh -a on develop? Does that build everything (it's supposed to)?
Made the following local modifications in develop at 9d95c9d on Hercules to enable a successful install:
- test/CMakeLists.txt - remove test yamls that are no longer present in GDASApp
- sorc/soca - replace rossrad.dat with rossrad.nc
Execute ./build.sh -f -a -p /work/noaa/da/rtreadon/git/GDASApp/test_install/install. Run make with 20 cores. Configure, build, and install successfully ran to completion. Timings are below
| make -j 20 | develop | feature/install | develop with install |
| --- | --- | --- | --- |
| configure | 03:20 | 02:07 | 12:34 |
| build | 24:18 | 28:09 | 33:17 |
| install | | 01:47 | 02:29 |
| total | 27:50 | 32:07 | 48:27 |
12:34 to run configure? wow!
Strange. Is that consistent or was the machine just struggling at that moment? Does the feature branch have a difference in the configure?
Thanks so much for going through the pains of testing and comparing all these ways of building and installing @RussTreadon-NOAA, tremendously helpful to see all this. What seems to pop out to me is that unfortunately a library only/no tests build doesn't really save all that much time in installing JEDI. JEDI has just become a behemoth of source code that takes an age to compile, and doesn't scale particularly well with processors. Note that the shared drives of HPCs may also not be the best place to see the fastest make times and may explain why the time even started to increase with more processors.
And even with quite a bit of work it wasn't possible to turn off all the tests or prevent the bin directory from filling up with things. So ultimately this may not even really satisfy NCO's wishes to have empty or at least clean directories. It seems we would need (possibly a lot) more work to fully eliminate all tests and bin directory copies. It also may be never ending because little would prevent folks from putting tests/executables outside of the fences that we'd create in the CMake files. What do you all think, is this a fair assessment of what we're seeing?
The 12:34 looks to be anomalous. I reran and configure took 04:21. I ran again and configure took 03:24. I'm working on Hercules login nodes. Variations in the login node load would impact configure and build timings, right?
@danholdaway , I agree with your assessment.
Your last point is a major concern. Even if we complete the task of trimming down the configure, compile, and install to satisfy EE2 requirements, maintenance of this setup requires constant vigilance. It's not hard to imagine developers adding new tests or executables outside the blocked sections we added for EE2 compliance.
The module approach gets us closer to EE2 compliance. It does so, however, at the cost of not being development friendly. Assuming JEDI modules are released via spack-stack, developers are limited to what's in the stack unless they install their own jedi modules and adjust the GDASApp build accordingly.
Even though configure, compile, and install take more than 30 minutes, does the final install directory move us closer to EE2 compliance? The install option populates the user specified INSTALL_PREFIX with directories
hercules-login-4:/work/noaa/da/rtreadon/git/GDASApp/test_install/install$ ls
MOM6 bin doc include lib lib64 module share test ush
bin/ has 217 files. Most of these files we don't need for GFS v17 or v18. There may be other files in other install/ directories that we don't need for operations. Can we add scripting to build.sh to remove unnecessary files?
We do this with the operational GSI build. gsi.fd contains extra stuff which operations does not need. Script ush/build_4nco_global.sh not only builds operational executables. It also removes extraneous directories, moves the install directory to the desired operational location, and removes the build directory.
Should we develop a similar build_ops.sh script for GDASApp?
Having the install is definitely needed as that eliminates the need to link the executables. We can just install to the GFSHOME directory and point to the executable there. I'm fine with adding to the script to keep only bin/gdas* files.
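A minimal sketch of such a cleanup step, assuming the INSTALL_PREFIX variable from build.sh and that only the gdas* executables need to be kept:
# Hypothetical post-install cleanup: keep only gdas* files in bin/
find "${INSTALL_PREFIX}/bin" -maxdepth 1 -type f ! -name 'gdas*' -delete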
Thank you @danholdaway for your comment. Enabling install requires minor updates to GDASApp, ioda-converters, and soca. I'll open issues to work on these updates.
Thanks @RussTreadon-NOAA
EE2 does not require executables to be copied to the job run directory $DATA. Executables can be referenced from $HOMEmodel/exec. This issue is opened to use the cmake install option to copy GDASApp executables, modules, and libraries into directories more aligned with the EE2 requirement.