NOAA-EMC / GDASApp

Global Data Assimilation System Application
GNU Lesser General Public License v2.1

Use install option in GDASApp build #1302

Open RussTreadon-NOAA opened 1 month ago

RussTreadon-NOAA commented 1 month ago

EE2 does not require executables to be copied to the job run directory $DATA. Executables can be referenced from $HOMEmodel/exec.

This issue is opened to use the cmake install option to place GDASApp executables, modules, and libraries into directories more aligned with the EE2 requirement.
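
In CMake terms this amounts to configuring with an install prefix and then running the install target. A minimal sketch, assuming the bundle is configured from a separate build directory (the prefix path is illustrative; build.sh appears to expose it through its -p option):

cmake -DCMAKE_INSTALL_PREFIX=${HOMEmodel}/exec ../bundle
make -j 8
make install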

RussTreadon-NOAA commented 1 month ago

This issue is an EE2 compliance issue being tracked in GDASApp issue #1254.

RussTreadon-NOAA commented 1 month ago

As a test, make the following modification to sorc/build_gdas.sh in a working copy of g-w branch feature/radbcor (see g-w PR #2875):

@@ -24,6 +24,6 @@ shift $((OPTIND-1))
 # shellcheck disable=SC2086
 BUILD_JOBS="${BUILD_JOBS:-8}" \
 WORKFLOW_BUILD="ON" \
-./gdas.cd/build.sh ${_opts} -f
+./gdas.cd/build.sh ${_opts} -f -p /work/noaa/da/rtreadon/git/global-workflow/radbcor/bin

 exit

Currently executing build_gdas.sh on Hercules. The install is proceeding very slowly. Is this expected, @danholdaway? Did you simply add -p $INSTALL_PATH to build_gdas.sh in your test?

danholdaway commented 1 month ago

@RussTreadon-NOAA in my test I modified the build script as I believe the logic is flawed when you choose install. Here is the logic:

# Build
echo "Building ..."
set -x
if [[ $BUILD_JCSDA == 'YES' ]]; then
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
  builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
  for b in $builddirs; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
fi

# Install
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  echo "Installing ..."
  set -x
  make install
  set +x
fi

Note that the script loops over the builddirs and makes all the code associated with those packages. But once it's done with that, there is still a very large amount that remains unbuilt, e.g. ufo tests, saber tests, the oops toy model, etc. That final make install will build all that remaining code, but it will only do it with one processor. If you want to do install you have to make absolutely everything; there are no two ways about it. Otherwise you won't copy the libraries to the install path.

For quicker build with install it needs to be something like:

# Build
echo "Building ..."
set -x
if [[ -z ${INSTALL_PREFIX:-} ]]; then
  if [[ $BUILD_JCSDA == 'YES' ]]; then
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
  else
    builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
    for b in $builddirs; do
      cd $b
      make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
      cd ../
    done
  fi
fi
set +x

# Install
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  echo "Installing ..."
  set -x
  make -j ${BUILD_JOBS:-6} install
  set +x
fi

I.e. skip the builddirs since they will be built anyway and let CMake figure out an optimal parallel strategy for building them. However, note that it will always be a lot slower than the normal way of building because thousands of tests will be built.

danholdaway commented 1 month ago

Relooking at it you could combine BUILD_JCSDA with INSTALL_PREFIX. Perhaps that was the original intention?
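
A minimal sketch of that combination, assuming an install request should imply the full-bundle build (variable names follow the existing script):

# Treat a requested install like a full-bundle (JCSDA) build
if [[ $BUILD_JCSDA == 'YES' || -n ${INSTALL_PREFIX:-} ]]; then
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
  builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query"
  for b in $builddirs; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
fi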

RussTreadon-NOAA commented 1 month ago

Thank you @danholdaway for explaining why install is taking so much time. I'm executing faulty logic. Let me try your suggestions. Another item on our TODO list is turning off all the JEDI tests (GDASApp issue #1269).

RussTreadon-NOAA commented 1 month ago

Tested the suggested change to build.sh. Build with INSTALL_PREFIX set is progressing very slowly (now approaching 3 hours). Will revisit next week.

danholdaway commented 1 month ago

In that case it might be a prerequisite that the building of the JEDI tests can be turned off. We could proceed with this relatively easily, but it's a feature that has met resistance in the past. I think some of that resistance comes from a difference in working styles. Those opposed argue that the cost of building JEDI is in the noise of the time it takes to run an experiment, so why introduce an extra flag and further complicate the build infrastructure, which is already complicated.

One idea I was playing around with before I went on leave is whether we could try to eliminate the need to build JEDI for most people. Then the issue might go away. I think it would have to be done in tandem with implementing more of a process for updating the JEDI hashes. Perhaps we could brainstorm this week?

RussTreadon-NOAA commented 1 month ago

The Hercules build with INSTALL_PREFIX set eventually failed with

-- Installing: /work/noaa/da/rtreadon/git/global-workflow/radbcor/bin/share/soca/testdata/72x35x25/INPUT/ocean_hgrid.nc
-- Installing: /work/noaa/da/rtreadon/git/global-workflow/radbcor/bin/share/soca/testdata/72x35x25/INPUT/ocean_topog.nc
CMake Error at soca/test/cmake_install.cmake:65 (file):
  file INSTALL cannot find
  "/work/noaa/da/rtreadon/git/global-workflow/radbcor/sorc/gdas.cd/bundle/soca/test/Data/rossrad.dat":
  No such file or directory.
Call Stack (most recent call first):
  soca/cmake_install.cmake:84 (include)
  cmake_install.cmake:122 (include)

make: *** [Makefile:130: install] Error 1

File rossrad.nc replaced rossrad.dat. However, rossrad.dat is still referenced in three soca files.

First, gdas.cd/sorc/soca/test/CMakeLists.txt has

set( soca_install_data
  Data/rossrad.dat
  Data/godas_sst_bgerr.nc )
install(FILES ${soca_install_data}
        DESTINATION ${INSTALL_DATA_DIR}/testdata/ )

The two other files that reference rossrad.dat are

gdas.cd/sorc/soca/tutorial/tutorial_tools.sh:    ln -sf $datadir/Data/rossrad.dat .
gdas.cd/sorc/soca/.gitattributes:test/Data/rossrad.dat filter=lfs diff=lfs merge=lfs -text

RussTreadon-NOAA commented 1 month ago

If the install option for the GDASApp build satisfies the EE2 executable requirement, then we should try to find a way to speed up the build with install.

We don't need JEDI ctests when building and installing GDASApp for use in operations. If turning off JEDI ctests speeds up the build and install for operations, we should figure out a way to make this happen.

RussTreadon-NOAA commented 1 month ago

@danholdaway , I agree with your brainstorming idea. The core GDASApp infrastructure team needs to develop a plan to work through the various items from the bi-weekly JEDI workflow sync meeting with EIB.

RussTreadon-NOAA commented 1 month ago

Work for this issue will be done in feature/install

RussTreadon-NOAA commented 1 month ago

@danholdaway recommended making the following change in CMakeLists.txt for the various JEDI submodules in sorc/

option( ENABLE_JEDI_CTESTS "Build JEDI ctests" ON )
if( ENABLE_JEDI_CTESTS )
  add_subdirectory( test )
endif()

The default behavior is for JEDI ctests to be active (ON). Users can add -DENABLE_JEDI_CTESTS=OFF to the cmake configure to turn off JEDI ctests.
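
For example, a configure invocation with the ctests disabled might look like this (paths are illustrative; build.sh passes the flag through CMAKE_OPTS):

cmake -DENABLE_JEDI_CTESTS=OFF -DCMAKE_INSTALL_PREFIX=/path/to/install ../bundle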

feature/install was cloned on Hercules in /work/noaa/da/rtreadon/git/GDASApp/install/. The above scripting was added to CMakeLists.txt in the following JEDI submodules

        modified:   sorc/bufr-query (modified content)
        modified:   sorc/crtm (modified content)
        modified:   sorc/fv3-jedi (modified content)
        modified:   sorc/gsibec (modified content)
        modified:   sorc/ioda (modified content)
        modified:   sorc/iodaconv (modified content)
        modified:   sorc/saber (modified content)
        modified:   sorc/soca (modified content)
        modified:   sorc/ufo (modified content)
        modified:   sorc/vader (modified content)

build.sh was modified as follows

@@ -24,6 +24,7 @@ usage() {
   echo "  -f  force a clean build             DEFAULT: NO"
   echo "  -d  include JCSDA ctest data        DEFAULT: NO"
   echo "  -a  build everything in bundle      DEFAULT: NO"
+  echo "  -j  build with JEDI ctests          DEFAULT: OFF"
   echo "  -h  display this message and quit"
   echo
   exit 1
@@ -39,6 +40,7 @@ BUILD_VERBOSE="NO"
 CLONE_JCSDADATA="NO"
 CLEAN_BUILD="NO"
 BUILD_JCSDA="NO"
+ENABLE_JEDI_CTESTS="OFF"
 COMPILER="${COMPILER:-intel}"

 while getopts "p:t:c:hvdfa" opt; do
@@ -64,6 +66,9 @@ while getopts "p:t:c:hvdfa" opt; do
     a)
       BUILD_JCSDA=YES
       ;;
+    j)
+      ENABLE_JEDI_CTESTS=ON
+      ;;
     h|\?|:)
       usage
       ;;
@@ -98,6 +103,10 @@ mkdir -p ${BUILD_DIR} && cd ${BUILD_DIR}
 # If INSTALL_PREFIX is not empty; install at INSTALL_PREFIX
 [[ -n "${INSTALL_PREFIX:-}" ]] && CMAKE_OPTS+=" -DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}"

+# Activate JEDI ctests if requested
+ENABLE_JEDI_CTESTS=${ENABLE_JEDI_CTESTS:-"OFF"}
+CMAKE_OPTS+=" -DENABLE_JEDI_CTESTS=${ENABLE_JEDI_CTESTS}"
+
 # activate tests based on if this is cloned within the global-workflow
 WORKFLOW_BUILD=${WORKFLOW_BUILD:-"OFF"}
 CMAKE_OPTS+=" -DWORKFLOW_TESTS=${WORKFLOW_BUILD}"
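
One caveat with the diff as shown: the getopts option string is left as "p:t:c:hvdfa", so a j would presumably also need to be appended there for the new -j flag to be recognized rather than falling through to the usage case, e.g.

while getopts "p:t:c:hvdfaj" opt; do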

./build.sh was then executed, and ctest -N was run in build/ upon completion. 238 tests remain; prior to this change there were 1952 ctests. Below is the list of remaining ctests:

Test project /work/noaa/da/rtreadon/git/GDASApp/install/build
  Test   #1: gsw_poly_check
  Test   #2: gsw_check_functions
  Test   #3: bufr_query_coding_norms
  Test   #4: oops_coding_norms
  Test   #5: test_oops_base_dummy_run_one
  Test   #6: test_oops_base_dummy_run_no_validate
  Test   #7: test_oops_base_dummy_run_validate_zero
  Test   #8: test_oops_base_dummy_run_bad_arg_zero
  Test   #9: test_oops_base_dummy_run_bad_arg_one
  Test  #10: test_oops_base_dummy_run_bad_arg_two
  Test  #11: test_oops_base_dummy_run_bad_arg_three
  Test  #12: test_oops_base_dummy_run_help
  Test  #13: test_oops_base_dummy_run_h
  Test  #14: test_oops_base_variables
  Test  #15: test_oops_base_obsvariables
  Test  #16: test_base_posttimer
  Test  #17: test_util_signal_trap_fpe_div_by_zero
  Test  #18: test_util_signal_trap_fpe_invalid_op
  Test  #19: test_util_signal_trap_fpe_valid_op
  Test  #20: test_util_stacktrace
  Test  #21: test_util_random
  Test  #22: test_util_pushstringvector
  Test  #23: test_util_parameters
  Test  #24: test_generic_atlas_interpolator
  Test  #25: test_generic_unstructured_interpolator
  Test  #26: test_generic_atlas_global_interpolator
  Test  #27: test_generic_unstructured_global_interpolator
  Test  #28: test_generic_unstructured_global_interpolator_parallel
  Test  #29: test_generic_gc99
  Test  #30: test_generic_soar
  Test  #31: test_coupled_splitvariables
  Test  #32: test_util_isanypointinvolumeinterior
  Test  #33: test_util_partialdatetime
  Test  #34: test_util_datetime
  Test  #35: test_util_duration
  Test  #36: test_util_intset_parser
  Test  #37: test_util_scalarormap
  Test  #38: test_util_floatcompare
  Test  #39: test_util_compositepath
  Test  #40: test_util_stringfunctions
  Test  #41: test_util_testreference
  Test  #42: test_util_range
  Test  #43: test_mpi_mpi
  Test  #44: test_fft_multiple
  Test  #45: test_util_algorithms
  Test  #46: test_util_comparenvectors
  Test  #47: test_util_missingvalues
  Test  #48: test_util_associativecontainers
  Test  #49: test_util_propertiesofnvectors
  Test  #50: test_util_localenvironment
  Test  #51: test_util_typetraits
  Test  #52: test_util_wildcard
  Test  #53: test_util_configfunctions
  Test  #54: test_util_confighelpers
  Test  #55: test_util_timewindow
  Test  #56: test_util_arrayutil
  Test  #57: test_base_fieldsets
  Test  #58: test_util_fieldset_helpers_and_operations
  Test  #59: test_util_fieldset_subcommunicators
  Test  #60: test_util_functionspace_helpers
  Test  #61: test_util_functionspace_helpers_p2
  Test  #62: test_util_functionspace_helpers_p4
  Test  #63: test_assimilation_fullgmres
  Test  #64: test_assimilation_rotmat
  Test  #65: test_assimilation_solvematrixequation
  Test  #66: test_assimilation_spectrallmp
  Test  #67: test_assimilation_testvector3d
  Test  #68: test_assimilation_tridiagsolve
  Test  #69: vader_coding_norms
  Test  #70: saber_coding_norms_src
  Test  #71: saber_coding_norms_quench
  Test  #72: ioda_coding_norms
  Test  #73: test_ioda-collective-functions-h5file
  Test  #74: test_ioda-collective-functions-h5mem
  Test  #75: test_ioda-engines_complex_objects_strings-default
  Test  #76: test_ioda-engines_complex_objects_strings-h5file
  Test  #77: test_ioda-engines_complex_objects_strings-h5mem
  Test  #78: test_ioda-engines_complex_objects_strings-ObsStore
  Test  #79: test_ioda-chunks_and_filters-default
  Test  #80: test_ioda-chunks_and_filters-h5file
  Test  #81: test_ioda-chunks_and_filters-h5mem
  Test  #82: test_ioda-chunks_and_filters-ObsStore
  Test  #83: test_ioda-engines_data-selections-default
  Test  #84: test_ioda-engines_data-selections-h5file
  Test  #85: test_ioda-engines_data-selections-h5mem
  Test  #86: test_ioda-engines_data-selections-ObsStore
  Test  #87: test_ioda-engines_dim-selectors-default
  Test  #88: test_ioda-engines_dim-selectors-h5file
  Test  #89: test_ioda-engines_dim-selectors-h5mem
  Test  #90: test_ioda-engines_dim-selectors-ObsStore
  Test  #91: test_ioda-engines_exception
  Test  #92: test_ioda-fillvalues-default
  Test  #93: test_ioda-fillvalues-h5file
  Test  #94: test_ioda-fillvalues-h5mem
  Test  #95: test_ioda-fillvalues-ObsStore
  Test  #96: test_ioda-engines_io_templated_tests-default
  Test  #97: test_ioda-engines_io_templated_tests-h5file
  Test  #98: test_ioda-engines_io_templated_tests-h5mem
  Test  #99: test_ioda-engines_io_templated_tests-ObsStore
  Test #100: test_ioda-engines_hier_paths-default
  Test #101: test_ioda-engines_hier_paths-h5file
  Test #102: test_ioda-engines_hier_paths-h5mem
  Test #103: test_ioda-engines_hier_paths-ObsStore
  Test #104: test_ioda-engines_layouts_layoutobsgroupodb
  Test #105: test_ioda-engines_layouts_layoutobsgroup
  Test #106: test_ioda-engines_obsgroup-default
  Test #107: test_ioda-engines_obsgroup-h5file
  Test #108: test_ioda-engines_obsgroup-h5mem
  Test #109: test_ioda-engines_obsgroup-ObsStore
  Test #110: test_ioda-engines_obsgroup_append
  Test #111: test_ioda-engines_obsgroup_append_function
  Test #112: test_ioda-engines_sfuncs_concatstringvectors
  Test #113: test_ioda-engines_sfuncs_convertv1pathtov2path
  Test #114: test_ioda-engines_persist-default
  Test #115: test_ioda-engines_persist-h5file
  Test #116: test_ioda-engines_persist-h5mem
  Test #117: test_ioda-engines_persist-ObsStore
  Test #118: test_ioda-engines_list_objects-default
  Test #119: test_ioda-engines_list_objects-h5file
  Test #120: test_ioda-engines_list_objects-h5mem
  Test #121: test_ioda-engines_list_objects-ObsStore
  Test #122: test_ioda-engines_hasvariables_stitchcomplementaryvars
  Test #123: test_ioda-engines_hasvariables_convertvariableunits
  Test #124: ioda-python
  Test #125: ioda-obsspace-python
  Test #126: test_ioda-engines_examples_prep_data
  Test #127: test_ioda-engines-01-default
  Test #128: test_ioda-engines-01-h5file
  Test #129: test_ioda-engines-01-h5mem
  Test #130: test_ioda-engines-01-obsstore
  Test #131: test_ioda-engines-02-default
  Test #132: test_ioda-engines-02-h5file
  Test #133: test_ioda-engines-02-h5mem
  Test #134: test_ioda-engines-02-obsstore
  Test #135: test_ioda-engines-03-default
  Test #136: test_ioda-engines-03-h5file
  Test #137: test_ioda-engines-03-h5mem
  Test #138: test_ioda-engines-03-obsstore
  Test #139: test_ioda-engines-04-default
  Test #140: test_ioda-engines-04-h5file
  Test #141: test_ioda-engines-04-h5mem
  Test #142: test_ioda-engines-04-obsstore
  Test #143: test_ioda-engines-05a-default
  Test #144: test_ioda-engines-05a-h5file
  Test #145: test_ioda-engines-05a-h5mem
  Test #146: test_ioda-engines-05a-obsstore
  Test #147: test_ioda-engines-05b-default
  Test #148: test_ioda-engines-05b-h5file
  Test #149: test_ioda-engines-05b-h5mem
  Test #150: test_ioda-engines-05b-obsstore
  Test #151: test_ioda-engines-00-Strings-F
  Test #152: test_ioda-engines-00-VecStrings-F
  Test #153: test_ioda-engines-01-GroupsAndObsSpaces-F
  Test #154: test_ioda-engines-02-Attributes-F
  Test #155: test_ioda-engines-03-Variables-F
  Test #156: test_ioda-engines-01-Py
  Test #157: test_ioda-engines-02-Py
  Test #158: test_ioda-engines-03-Py
  Test #159: test_ioda-engines-04-Py
  Test #160: test_ioda-engines-05-Py
  Test #161: test_ioda-engines-06-Py
  Test #162: test_ioda-engines-07a-Py-ObsSpaceClass
  Test #163: test_ioda-engines-07b-Py-ObsSpaceClassDataTypes
  Test #164: test_ioda-engines-chrono-Py
  Test #165: test_ioda-engines_chrono-default
  Test #166: test_ioda-engines_chrono-h5file
  Test #167: test_ioda-engines_chrono-h5mem
  Test #168: test_ioda-engines_complex_objects_array_from_struct-default
  Test #169: test_ioda-engines_complex_objects_array_from_struct-h5file
  Test #170: test_ioda-engines_complex_objects_array_from_struct-h5mem
  Test #171: test_ioda-engines_complex_objects_array_from_struct-ObsStore
  Test #172: test_ioda-engines_fixed_length_strings-default
  Test #173: test_ioda-engines_fixed_length_strings-h5file
  Test #174: test_ioda-engines_fixed_length_strings-h5mem
  Test #175: test_ioda-engines_fixed_length_strings_client-default
  Test #176: test_ioda-engines_fixed_length_strings_client-h5file
  Test #177: test_ioda-engines_fixed_length_strings_client-h5mem
  Test #178: test_ioda-engines_named_types-default
  Test #179: test_ioda-engines_named_types-h5file
  Test #180: test_ioda-engines_named_types-h5mem
  Test #181: test_ioda-engines_units
  Test #182: test_ioda-engines_basic_math
  Test #183: test_ioda-engines_variables_math
  Test #184: ioda_pyiodautils_coding_norms
  Test #185: ufo_coding_norms
  Test #186: test_ufo_opr_autogenerated
  Test #187: test_autogeneratedfilter
  Test #188: test_femps_csgrid
  Test #189: fv3jedi_test_tier1_coding_norms
  Test #190: soca_coding_norms
  Test #191: test_gdasapp_util_coding_norms
  Test #192: test_gdasapp_util_ioda_example
  Test #193: test_gdasapp_util_prepdata
  Test #194: test_gdasapp_util_rads2ioda
  Test #195: test_gdasapp_util_ghrsst2ioda
  Test #196: test_gdasapp_util_rtofstmp
  Test #197: test_gdasapp_util_rtofssal
  Test #198: test_gdasapp_util_smap2ioda
  Test #199: test_gdasapp_util_smos2ioda
  Test #200: test_gdasapp_util_viirsaod2ioda
  Test #201: test_gdasapp_util_icecamsr2ioda
  Test #202: test_gdasapp_util_icecmirs2ioda
  Test #203: test_gdasapp_util_icecjpssrr2ioda
  Test #204: test_dautils_ioda_example
  Test #205: iodaconv_compo_coding_norms
  Test #206: iodaconv_gsi_ncdiag_coding_norms
  Test #207: iodaconv_goes_coding_norms
  Test #208: iodaconv_hdf5_coding_norms
  Test #209: iodaconv_land_coding_norms
  Test #210: iodaconv_lib-python_coding_norms
  Test #211: iodaconv_marine_coding_norms
  Test #212: iodaconv_conventional_coding_norms
  Test #213: iodaconv_ncep_coding_norms
  Test #214: iodaconv_ssec_coding_norms
  Test #215: iodaconv_wrfda_ncdiag_coding_norms
  Test #216: iodaconv_singleob_coding_norms
  Test #217: iodaconv_mrms_coding_norms
  Test #218: iodaconv_gnssro_coding_norms
  Test #219: iodaconv_bufr_coding_norms
  Test #220: iodaconv_satbias_py_coding_norms
  Test #221: iodaconv_gsi_varbc_coding_norms
  Test #222: test_gdasapp_check_python_norms
  Test #223: test_gdasapp_check_yaml_keys
  Test #224: test_gdasapp_jedi_increment_to_fv3
  Test #225: test_gdasapp_fv3jedi_fv3inc
  Test #226: test_gdasapp_snow_create_ens
  Test #227: test_gdasapp_snow_imsproc
  Test #228: test_gdasapp_snow_apply_jediincr
  Test #229: test_gdasapp_snow_letkfoi_snowda
  Test #230: test_gdasapp_convert_bufr_adpsfc_snow
  Test #231: test_gdasapp_convert_bufr_adpsfc
  Test #232: test_gdasapp_convert_gsi_satbias
  Test #233: test_bufr2ioda_insitu_profile_argo
  Test #234: test_bufr2ioda_insitu_profile_bathy
  Test #235: test_bufr2ioda_insitu_profile_glider
  Test #236: test_bufr2ioda_insitu_profile_tesac
  Test #237: test_bufr2ioda_insitu_profile_xbtctd
  Test #238: test_bufr2ioda_insitu_surface_trkob

Total Tests: 238

CoryMartin-NOAA commented 1 month ago

Just chiming in here since I saw this thread. I think we should basically require JCSDA core to accept PRs that make building the ctests optional. There is no reason the default can't be the current behavior, and we can add some extra CMake logic to skip these thousands of tests. If JEDI is supposed to be flexible and used by all, then this is something that needs to be added as an option.

RussTreadon-NOAA commented 1 month ago

Added ENABLE_JEDI_CTESTS to a few more CMakeLists.txt files. Down to 168 tests:

(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/GDASApp/install/build$ ctest -N
Test project /work/noaa/da/rtreadon/git/GDASApp/install/build
  Test   #1: bufr_query_coding_norms
  Test   #2: oops_coding_norms
  Test   #3: test_oops_base_dummy_run_one
  Test   #4: test_oops_base_dummy_run_no_validate
  Test   #5: test_oops_base_dummy_run_validate_zero
  Test   #6: test_oops_base_dummy_run_bad_arg_zero
  Test   #7: test_oops_base_dummy_run_bad_arg_one
  Test   #8: test_oops_base_dummy_run_bad_arg_two
  Test   #9: test_oops_base_dummy_run_bad_arg_three
  Test  #10: test_oops_base_dummy_run_help
  Test  #11: test_oops_base_dummy_run_h
  Test  #12: test_oops_base_variables
  Test  #13: test_oops_base_obsvariables
  Test  #14: test_base_posttimer
  Test  #15: test_util_signal_trap_fpe_div_by_zero
  Test  #16: test_util_signal_trap_fpe_invalid_op
  Test  #17: test_util_signal_trap_fpe_valid_op
  Test  #18: test_util_stacktrace
  Test  #19: test_util_random
  Test  #20: test_util_pushstringvector
  Test  #21: test_util_parameters
  Test  #22: test_generic_atlas_interpolator
  Test  #23: test_generic_unstructured_interpolator
  Test  #24: test_generic_atlas_global_interpolator
  Test  #25: test_generic_unstructured_global_interpolator
  Test  #26: test_generic_unstructured_global_interpolator_parallel
  Test  #27: test_generic_gc99
  Test  #28: test_generic_soar
  Test  #29: test_coupled_splitvariables
  Test  #30: test_util_isanypointinvolumeinterior
  Test  #31: test_util_partialdatetime
  Test  #32: test_util_datetime
  Test  #33: test_util_duration
  Test  #34: test_util_intset_parser
  Test  #35: test_util_scalarormap
  Test  #36: test_util_floatcompare
  Test  #37: test_util_compositepath
  Test  #38: test_util_stringfunctions
  Test  #39: test_util_testreference
  Test  #40: test_util_range
  Test  #41: test_mpi_mpi
  Test  #42: test_fft_multiple
  Test  #43: test_util_algorithms
  Test  #44: test_util_comparenvectors
  Test  #45: test_util_missingvalues
  Test  #46: test_util_associativecontainers
  Test  #47: test_util_propertiesofnvectors
  Test  #48: test_util_localenvironment
  Test  #49: test_util_typetraits
  Test  #50: test_util_wildcard
  Test  #51: test_util_configfunctions
  Test  #52: test_util_confighelpers
  Test  #53: test_util_timewindow
  Test  #54: test_util_arrayutil
  Test  #55: test_base_fieldsets
  Test  #56: test_util_fieldset_helpers_and_operations
  Test  #57: test_util_fieldset_subcommunicators
  Test  #58: test_util_functionspace_helpers
  Test  #59: test_util_functionspace_helpers_p2
  Test  #60: test_util_functionspace_helpers_p4
  Test  #61: test_assimilation_fullgmres
  Test  #62: test_assimilation_rotmat
  Test  #63: test_assimilation_solvematrixequation
  Test  #64: test_assimilation_spectrallmp
  Test  #65: test_assimilation_testvector3d
  Test  #66: test_assimilation_tridiagsolve
  Test  #67: vader_coding_norms
  Test  #68: saber_coding_norms_src
  Test  #69: saber_coding_norms_quench
  Test  #70: ioda_coding_norms
  Test  #71: test_ioda-engines_examples_prep_data
  Test  #72: test_ioda-engines-01-default
  Test  #73: test_ioda-engines-01-h5file
  Test  #74: test_ioda-engines-01-h5mem
  Test  #75: test_ioda-engines-01-obsstore
  Test  #76: test_ioda-engines-02-default
  Test  #77: test_ioda-engines-02-h5file
  Test  #78: test_ioda-engines-02-h5mem
  Test  #79: test_ioda-engines-02-obsstore
  Test  #80: test_ioda-engines-03-default
  Test  #81: test_ioda-engines-03-h5file
  Test  #82: test_ioda-engines-03-h5mem
  Test  #83: test_ioda-engines-03-obsstore
  Test  #84: test_ioda-engines-04-default
  Test  #85: test_ioda-engines-04-h5file
  Test  #86: test_ioda-engines-04-h5mem
  Test  #87: test_ioda-engines-04-obsstore
  Test  #88: test_ioda-engines-05a-default
  Test  #89: test_ioda-engines-05a-h5file
  Test  #90: test_ioda-engines-05a-h5mem
  Test  #91: test_ioda-engines-05a-obsstore
  Test  #92: test_ioda-engines-05b-default
  Test  #93: test_ioda-engines-05b-h5file
  Test  #94: test_ioda-engines-05b-h5mem
  Test  #95: test_ioda-engines-05b-obsstore
  Test  #96: test_ioda-engines-00-Strings-F
  Test  #97: test_ioda-engines-00-VecStrings-F
  Test  #98: test_ioda-engines-01-GroupsAndObsSpaces-F
  Test  #99: test_ioda-engines-02-Attributes-F
  Test #100: test_ioda-engines-03-Variables-F
  Test #101: test_ioda-engines-01-Py
  Test #102: test_ioda-engines-02-Py
  Test #103: test_ioda-engines-03-Py
  Test #104: test_ioda-engines-04-Py
  Test #105: test_ioda-engines-05-Py
  Test #106: test_ioda-engines-06-Py
  Test #107: test_ioda-engines-07a-Py-ObsSpaceClass
  Test #108: test_ioda-engines-07b-Py-ObsSpaceClassDataTypes
  Test #109: test_ioda-engines-chrono-Py
  Test #110: test_ioda-engines_chrono-default
  Test #111: test_ioda-engines_chrono-h5file
  Test #112: test_ioda-engines_chrono-h5mem
  Test #113: test_ioda-engines_complex_objects_array_from_struct-default
  Test #114: test_ioda-engines_complex_objects_array_from_struct-h5file
  Test #115: test_ioda-engines_complex_objects_array_from_struct-h5mem
  Test #116: test_ioda-engines_complex_objects_array_from_struct-ObsStore
  Test #117: test_ioda-engines_fixed_length_strings-default
  Test #118: test_ioda-engines_fixed_length_strings-h5file
  Test #119: test_ioda-engines_fixed_length_strings-h5mem
  Test #120: test_ioda-engines_fixed_length_strings_client-default
  Test #121: test_ioda-engines_fixed_length_strings_client-h5file
  Test #122: test_ioda-engines_fixed_length_strings_client-h5mem
  Test #123: test_ioda-engines_named_types-default
  Test #124: test_ioda-engines_named_types-h5file
  Test #125: test_ioda-engines_named_types-h5mem
  Test #126: test_ioda-engines_units
  Test #127: test_ioda-engines_basic_math
  Test #128: test_ioda-engines_variables_math
  Test #129: ioda_pyiodautils_coding_norms
  Test #130: ufo_coding_norms
  Test #131: test_femps_csgrid
  Test #132: fv3jedi_test_tier1_coding_norms
  Test #133: soca_coding_norms
  Test #134: test_dautils_ioda_example
  Test #135: iodaconv_compo_coding_norms
  Test #136: iodaconv_gsi_ncdiag_coding_norms
  Test #137: iodaconv_goes_coding_norms
  Test #138: iodaconv_hdf5_coding_norms
  Test #139: iodaconv_land_coding_norms
  Test #140: iodaconv_lib-python_coding_norms
  Test #141: iodaconv_marine_coding_norms
  Test #142: iodaconv_conventional_coding_norms
  Test #143: iodaconv_ncep_coding_norms
  Test #144: iodaconv_ssec_coding_norms
  Test #145: iodaconv_wrfda_ncdiag_coding_norms
  Test #146: iodaconv_singleob_coding_norms
  Test #147: iodaconv_mrms_coding_norms
  Test #148: iodaconv_gnssro_coding_norms
  Test #149: iodaconv_bufr_coding_norms
  Test #150: iodaconv_satbias_py_coding_norms
  Test #151: iodaconv_gsi_varbc_coding_norms
  Test #152: test_gdasapp_check_python_norms
  Test #153: test_gdasapp_check_yaml_keys
  Test #154: test_gdasapp_jedi_increment_to_fv3
  Test #155: test_gdasapp_fv3jedi_fv3inc
  Test #156: test_gdasapp_snow_create_ens
  Test #157: test_gdasapp_snow_imsproc
  Test #158: test_gdasapp_snow_apply_jediincr
  Test #159: test_gdasapp_snow_letkfoi_snowda
  Test #160: test_gdasapp_convert_bufr_adpsfc_snow
  Test #161: test_gdasapp_convert_bufr_adpsfc
  Test #162: test_gdasapp_convert_gsi_satbias
  Test #163: test_bufr2ioda_insitu_profile_argo
  Test #164: test_bufr2ioda_insitu_profile_bathy
  Test #165: test_bufr2ioda_insitu_profile_glider
  Test #166: test_bufr2ioda_insitu_profile_tesac
  Test #167: test_bufr2ioda_insitu_profile_xbtctd
  Test #168: test_bufr2ioda_insitu_surface_trkob

Total Tests: 168

Trying to figure out the source for the following tests

test_util_
test_ioda-engines_
iodaconv_
test_bufr2ioda_

danholdaway commented 1 month ago

The usual command for adding a test is ecbuild_add_test so you could try to grep for that in every CMakeLists.txt across the source code directories.
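
For example (the sorc/ path is illustrative):

grep -rn --include=CMakeLists.txt "ecbuild_add_test" sorc/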

RussTreadon-NOAA commented 1 month ago

A tedious process, but we're down to 104 ctests returned by ctest -N.

danholdaway commented 1 month ago

What's the make time at this point? Perhaps we can have a few tests being built. If the changes become convoluted tests are likely to creep back in with future code changes anyway.

RussTreadon-NOAA commented 1 month ago

What's the make time at this point? Perhaps we can have a few tests being built. If the changes become convoluted tests are likely to creep back in with future code changes anyway.

The most recent build (configure & compile) on Hercules with 104 ctests took 36:50 (minutes:seconds). develop, with 1899 ctests, took 37:39 to build on Hercules. It seems odd that the two timings are basically the same.

danholdaway commented 1 month ago

All the executables for the tests can be built in parallel (and many tests rely on executables that are built anyway), so it's possible that what you're seeing is correct. Yes, there are a lot of tests, but they are possibly dwarfed by the number of source files at this point.

CoryMartin-NOAA commented 1 month ago

From what I can remember, the tests themselves are usually trivial to build, what can take some time are the executables that are only used for testing (mostly in UFO) but I think by just building gdas.x we can avoid this.

RussTreadon-NOAA commented 1 month ago

From what I can remember, the tests themselves are usually trivial to build, what can take some time are the executables that are only used for testing (mostly in UFO) but I think by just building gdas.x we can avoid this.

Yes, even though the number of ctests has been drastically reduced, the build/bin/ directory still contains a lot of *.x and *.py files. We don't need most of these for g-w cycling. What change(s) are needed to only build the executables we need?

danholdaway commented 1 month ago

@RussTreadon-NOAA perhaps we try this a slightly different way. Inverting and renaming the flag we would have:

option( LIBRARY_ONLY_BUILD "Only build JEDI libraries and skip tests and executables" OFF )

Then switch what you've already done to be instead:

if(NOT LIBRARY_ONLY_BUILD)
  add_subdirectory( test )
endif()

Then (for example) the following file: https://github.com/JCSDA-internal/fv3-jedi/blob/develop/src/CMakeLists.txt could be:

add_subdirectory( fv3jedi )
if( NOT LIBRARY_ONLY_BUILD )
  add_subdirectory( mains )

  ecbuild_add_test( TARGET fv3jedi_test_tier1_coding_norms
                    TYPE SCRIPT
                    COMMAND ${CMAKE_BINARY_DIR}/bin/cpplint.py
                    ARGS --quiet --recursive ${CMAKE_CURRENT_SOURCE_DIR}
                    WORKING_DIRECTORY ${CMAKE_BINARY_DIR}/bin )
endif()

You can grep on 'ecbuild_add_executable' or just 'add_executable' (if the user chooses vanilla CMake) in all CMakeLists.txt files to find all the places you would wrap things; many of them will already be wrapped by what you've done. Usually there's a 'mains' directory that just needs to be wrapped in the logic.
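
For example, a search along those lines might be (path illustrative):

grep -rn --include=CMakeLists.txt -e "ecbuild_add_executable" -e "add_executable" sorc/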

Sorry to ask for additional work but this might improve build time and should drain out the bin directory. One caveat is that I would expect JCSDA to be more resistant to this approach since it may have limited use outside our group. We have the special case of building gdas.x whereas everyone else relies on the executables that we would be turning off.

RussTreadon-NOAA commented 1 month ago

Thank you @danholdaway for the suggestion. I'll make a new clone of feature/install and give this a try.

RussTreadon-NOAA commented 1 month ago

Completed the following in a clone of feature/install at f49e2e6.

The build completed with the following timestamps:

Thu Oct  3 18:57:06 UTC 2024
Building GDASApp on hercules
...
Configuring ...
Thu Oct  3 18:57:20 UTC 2024
...
Building ...
Thu Oct  3 19:01:16 UTC 2024
...
Installing ...
Thu Oct  3 19:37:03 UTC 2024
...
CMake Error at gdas/test/cmake_install.cmake:69 (file):
  file INSTALL cannot find
  "/work/noaa/da/rtreadon/git/GDASApp/install_lib/bundle/gdas/test/testinput/amsua_n19_ewok.yaml":
  No such file or directory.

The timestamp on the log file is Thu Oct 3 20:51.

Configuring took about 4 minutes. Building took around 36 minutes. Installing ran 74 minutes before hitting an error.

An install directory was created. It contains the following

(gdasapp) hercules-login-2:/work/noaa/da/rtreadon/git/GDASApp/install_lib/install$ ls
'$(PYIODA_INSTALL_LIBDIR)'        bin   include   lib64    MOM6    test
'$(PYIOODACONV_INSTALL_LIBDIR)'   doc   lib       module   share   ush

The directories have executables, libraries, module files, etc.

I need to

CoryMartin-NOAA commented 1 month ago

@RussTreadon-NOAA this might help for the ioda-converter issue: https://github.com/JCSDA-internal/ioda-converters/pull/1549

RussTreadon-NOAA commented 1 month ago

@RussTreadon-NOAA this might help for the ioda-converter issue: JCSDA-internal/ioda-converters#1549

Thanks @CoryMartin-NOAA

RussTreadon-NOAA commented 1 month ago

@RussTreadon-NOAA this might help for the ioda-converter issue: JCSDA-internal/ioda-converters#1549

Thanks @CoryMartin-NOAA

Manually added the path changes in JCSDA-internal/ioda-converters#1549 into the working copy of feature/install. '$(PYIODA_INSTALL_LIBDIR)' and '$(PYIOODACONV_INSTALL_LIBDIR)' are no longer present in the install directory.

hercules-login-3:/work/noaa/da/rtreadon/git/GDASApp/install_lib/install$ ls
MOM6  bin  doc  include  lib  lib64  module  share  test  ush

danholdaway commented 1 month ago

Install shouldn't take 74 minutes, as it's usually just copying all the files from the build to the install path. It sounds like more code is being built at that time. Can build and install be done in one step?

cd build
ecbuild ../
make -j6 install

RussTreadon-NOAA commented 1 month ago

@danholdaway, I didn't know we could specify parallel streams on the install. build.sh just has make install.

Let me add -j 6.

danholdaway commented 1 month ago

The key also is to not issue make more than once. If doing install, it should only be done once, from the top level.

RussTreadon-NOAA commented 1 month ago

Timings with make -j 6 for install are improved

Sat Oct  5 11:55:56 UTC 2024
Building GDASApp on hercules
...
Configuring ...
Sat Oct  5 11:56:21 UTC 2024
...
Building ...
Sat Oct  5 11:58:38 UTC 2024
...
Installing ...
Sat Oct  5 12:31:31 UTC 2024
...
Sat Oct  5 12:44:27 UTC 2024

This translates to approximately 2 minutes for the configure, 33 minutes for the build, and 13 minutes for the install.

It may be possible to reduce the build time by being more aggressive with the LIBRARY_ONLY_BUILD flag.

The above work is being done in /work/noaa/da/rtreadon/git/GDASApp/install_lib/

danholdaway commented 1 month ago

In your build.sh I don't think you need to be running the code in this block:

# Build
echo "Building ... `date`"
set -x
if [[ $BUILD_JCSDA == 'YES' ]]; then
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
else
  builddirs="gdas iodaconv land-imsproc land-jediincr gdas-utils bufr-query da-utils"
  for b in $builddirs; do
    cd $b
    make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE
    cd ../
  done
fi
set +x

This will make all the packages iodaconv land-imsproc land-jediincr gdas-utils bufr-query da-utils sequentially, whereas the make install -j ${BUILD_JOBS:-6} done below will build everything. Doing them sequentially can be helpful if you don't intend to install the entire JEDI package, but since you do (because you want to run install), it doesn't actually help and could hinder. If you just run make install -j ${BUILD_JOBS:-6}, CMake can make maximum use of all the processors available to it. If you build just gdas and iodaconv sequentially, you can't start making the ioda executables until you're all done with gdas, and gdas has one giant executable that takes a couple of minutes to build. Let the other five processors work on something else while you do that.
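
A minimal sketch of the single-invocation approach described above, assuming INSTALL_PREFIX drives the choice (variable names follow the existing script):

# When installing, issue one top-level make; otherwise keep the selective loop
if [[ -n ${INSTALL_PREFIX:-} ]]; then
  echo "Building and installing ..."
  make -j ${BUILD_JOBS:-6} VERBOSE=$BUILD_VERBOSE install
else
  echo "Building ..."
  # existing selective builddirs loop, unchanged
fi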

RussTreadon-NOAA commented 1 month ago

Refactored as @danholdaway suggested. Rebuilt and reinstalled feature/install on Hercules with the following timing:

hercules-login-3:/work/noaa/da/rtreadon/git/GDASApp/install_lib$ grep "Mon Oct" build_install.log
Begin ... Mon Oct  7 15:53:31 UTC 2024
Configuring ... Mon Oct  7 15:53:41 UTC 2024
Building ... Mon Oct  7 15:56:11 UTC 2024
Installing ... Mon Oct  7 16:36:36 UTC 2024
Complete .. Mon Oct  7 16:37:53 UTC 2024

The configure took 3 minutes, the build took 40 minutes, and the install took 1 minute 17 seconds.

ctest -N returns 127 tests. install/bin contains a total of 175 files: 78 *.x, 87 *.py, 11 miscellaneous files. I think we should see fewer files in install/bin.

danholdaway commented 1 month ago

That sounds reasonable, Russ. If you have time it would be good to know whether the library-only build makes a difference in this mode of running, and perhaps whether increasing the number of cores makes much difference.

RussTreadon-NOAA commented 1 month ago

Below are timings for develop at 9d95c9d and feature/install at 9d95c9d with modifications to CMakeLists.txt as noted above.

Notes:

make -j 6    develop   feature/install
configure    03:29     02:23
build        35:37     39:59
install      --        00:24
total        39:21     43:10

make -j 8    develop   feature/install
configure    06:37     02:14
build        29:00     31:16
install      --        00:27
total        35:41     33:59

make -j 12   develop   feature/install
configure    07:08     01:59
build        26:31     25:46
install      --        00:48
total        33:50     28:38

make -j 16   develop   feature/install
configure    04:38     02:40
build        24:59     27:35
install      --        00:32
total        29:50     30:56

make -j 20   develop   feature/install
configure    03:20     02:07
build        24:18     28:09
install      --        01:47
total        27:50     32:07

Attempts using make -j 24 failed with

icpc: error #10106: Fatal error in /apps/spack-managed/gcc-11.3.1/intel-oneapi-compilers-2023.1.0-sb753366rvywq75zeg4ml5k5c72xgj72/compiler/2023.1.0/linux/bin/intel64/../../bin/intel64/mcpcom, terminated by kill signal
compilation aborted for /work/noaa/da/rtreadon/git/GDASApp/install_lib/bundle/fv3-jedi/src/mains/fv3jediControlPert.cc (code 1)
make[2]: *** [fv3-jedi/src/mains/CMakeFiles/fv3jedi_controlpert.x.dir/build.make:76: fv3-jedi/src/mains/CMakeFiles/fv3jedi_controlpert.x.dir/fv3jediControlPert.cc.o] Error 1
make[2]: Leaving directory '/work/noaa/da/rtreadon/git/GDASApp/install_lib/build'
make[1]: *** [CMakeFiles/Makefile2:8129: fv3-jedi/src/mains/CMakeFiles/fv3jedi_controlpert.x.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....

CoryMartin-NOAA commented 1 month ago

Why are the build times always longer in the feature branch than in develop? Is it because it is building everything vs. just some things?

RussTreadon-NOAA commented 1 month ago

@CoryMartin-NOAA , I think you are right. The flip side of the faster selective build with develop is that the install time is much longer.

I replaced three references to rossrad.dat in sorc/soca with rossrad.nc. After this, install works with develop. The install takes a long time because JEDI component ctests remain active in develop; install builds the scripts and executables needed to run these tests. ctest -N in the develop build directory returns 2003 tests with WORKFLOW_BUILD=OFF. The feature branch turns off most tests; only 127 remain with WORKFLOW_BUILD=OFF.

We're caught between

  1. a faster selective build and slower install with develop
  2. a slower full build and faster install with feature/install

Option 2 comes at the cost of modifying CMakeLists.txt for most JEDI components in sorc/. I can see that even if we can commit these CMakeLists.txt changes to the various JEDI repos, we will need to keep an eye on future JEDI hashes to ensure new or existing tests don't wind up outside the _notest flag regions.

I could test option 3: a fast selective build using feature/install along with the faster install.

CoryMartin-NOAA commented 1 month ago

My hunch is that the slow install is because of the selective build. Can you do a ./build.sh -a on develop? Does that build everything (it's supposed to)?

RussTreadon-NOAA commented 1 month ago

Made the following local modifications in develop at 9d95c9d on Hercules to enable successful install

Executed ./build.sh -f -a -p /work/noaa/da/rtreadon/git/GDASApp/test_install/install. Ran make with 20 cores. Configure, build, and install successfully ran to completion. Timings are below:

make -j 20   develop   feature/install   develop with install
configure    03:20     02:07             12:34
build        24:18     28:09             33:17
install      --        01:47             02:29
total        27:50     32:07             48:27

CoryMartin-NOAA commented 1 month ago

12:34 to run configure? wow!

danholdaway commented 1 month ago

12:34 to run configure? wow!

Strange. Is that consistent or was the machine just struggling at that moment? Does the feature branch have a difference in the configure?

danholdaway commented 1 month ago

Thanks so much for going through the pains of testing and comparing all these ways of building and installing, @RussTreadon-NOAA; it is tremendously helpful to see all this. What pops out to me is that, unfortunately, a library-only/no-tests build doesn't really save all that much time when installing JEDI. JEDI has just become a behemoth of source code that takes an age to compile and doesn't scale particularly well with processors. Note that the shared drives of HPCs may also not be the best place to see the fastest make times, which may explain why the time even started to increase with more processors.

And even with quite a bit of work it wasn't possible to turn off all the tests or prevent the bin directory from filling up. So ultimately this may not even really satisfy NCO's wish for empty, or at least clean, directories. It seems we would need (possibly a lot) more work to fully eliminate all tests and bin directory copies. It may also be never ending, because little would prevent folks from putting tests/executables outside of the fences we'd create in the CMake files. What do you all think: is this a fair assessment of what we're seeing?

RussTreadon-NOAA commented 1 month ago

The 12:34 looks to be anomalous. I reran and configure took 04:21. I ran again and configure took 03:24. I'm working on Hercules login nodes. Variations in the login node load would impact configure and build timings, right?

RussTreadon-NOAA commented 1 month ago

@danholdaway , I agree with your assessment.

Your last point is a major concern. Even if we complete the task of trimming down the configure, compile, and install to satisfy EE2 requirements, maintenance of this setup requires constant vigilance. It's not hard to imagine developers adding new tests or executables outside the blocked sections we added for EE2 compliance.

The module approach gets us closer to EE2 compliance. It does so, however, at the cost of not being development friendly. Assuming JEDI modules are released via spack-stack, developers are limited to what's in the stack unless they install their own JEDI modules and adjust the GDASApp build accordingly.

RussTreadon-NOAA commented 1 month ago

Even though configure, compile, and install take more than 30 minutes, does the final install directory move us closer to EE2 compliance? The install option populates the user-specified INSTALL_PREFIX with the following directories:

hercules-login-4:/work/noaa/da/rtreadon/git/GDASApp/test_install/install$ ls
MOM6  bin  doc  include  lib  lib64  module  share  test  ush

bin/ has 217 files. We don't need most of these for GFS v17 or v18. There may be other files in other install/ directories that we don't need for operations. Can we add scripting to build.sh to remove unnecessary files?

We do this with the operational GSI build. gsi.fd contains extra content which operations does not need. Script ush/build_4nco_global.sh not only builds the operational executables; it also removes extraneous directories, moves the install directory to the desired operational location, and removes the build directory.

Should we develop a similar build_ops.sh script for GDASApp?

danholdaway commented 1 month ago

Having the install is definitely needed, as that eliminates the need to link the executables. We can just install to the GFSHOME directory and point to the executables there. I'm fine with adding to the script to keep only the bin/gdas* files.
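
A minimal sketch of such a pruning step, following the keep-only-bin/gdas* suggestion (INSTALL_PREFIX is the variable build.sh already uses):

# Remove everything in the installed bin/ except gdas* executables (sketch)
find "${INSTALL_PREFIX}/bin" -maxdepth 1 -type f ! -name 'gdas*' -delete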

RussTreadon-NOAA commented 1 month ago

Thank you @danholdaway for your comment. Enabling install requires minor updates to GDASApp, ioda-converters, and soca. I'll open issues to work on these updates.

danholdaway commented 1 month ago

Thanks @RussTreadon-NOAA