ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime

remove SMP_PRESENT and replace with BUILD_THREADED #4546

Closed jedwards4b closed 9 months ago

jedwards4b commented 9 months ago

SMP_PRESENT and BUILD_THREADED were both used in the same way and having two variables to do the same thing is confusing at best. This replaces all instances of SMP_PRESENT with BUILD_THREADED. Note that this may require changes in E3SM outside of CIME.

Test suite: scripts_regression_tests.py
Test baseline:
Test namelist changes:
Test status: bit for bit
Fixes #4544
User interface changes?: xml variable SMP_PRESENT is removed.

Update gh-pages html (Y/N)?:

jedwards4b commented 9 months ago

system-testing expected to fail

amametjanov commented 9 months ago

@jgfouca, will there be a follow-up PR to fix things in E3SM? Just wanted to check that BUILD_THREADED was set F->T->F with NTHREADS changing 1->2->1: noted in https://github.com/ESMCI/cime/pull/2466#issuecomment-380939087 .

azamat@perlmutter:login09:~/saul/E3SM
> cd cime && git branch && cd -
  master
* replace_smp_present
azamat@perlmutter:login09:~/saul/E3SM
> ./cime/scripts/create_test SMS.f19_g16.X --no-build -t 20231218-chk-cime
Using project from .cesm_proj: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 320 cores simultaneously
Creating test directory /pscratch/sd/a/azamat/e3sm_scratch/pm-cpu/SMS.f19_g16.X.pm-cpu_intel.20231218-chk-cime
RUNNING TESTS:
  SMS.f19_g16.X.pm-cpu_intel
Starting CREATE_NEWCASE for test SMS.f19_g16.X.pm-cpu_intel with 1 procs
Finished CREATE_NEWCASE for test SMS.f19_g16.X.pm-cpu_intel in 5.232196 seconds (PASS)
Starting XML for test SMS.f19_g16.X.pm-cpu_intel with 1 procs
Finished XML for test SMS.f19_g16.X.pm-cpu_intel in 0.363084 seconds (PASS)
Starting SETUP for test SMS.f19_g16.X.pm-cpu_intel with 1 procs
Finished SETUP for test SMS.f19_g16.X.pm-cpu_intel in 1.173132 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /pscratch/sd/a/azamat/e3sm_scratch/pm-cpu/SMS.f19_g16.X.pm-cpu_intel.20231218-chk-cime
    Errors were:
        ERROR: No variable BUILD_THREADED found in case

jedwards4b commented 9 months ago

@amametjanov you need to update mct/cime_config/config_component.xml

jedwards4b commented 9 months ago

@amametjanov thank you for confirming.

jedwards4b commented 9 months ago

@jasonb5 do I need to fix something here or is the ball in your court?

jasonb5 commented 9 months ago

@jedwards4b It's in mine, I'll restart the tests shortly.

jasonb5 commented 9 months ago

@jedwards4b ccs_config needs to be updated in the CESM repo.

jedwards4b commented 9 months ago

@jasonb5 The file Externals.cfg that is used by docker needs to be updated.

jasonb5 commented 9 months ago

@jedwards4b I updated the container to use Externals.cfg from the CIME repo for both sys and unit tests. The remaining errors look related to CDEPS.

SMS.f19_g16_rx1.A.docker_gnu.fake_testing_only_20231220_171621/bld/gnu/openmpi/nodebug/nothreads/nuopc/CDEPS', error=/__w/cime/cime/cime/components/cdeps/streams/dshr_stream_mod.F90:1714:28:

 1714 |     use shr_file_mod, only : shr_file_get_real_path
      |                            1
Error: Symbol 'shr_file_get_real_path' referenced at (1) not found in module 'shr_file_mod'
make[2]: *** [streams/CMakeFiles/streams.dir/build.make:101: streams/CMakeFiles/streams.dir/dshr_stream_mod.F90.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:375: streams/CMakeFiles/streams.dir/all] Error 2
make: *** [Makefile:124: all] Error 2

jedwards4b commented 9 months ago

I've been trying to figure that out.

 git blame shr_file_mod.F90 | grep shr_file_get_ 
961411df src/shr_file_mod.F90  (Jim Edwards    2023-10-27 13:23:12 -0600   64)   public :: shr_file_get_real_path ! Get a fully resolved path
961411df src/shr_file_mod.F90  (Jim Edwards    2023-10-27 13:23:12 -0600 1010)   subroutine shr_file_get_real_path(path, resolved_path)
961411df src/shr_file_mod.F90  (Jim Edwards    2023-10-27 13:23:12 -0600 1043)   end subroutine shr_file_get_real_path

jedwards4b commented 9 months ago

Ah, the share external should be at share1.0.18; we had it set to 17.
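
A tag mismatch like this one can be caught before a build with a small check. The following is a minimal sketch, assuming Externals.cfg uses the INI layout with one section per external and a `tag` key. The section name `share`, the sample contents, and the helper `stale_tags` are hypothetical and not taken from the CIME repository.

```python
import configparser

# Hypothetical excerpt of an Externals.cfg; the section name and keys are assumptions.
SAMPLE = """
[share]
tag = share1.0.17
local_path = share
required = True
"""


def stale_tags(cfg_text, expected):
    """Return {section: (found_tag, expected_tag)} for every mismatching external."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    stale = {}
    for section, want in expected.items():
        # fallback=None also covers a missing section or a missing tag key
        have = parser.get(section, "tag", fallback=None)
        if have != want:
            stale[section] = (have, want)
    return stale
```

Calling `stale_tags(SAMPLE, {"share": "share1.0.18"})` reports the `share` entry as out of date, which matches the situation described in this comment.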

jedwards4b commented 9 months ago

The e3sm test is expected to fail because the mct driver config_compset.xml file needs to be changed to use BUILD_THREADED instead of SMP_PRESENT.
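
For downstream repositories such as E3SM, leftover references to the removed variable can be located with a short script. This is a sketch only: the function `find_references` and the choice of file suffixes are assumptions for illustration and are not part of CIME.

```python
import re
from pathlib import Path

OLD_NAME = "SMP_PRESENT"     # variable removed by this PR
NEW_NAME = "BUILD_THREADED"  # variable that replaces it


def find_references(root, name=OLD_NAME, suffixes=(".xml", ".py")):
    """Return (path, line_number, stripped_line) for each line under root that mentions name."""
    # Word boundaries avoid matching longer identifiers that merely contain the name.
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Running `find_references("path/to/checkout")` on a driver's `cime_config` directory would list the lines that still need to be switched to `BUILD_THREADED`; an empty list suggests the rename is complete for the scanned file types.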