ESMCI / ccs_config_cesm

CESM CIME Case Control System configuration files

Simple mpi-serial case on Casper failing in setup #143

Open ekluzek opened 5 months ago

ekluzek commented 5 months ago

Hello, I'm trying to run this test:

SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc

with these externals changes on top of ctsm5.1.dev158:

diff --git a/Externals.cfg b/Externals.cfg
index a17f8e2ec..b29af5c64 100644
--- a/Externals.cfg
+++ b/Externals.cfg
@@ -34,7 +34,7 @@ hash = 34723c2
 required = True

 [ccs_config]
-tag = ccs_config_cesm0.0.84
+tag = ccs_config_cesm0.0.87
 protocol = git
 repo_url = https://github.com/ESMCI/ccs_config_cesm.git
 local_path = ccs_config
@@ -44,11 +44,11 @@ required = True
 local_path = cime
 protocol = git
 repo_url = https://github.com/ESMCI/cime
-tag = cime6.0.175
+tag = cime6.0.198
 required = True

 [cmeps]
-tag = cmeps0.14.43
+tag = cmeps0.14.47
 protocol = git
 repo_url = https://github.com/ESCOMP/CMEPS.git
 local_path = components/cmeps

It fails for me in the SETUP phase, as follows:

qcmd -- ./create_test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc -r . 
Waiting on job launch; 9351378.casper-pbs with qsub arguments:
    qsub  -l select=1:ncpus=1:mem=10GB -A P93300606 -q casper@casper-pbs -l walltime=01:00:00

Warning: no access to tty (Inappropriate ioctl for device).
Thus no job control in this shell.
Testnames: ['SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc']
Using project from .cesm_proj: P93300041
create_test will do up to 1 tasks simultaneously
create_test will use up to 45 cores simultaneously
Creating test directory /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
RUNNING TESTS:
SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc
Starting CREATE_NEWCASE for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc with 1 procs
Finished CREATE_NEWCASE for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc in 186.715961 seconds (PASS)
Starting XML for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc with 1 procs
Finished XML for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc in 119.385811 seconds (PASS)
Starting SETUP for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc with 1 procs
Finished SETUP for test SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc in 1.602533 seconds (FAIL). [COMPLETED 1 of 1]
Case dir: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
Errors were:
ERROR: module command /glade/u/apps/casper/23.10/spack/opt/spack/lmod/8.7.24/gcc/7.5.0/m4jx/lmod/lmod/libexec/lmod python load ncarenv/23.10 cmake/3.26.3 intel/2023.2.1 mkl/2023.2.0 netcdf/4.9.2 ncarcompilers/0.5.0 parallelio/2.6.2 esmf/8.5.0 ncarcompilers/1.0.0 failed with message:
Lmod has detected the following error: The following module(s) are unknown:
"ncarcompilers/0.5.0"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
$ module --ignore_cache load "ncarcompilers/0.5.0"

Also make sure that all modulefiles written in TCL start with the string
#%Module

Waiting for tests to finish
FAIL SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc (phase SETUP)
Case dir: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 380.3022334575653 seconds
casper-login1 cime/scripts> cd SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi/
Directory: /glade/work/erik/ctsm_worktrees/external_updates/cime/scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi
casper-login1 scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi> cat TestStatus
PASS SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc CREATE_NEWCASE
PASS SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc XML
FAIL SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc SETUP
casper-login1 scripts/SMS_D_Lm1_Mmpi-serial.CLM_USRDAT.I1PtClm50SpRs.casper_intel.clm-USUMB_nuopc.20240112_170555_qa7smi> ./case.setup 
ERROR: module command /glade/u/apps/casper/23.10/spack/opt/spack/lmod/8.7.24/gcc/7.5.0/m4jx/lmod/lmod/libexec/lmod python load ncarenv/23.10 cmake/3.26.3 intel/2023.2.1 mkl/2023.2.0 netcdf/4.9.2 ncarcompilers/0.5.0 parallelio/2.6.2 esmf/8.5.0 ncarcompilers/1.0.0 failed with message:
Lmod has detected the following error: The following module(s) are unknown: "ncarcompilers/0.5.0"

Please check the spelling or version number. Also try "module spider ..."
It is also possible your cache file is out-of-date; it may help to try:
  $ module --ignore_cache load "ncarcompilers/0.5.0"

Also make sure that all modulefiles written in TCL start with the string #%Module
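
One detail visible in the error above: the load list names `ncarcompilers` twice, as `ncarcompilers/0.5.0` and again as `ncarcompilers/1.0.0`, and Lmod rejects only the first. A minimal Python sketch for spotting this kind of duplicate in a module load list follows. The helper name `find_version_conflicts` is hypothetical and not part of CIME or Lmod, and the check only flags the duplicate; it does not say which entry in the machine configuration produces it.

```python
from collections import defaultdict


def find_version_conflicts(modules):
    """Return {name: [versions]} for modules requested with more than one version.

    Hypothetical helper for illustration; not part of CIME or Lmod.
    """
    versions = defaultdict(list)
    for spec in modules:
        # Split "name/version" on the first slash only.
        name, _, version = spec.partition("/")
        if version not in versions[name]:
            versions[name].append(version)
    return {name: vers for name, vers in versions.items() if len(vers) > 1}


# Module list copied from the error message above.
load_list = (
    "ncarenv/23.10 cmake/3.26.3 intel/2023.2.1 mkl/2023.2.0 netcdf/4.9.2 "
    "ncarcompilers/0.5.0 parallelio/2.6.2 esmf/8.5.0 ncarcompilers/1.0.0"
).split()

conflicts = find_version_conflicts(load_list)
print(conflicts)  # {'ncarcompilers': ['0.5.0', '1.0.0']}
```

The same check could be pointed at any module list pasted from a `case.setup` error to see whether a stale version is being requested alongside a current one.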

ekluzek commented 5 months ago

We also saw this earlier with older externals, as documented in this CTSM issue:

https://github.com/ESCOMP/CTSM/issues/2293