ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

Single Site Test Attempt on Casper (fails) #2293

Open rgknox opened 9 months ago

rgknox commented 9 months ago

I attempted to build a single-site test on Casper and ran into some errors. I had never used this machine before today, but I heard that it is the ideal machine for single-site runs, so I gave it a quick test.

I have no special environment variables set and only the default modules loaded. I ran create_test in an interactive session started with execcasper, and I also tried it once on the login node, with the same error.

This uses the following tags:
ctsm: ctsm5.1.dev159
fates: sci.1.69.0_api.31.0.0

./create_test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold --generate /glade/derecho/scratch/rgknox/ctsm5.1.dev159-sci.1.69.0_api.31.0.0 --project P93300041 -o

Testnames: ['SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold']
create_test will do up to 1 tasks simultaneously
create_test will use up to 45 cores simultaneously
Creating test directory /glade/scratch/rgknox/SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold.G.20231215_101759_3lofzp
RUNNING TESTS:
  SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold
Starting CREATE_NEWCASE for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold with 1 procs
Finished CREATE_NEWCASE for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold in 2.681000 seconds (PASS)
Starting XML for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold with 1 procs
Finished XML for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold in 0.331622 seconds (PASS)
Starting SETUP for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold with 1 procs
Finished SETUP for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold in 0.518615 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /glade/scratch/rgknox/SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold.G.20231215_101759_3lofzp
    Errors were:
        ERROR: module command /glade/u/apps/dav/opt/lmod/7.7.29/libexec/lmod python purge  failed with message:
        /glade/u/apps/dav/opt/lua/5.3.4/bin/lua: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or directory

Waiting for tests to finish
FAIL SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold (phase SETUP)
    Case dir: /glade/scratch/rgknox/SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold.G.20231215_101759_3lofzp
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 4.831060886383057 seconds
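
When a test list is longer than one entry, the failures can be pulled out of the create_test output mechanically. The following is a minimal sketch, assuming the summary lines keep the `FAIL <test name> (phase <PHASE>)` shape seen in the log above; `failed_tests` is a hypothetical helper name and not part of CIME.

```python
import re

# Matches summary lines of the form:
#   FAIL <test name> (phase <PHASE>)
# The "Finished SETUP ... (FAIL)" progress lines do not start with FAIL,
# so they are intentionally not matched.
FAIL_LINE = re.compile(r"^FAIL\s+(?P<test>\S+)\s+\(phase\s+(?P<phase>\w+)\)\s*$")


def failed_tests(log_text):
    """Return a list of (test_name, phase) pairs, one per FAIL summary line."""
    results = []
    for line in log_text.splitlines():
        match = FAIL_LINE.match(line.strip())
        if match:
            results.append((match.group("test"), match.group("phase")))
    return results


# Sample taken from the output quoted in this issue.
sample = (
    "Finished SETUP for test SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs."
    "casper_nvhpc.clm-FatesCold in 0.518615 seconds (FAIL). [COMPLETED 1 of 1]\n"
    "Waiting for tests to finish\n"
    "FAIL SMS_Lm13.1x1_brazil.I2000Clm50FatesCruRsGs.casper_nvhpc.clm-FatesCold (phase SETUP)\n"
)

for test, phase in failed_tests(sample):
    print(f"{test} failed in phase {phase}")
```

Here the only reported failure is in the SETUP phase, which means the case never reached the build.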

ekluzek commented 9 months ago

@rgknox, try the intel compiler. Does that fail as well?

rgknox commented 9 months ago

Yes, it appears to be the same error:

ERROR: module command /glade/u/apps/dav/opt/lmod/7.7.29/libexec/lmod python purge failed with message: /glade/u/apps/dav/opt/lua/5.3.4/bin/lua: error while loading shared libraries: libreadline.so.6: cannot open shared object file: No such file or director
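
The message comes from the dynamic loader, not from CTSM itself: the `lua` binary that Lmod runs cannot find `libreadline.so.6`, which is why switching compilers makes no difference. The following is a small sketch for pulling the binary and the missing library out of such a message, assuming the usual loader wording `<binary>: error while loading shared libraries: <library>: cannot open shared object file`; `parse_loader_error` is a hypothetical helper name.

```python
import re

# Pattern for the loader message; both the binary path and the
# library name are taken to be whitespace-free tokens.
LOADER_ERROR = re.compile(
    r"(?P<binary>\S+): error while loading shared libraries: "
    r"(?P<library>\S+): cannot open shared object file"
)


def parse_loader_error(message):
    """Return (binary, missing_library), or None if the message does not match."""
    match = LOADER_ERROR.search(message)
    if match is None:
        return None
    return match.group("binary"), match.group("library")


# Message as quoted in this issue (including its truncated ending).
message = (
    "ERROR: module command /glade/u/apps/dav/opt/lmod/7.7.29/libexec/lmod "
    "python purge failed with message: /glade/u/apps/dav/opt/lua/5.3.4/bin/lua: "
    "error while loading shared libraries: libreadline.so.6: "
    "cannot open shared object file: No such file or director"
)

print(parse_loader_error(message))
```

On a node where the binary is present, running `ldd` on it lists any unresolved libraries as `not found`, which gives the same information directly.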

ekluzek commented 9 months ago

This is with ccs_config_cesm0.0.84. It looks like work on casper went into ccs_config_cesm0.0.87, so I'll try with that.

ekluzek commented 9 months ago

OK, that doesn't work out of the box. It might need a change in both ccs_config and in cime.

https://github.com/ESMCI/ccs_config_cesm/issues/138