E3SM-Project / e3sm-unified

A metapackage for a unified anaconda environment for analyzing results from the Energy Exascale Earth System Model (E3SM).
BSD 3-Clause "New" or "Revised" License

Running create_test fails with v1.9.0 #110

Status: Closed (beharrop closed this issue 11 months ago)

beharrop commented 11 months ago

If I have the latest e3sm_unified environment loaded on Perlmutter, running create_test fails. Without that environment loaded, it passes. Is this expected given how e3sm_unified was put together?

I am using commit 2236937c71ab0c4ef67c2574e58e01a0e46714d8 of the E3SM code and running ./create_test SMS_Ln5.ne4pg2_oQU480.F2010 from cime/scripts/.

If I have not loaded e3sm_unified, everything passes. If I load e3sm_unified v1.9.0, I get the following:

Using project from config_machines.xml: e3sm
create_test will do up to 1 tasks simultaneously
create_test will use up to 320 cores simultaneously
Creating test directory /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz
RUNNING TESTS:
  SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel
Starting CREATE_NEWCASE for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel with 1 procs
Finished CREATE_NEWCASE for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel in 1.424639 seconds (PASS)
Starting XML for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel with 1 procs
copying /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/env_run.xml -> /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/LockedFiles/env_run.orig.xml
Finished XML for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel in 0.518013 seconds (PASS)
Starting SETUP for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel with 1 procs
Finished SETUP for test SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel in 7.509040 seconds (FAIL). [COMPLETED 1 of 1]
    Case dir: /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz
    Errors were:
        ERROR: Command: '/pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld/build-namelist -infile /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elmconf/namelist  -csmdata /global/cfs/cdirs/e3sm/inputdata -inputdata /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elm.input_data_list -ignore_ic_year -namelist " &elm_inparm  start_ymd=00010101  /" -use_case 2010_CMIP6_control  -res ne4np4.pg2  -clm_start_type default -envxml_dir /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz -l_ncpl 24 -r_ncpl 6 -lnd_frac /global/cfs/cdirs/e3sm/inputdata/share/domains/domain.lnd.ne4pg2_oQU480.200527.nc -glc_nec 0 -co2_ppmv 388.717 -co2_type diagnostic  -ncpl_base_period day  -config /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elmconf/config_cache.xml -bgc sp -mask oQU480' failed with error 'Can't locate XML/LibXML.pm in @INC (you may need to install the XML::LibXML module) (@INC contains: /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib/Config/ /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld /global/cfs/cdirs/e3sm/perl/lib/perl5-only-switch/x86_64-linux-thread-multi /global/cfs/cdirs/e3sm/perl/lib/perl5-only-switch /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/5.32/site_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/site_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/5.32/vendor_perl 
/global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/vendor_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/5.32/core_perl /global/common/software/e3sm/anaconda_envs/base/envs/e3sm_unified_1.9.0_login/lib/perl5/core_perl .) at /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
        BEGIN failed--compilation aborted at /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/cime/utils/perl5lib/Config/SetupTools.pm line 5.
        Compilation failed in require at /pscratch/sd/b/beharrop/temp/e3sm_code/20231016/E3SM/components/elm/bld/ELMBuildNamelist.pm line 440.' from dir '/pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz/Buildconf/elmconf'

Waiting for tests to finish
FAIL SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel (phase SETUP)
    Case dir: /pscratch/sd/b/beharrop/e3sm_scratch/pm-cpu/SMS_Ln5.ne4pg2_oQU480.F2010.pm-cpu_intel.20231016_140619_w61kkz
Due to presence of batch system, create_test will exit before tests are complete.
To force create_test to wait for full completion, use --wait
test-scheduler took 10.824334383010864 seconds

Just for fun, I checked if I could run this with e3sm_unified v1.8.1, and everything passes again. Was this an expected change going from v1.8.1 to v1.9.0?

Also, for my own best practice cheat sheet, should we not have the unified environment loaded to set up or build the model?
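The Perl error in the log lists every directory that was searched for XML/LibXML.pm. As a rough aid for reading such a message, the sketch below pulls the directory list out of the "@INC contains:" clause and filters it by a path prefix. The function names `inc_dirs` and `dirs_under` are hypothetical, the sample message uses shortened placeholder paths rather than the real ones from the log, and the parser assumes no directory name contains whitespace.

```python
import re


def inc_dirs(error_text):
    """Return the directories listed after '@INC contains:' in a Perl error.

    Assumes the list is whitespace-separated and ends with ') at <file>',
    as in the message pasted in this issue.
    """
    match = re.search(r"@INC contains:\s*(.*?)\)\s+at\s", error_text, re.DOTALL)
    if match is None:
        return []
    return match.group(1).split()


def dirs_under(dirs, prefix):
    """Keep only the directories that start with the given path prefix."""
    return [d for d in dirs if d.startswith(prefix)]


# Shortened, made-up paths that mimic the structure of the real message.
sample = (
    "Can't locate XML/LibXML.pm in @INC (you may need to install the "
    "XML::LibXML module) (@INC contains: /code/E3SM/cime/utils/perl5lib "
    "/envs/e3sm_unified_1.9.0_login/lib/perl5/site_perl "
    "/envs/e3sm_unified_1.9.0_login/lib/perl5/core_perl .) "
    "at /code/E3SM/cime/utils/perl5lib/Config/SetupTools.pm line 5."
)

searched = inc_dirs(sample)
from_env = dirs_under(searched, "/envs/e3sm_unified_1.9.0_login")
```

Applied to the real message, filtering on the e3sm_unified_1.9.0_login prefix would show which search directories come from the conda environment rather than from the E3SM checkout or the shared e3sm perl directory.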

xylar commented 11 months ago

@beharrop, my recommendation would be not to try to do anything with E3SM itself using E3SM-Unified. E3SM-Unified is meant for pre- and post-processing of E3SM output, not for interacting with CIME. I suggest using the simpler system Python environment you get on Perlmutter by running module load python.
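One way to follow this advice is to check for an active E3SM-Unified environment before calling create_test. The sketch below is only an illustration: it assumes the environment was activated through conda, which sets CONDA_PREFIX, and that its directory name begins with "e3sm_unified", as in the path shown in the error log. The function name `active_unified_env` is hypothetical.

```python
import os


def active_unified_env(environ=None):
    """Return the active conda env name if it looks like E3SM-Unified, else None.

    Assumption: the env directory name starts with 'e3sm_unified', as in
    '.../envs/e3sm_unified_1.9.0_login' from the log in this issue.
    """
    env = os.environ if environ is None else environ
    prefix = env.get("CONDA_PREFIX", "").rstrip("/")
    name = os.path.basename(prefix)
    return name if name.startswith("e3sm_unified") else None


if __name__ == "__main__":
    found = active_unified_env()
    if found is not None:
        print(f"Warning: {found} is active; deactivate it before running create_test.")
```

Passing a dictionary instead of relying on os.environ keeps the check easy to exercise without changing the shell environment.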

xylar commented 11 months ago

I believe the specific issue is that the version of perl in E3SM-Unified is not compatible with CIME, but that's just a hunch.
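A quick way to test a hunch like this is to see which perl is first on PATH and whether that perl can load XML::LibXML, the module named in the error. The sketch below uses the common `perl -MModule -e 1` idiom through Python's subprocess module. The function name `perl_module_status` is hypothetical, and the result depends entirely on the environment in which it runs.

```python
import shutil
import subprocess


def perl_module_status(module="XML::LibXML"):
    """Report which perl is on PATH and whether it can load the given module."""
    perl = shutil.which("perl")
    if perl is None:
        return {"perl": None, "module_loads": False, "stderr": "perl not found on PATH"}
    # 'perl -MXML::LibXML -e 1' exits non-zero if the module cannot be loaded.
    result = subprocess.run(
        [perl, f"-M{module}", "-e", "1"],
        capture_output=True,
        text=True,
    )
    return {
        "perl": perl,
        "module_loads": result.returncode == 0,
        "stderr": result.stderr.strip(),
    }


if __name__ == "__main__":
    print(perl_module_status())
```

Running it once with E3SM-Unified loaded and once without would show whether the perl path changes and whether the module only fails to load in one of the two cases.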

xylar commented 11 months ago

> Also, for my own best practice cheat sheet, should we not have the unified environment loaded to set up or build the model?

Yes, that's exactly right.

xylar commented 11 months ago

> Just for fun, I checked if I could run this with e3sm_unified v1.8.1, and everything passes again. Was this an expected change going from v1.8.1 to v1.9.0?

Not exactly an expected change, but it isn't surprising that there would be differences: E3SM-Unified involves a lot of packages (~1000), and many of them change between versions. It seems like perl might have been one of them.

beharrop commented 11 months ago

Thanks, @xylar. I'll go ahead and stick with the default python module going forward.