ESMCI / ccs_config_cesm

CESM CIME Case Control System configuration files

Problems with using openmpi and intel on cheyenne #26

Closed ekluzek closed 2 years ago

ekluzek commented 2 years ago

I get the following error when trying to build a case on cheyenne_intel under ctsm5.1.dev090 with ccs_config_cesm0.0.21 (and with 0.0.15 as well). The case was created from this test, and MPILIB was then changed with xmlchange:

SMS_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.cheyenne_intel.clm-default
./xmlchange MPILIB=openmpi

ERROR: module command /glade/u/apps/ch/opt/lmod/7.5.3/lmod/lmod/libexec/lmod python load esmf-8.2.0b23-ncdfio-mpt-g ncarcompilers/0.5.0 pio/2.5.6d failed with message: Lmod has detected the following error: These module(s) exist but cannot be loaded as requested: "pio/2.5.6d" Try: "module spider pio/2.5.6d" to see how to load the module(s).

ekluzek commented 2 years ago

After running ./case.setup --reset, I see this error:

ERROR: Could not find a matching MPI for attributes: {'compiler': 'intel', 'mpilib': 'openmpi', 'threaded': False}

This happens because there is no setting for intel with openmpi.

I also realized that the right way to do this is to select openmpi from the start, using this test name:

SMS_Mopenmpi_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.cheyenne_intel.clm-default
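
To make the difference between the two test names explicit, here is a minimal Python sketch that splits a test name into its parts and picks out the MPI library modifier. It assumes that the `_M<name>` modifier in the first dotted field is what selects the MPI library, which is what the extra `_Mopenmpi` in the name above suggests. The function names `split_test_name` and `mpilib_from_modifiers` are hypothetical and are not part of CIME.

```python
def split_test_name(test_name):
    """Split a dotted test name into (test type, modifiers, remaining fields)."""
    first, *rest = test_name.split(".")
    test_type, *modifiers = first.split("_")
    return test_type, modifiers, rest


def mpilib_from_modifiers(modifiers):
    """Return the value of the first M<name> modifier, or None if there is none.

    Assumption: a modifier such as 'Mopenmpi' means MPILIB=openmpi.
    """
    for mod in modifiers:
        if mod.startswith("M") and len(mod) > 1:
            return mod[1:]
    return None


name = "SMS_Mopenmpi_D_Ld3.f10_f10_mg37.I1850Clm50BgcCrop.cheyenne_intel.clm-default"
test_type, modifiers, rest = split_test_name(name)
print(test_type)                         # SMS
print(modifiers)                         # ['Mopenmpi', 'D', 'Ld3']
print(mpilib_from_modifiers(modifiers))  # openmpi
```

For the original name without `_Mopenmpi`, `mpilib_from_modifiers` returns `None` under this sketch, so the MPI library would have to be changed afterwards with xmlchange, as in the first comment.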

After adding a modules block for openmpi with intel as follows, the module load seems to work, but the build then fails while building share:

diff --git a/machines/config_machines.xml b/machines/config_machines.xml
index 44ff8e4..377494d 100644
--- a/machines/config_machines.xml
+++ b/machines/config_machines.xml
@@ -810,6 +810,11 @@ This allows using a different mpirun command to launch unit tests
         <command name="load">netcdf-mpi/4.8.1</command>
         <command name="load">pnetcdf/1.12.2</command>
       </modules>
+      <modules mpilib="openmpi" compiler="intel">
+        <command name="load">openmpi/4.1.1</command>
+        <command name="load">netcdf-mpi/4.8.1</command>
+        <command name="load">pnetcdf/1.12.2</command>
+      </modules>
       <modules mpilib="openmpi" compiler="nvhpc">
         <command name="load">openmpi/4.1.1</command>
         <command name="load">netcdf-mpi/4.8.1</command>

The build then fails at the link step with the following error:

/usr/lib64/gcc/x86_64-suse-linux/4.8/../../../../x86_64-suse-linux/bin/ld: cannot find -lmpi++

ekluzek commented 2 years ago

OK, I got it to build by adding the following...

diff --git a/machines/config_machines.xml b/machines/config_machines.xml
index 44ff8e4..a6cd78e 100644
--- a/machines/config_machines.xml
+++ b/machines/config_machines.xml
@@ -698,14 +698,22 @@ This allows using a different mpirun command to launch unit tests
       <modules compiler="nvhpc">
         <command name="load">nvhpc/22.2</command>
       </modules>
-      <modules compiler="intel" mpilib="!mpi-serial" DEBUG="TRUE">
+      <modules compiler="intel" mpilib="mpt" DEBUG="TRUE">
         <command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1/</command>
         <command name="load">esmf-8.2.0b23-ncdfio-mpt-g</command>
       </modules>
-      <modules compiler="intel" mpilib="!mpi-serial" DEBUG="FALSE">
+      <modules compiler="intel" mpilib="mpt" DEBUG="FALSE">
         <command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1/</command>
         <command name="load">esmf-8.2.0b23-ncdfio-mpt-O</command>
       </modules> 
+      <modules compiler="intel" mpilib="openmpi" DEBUG="TRUE">
+        <command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1/</command>
+        <command name="load">esmf-8.2.0b23-ncdfio-openmpi-g</command>
+      </modules>
+      <modules compiler="intel" mpilib="openmpi" DEBUG="FALSE">
+        <command name="use">/glade/p/cesmdata/cseg/PROGS/modulefiles/esmfpkgs/intel/19.1.1/</command>
+        <command name="load">esmf-8.2.0b23-ncdfio-openmpi-O</command>
+      </modules> 
       <modules mpilib="mpi-serial">
         <command name="load">mpi-serial/2.3.0</command>
       </modules>
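
The effect of replacing `mpilib="!mpi-serial"` with `mpilib="mpt"` in the diff above can be illustrated with a small Python sketch. It is a simplified model and not CIME's actual code: it assumes that a `<modules>` block applies when every one of its attributes matches the case, and that a leading `!` means "anything except this value". The names `attr_matches` and `block_applies` are hypothetical.

```python
def attr_matches(pattern, value):
    """Return True if a case value satisfies one attribute pattern.

    Assumption: a leading '!' negates the match.
    """
    if pattern.startswith("!"):
        return value != pattern[1:]
    return value == pattern


def block_applies(block_attrs, case):
    """Return True if every attribute on a <modules> block matches the case."""
    return all(attr_matches(pattern, case[key])
               for key, pattern in block_attrs.items())


# Case settings from this issue: intel compiler, openmpi, debug build.
case = {"compiler": "intel", "mpilib": "openmpi", "DEBUG": "TRUE"}

old_block = {"compiler": "intel", "mpilib": "!mpi-serial", "DEBUG": "TRUE"}
mpt_block = {"compiler": "intel", "mpilib": "mpt", "DEBUG": "TRUE"}
openmpi_block = {"compiler": "intel", "mpilib": "openmpi", "DEBUG": "TRUE"}

print(block_applies(old_block, case))      # True
print(block_applies(mpt_block, case))      # False
print(block_applies(openmpi_block, case))  # True
```

Under these assumptions, the old `!mpi-serial` block also applies to an openmpi case, so the `esmf-8.2.0b23-ncdfio-mpt-g` module would be requested together with openmpi. That would be consistent with the module load error in the first comment. With the change, only the new openmpi block applies.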