Closed jgfouca closed 7 years ago
@brhillman I am unable to reproduce this error:
% module list
1) sierra-python/2.7.4
% create_test cime_tiny --no-build
% cd $CASEROOT
% source .env_mach_specific.sh
% echo $LD_LIBRARY_PATH | tr : '\n'
/projects/ccsm/tpl/netcdf/4.3.2/intel/13.0.1/openmpi/1.6.5/lib
/projects/ccsm/tpl/netcdf/4.3.2/intel/13.0.1/openmpi/1.6.5/lib
/projects/global/x86_64/compilers/intel/intel-compxe-2015.2.164/lib/intel64
/projects/global/x86_64/compilers/intel/intel-compxe-2015.2.164/mkl/lib/intel64
/opt/openmpi-1.6-intel/lib
/projects/global/x86_64/compilers/intel/composer_xe_2015.3.187/compiler/lib/intel64
/opt/rh/devtoolset-3/root/usr/lib64
/opt/rh/devtoolset-3/root/usr/lib/gcc/x86_64-redhat-linux/4.9.2
/projects/sems/install/capacity-hpc/sems/utility/cmake/3.5.2/lib
/projects/sems/install/capacity-hpc/sems/compiler/python/2.7.9/lib
/projects/sems/install/capacity-hpc/sems/utility/git/2.10.1/lib
/usr/netpub/graphviz/graphviz-2.30.1/lib
@jgfouca this still fails for me, although not in the testing environment:
% module load sems-python/2.7.9
% cd /home/bhillma/codes/acme/ACME-master/cime/tools/mapping/gen_domain_files/src
% /home/bhillma/codes/acme/ACME-master/cime/tools/configure --machine skybridge --macros-format Makefile
% source .env_mach_specific.sh
% echo $LD_LIBRARY_PATH
/projects/ccsm/tpl/netcdf/4.3.2/intel/13.0.1/openmpi/1.6.5/lib:/projects/sems/install/capacity-hpc/sems/compiler/python/2.7.9/lib:/opt/openmpi-1.6-intel/lib:/projects/global/x86_64/compilers/intel/composer_xe_2015.3.187/compiler/lib/intel64
No mkl libraries listed, and if I compile and then try to run the gen_domain tool I get an error loading shared libraries. So something may be different in the test environment you created? I did repeat your steps, and the .env_mach_specific.sh from the test CASEROOT does have the mkl libraries explicitly listed.
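A quick way to compare the two environments is to test whether any entry of LD_LIBRARY_PATH mentions mkl after sourcing each .env_mach_specific.sh. The following is only a sketch: `path_has` is a hypothetical helper name, not part of CIME, and it matches a plain substring in each colon-separated entry.

```shell
# Hypothetical helper (not part of CIME): succeed if any entry of a
# colon-separated path list contains the given substring.
# $1 = colon-separated list, $2 = substring to look for
path_has() {
    printf '%s\n' "$1" | tr ':' '\n' | grep -qF -- "$2"
}

# Report on the current environment; an unset variable is treated as empty.
if path_has "${LD_LIBRARY_PATH:-}" "mkl"; then
    echo "mkl directory found in LD_LIBRARY_PATH"
else
    echo "no mkl directory in LD_LIBRARY_PATH"
fi
```

Running this once after sourcing the file from the test CASEROOT and once after sourcing the file written by configure should show which of the two is missing the mkl entries.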
just a guess -- when i tried to build gen_domain on skybridge, i used module load mkl/14.0
@oksanaguba yes, manually re-loading the modules after sourcing the env_mach_specific that is created appends to LD_LIBRARY_PATH and serves as a work-around for me, allowing me to build and run. I brought up the issue because this seems like something that should be set properly by the env_mach_specific that is created here.
i have certain things in my bash_profile, so i tried to follow your steps, starting with 'module purge', and got an error when running module load sems-python/2.7.9
could it be that Jim has something in his profile, so, for him it works?
@oksanaguba module purge seems to remove the configuration to allow loading the sems modules. I see Jim is using sierra-python/2.7.4, so I tried repeating the above using
% module purge
% module load sierra-python/2.7.4
and then fail at the configure step with
ERROR: Undefined env var 'LD_LIBRARY_PATH'
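The error suggests that LD_LIBRARY_PATH is left unset after `module purge`. A possible way around it, assuming configure only needs the variable to be defined and does not care whether it is empty (this is a guess, not verified against CIME), is to define it before running configure:

```shell
# Assumption: configure only requires LD_LIBRARY_PATH to be defined.
# Assign an empty value if it is unset, leave any existing value alone,
# then export it so child processes see it.
: "${LD_LIBRARY_PATH:=}"
export LD_LIBRARY_PATH

echo "LD_LIBRARY_PATH is defined: '${LD_LIBRARY_PATH}'"
```

This only gets past the undefined-variable check; it does not add the compiler or mkl paths that the modules would normally provide.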
@brhillman if you aren't using the env_mach_specific from the CASEROOT, which one are you using?
@jgfouca the .env_mach_specific.sh created by running configure explicitly. This is following the instructions in the README for gen_domain_files. The command is
/home/bhillma/codes/acme/ACME-master/cime/tools/configure --machine skybridge --macros-format Makefile
This creates an .env_mach_specific.sh in your current path, but does not set LD_LIBRARY_PATH correctly.
@brhillman OK, I see the problem now. I'm looking into a fix.
It looks like LD_LIBRARY_PATH isn't getting set right in CIME, and this seems to be what has been causing my issues. In the config_machines.xml entries for redsky and skybridge, there are lines to load specific intel and openmpi, and the intel-mkl modules. Below that, the environment variables are set by hand for PATH and LD_LIBRARY_PATH, but these do not include the paths for intel, openmpi, or intel-mkl, and the $ENV{LD_LIBRARY_PATH} at the end does not seem to work, maybe because these modules are loaded when that variable is expanded?
When .env_mach_specific.sh is created, the paths are not appended to the end. If I edit .env_mach_specific.sh and append :$LD_LIBRARY_PATH to the end, then source that, it works, or if I load the right modules by hand ahead of time it works. So if one happens to have the right modules loaded by default (or in ~/.bashrc), there isn't a problem, and maybe it doesn't affect ACME builds because they do not try to use the mkl library.
Regardless, it seems like something isn't quite right in the CIME configuration. The workaround for me was to load the modules before trying to run the CIME scripts, so that my environment was set up how CIME tried to set it up, but this does not seem like a very robust solution.
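The difference between overwriting LD_LIBRARY_PATH and appending the existing value to it can be illustrated with a small sketch. The paths and the `with_existing` helper below are hypothetical and stand in for what the generated .env_mach_specific.sh and the module loads would provide; this is not the actual generated file.

```shell
# Hypothetical helper: join a fixed path list with whatever value is
# already in the environment, skipping the separator if that value is empty.
# $1 = fixed list written by the generated script
# $2 = value already present (for example, set by module load)
with_existing() {
    if [ -n "$2" ]; then
        printf '%s:%s\n' "$1" "$2"
    else
        printf '%s\n' "$1"
    fi
}

fixed="/opt/example/netcdf/lib"               # hypothetical hand-set path
from_modules="/opt/example/mkl/lib/intel64"   # hypothetical path added by a module

# Overwriting keeps only the hand-set list, so the module paths are lost.
echo "overwrite: $fixed"
# Appending the existing value keeps the module paths at the end.
echo "append:    $(with_existing "$fixed" "$from_modules")"
```

The second form corresponds to the manual edit described above, where :$LD_LIBRARY_PATH is appended to the value set in .env_mach_specific.sh.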
Filed on behalf of @brhillman