RMGDFT / rmgdft

RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.
GNU General Public License v2.0

Unable to build on Summit #30

Closed prckent closed 5 years ago

prckent commented 5 years ago

I am unable to build RMG reliably on Summit, and sometimes cannot build it at all. Errors occur during make. Tests were done today, 9 September 2019. It looks like there is a bad dependency, or a workaround might be needed for Summit filesystem weirdness.

Reproducer:

# This is for demonstration only
rm -r -f rmgdft # Remove git repo since we are going to build inside it per the instructions (ugly)
git clone https://github.com/RMGDFT/rmgdft.git
cd rmgdft
mkdir build_summit_gpu
cd build_summit_gpu
module load gcc/6.4.0
export FC=`which gfortran`
export CC=`which gcc`
export CXX=`which g++`
module load boost
module load essl
module load cuda
module load fftw
module load hdf5 # Missing from Summit instructions
module load cmake/3.14.2
export BLA_VENDOR=IBMESSL
cmake -DRMG_GPU_ENABLED=1 -DBLAS_blas_LIBRARY=/sw/summit/essl/6.1.0-2/essl/6.1/lib64/libessl.so ..
nice make -j 32
module list
ls -l ../rmg-gpu

Output:

[  0%] Building Fortran object TDDFT/ELDYN/CMakeFiles/eldyn_mod.dir/timing.f90.o
Fatal Error: Can't delete temporary module file 'timing.mod0': No such file or directory
make[2]: *** [TDDFT/ELDYN/CMakeFiles/eldyn_mod.dir/timing.f90.o] Error 1
make[2]: *** Waiting for unfinished jobs....

elbriggs commented 5 years ago

I'll take a look. I built it last week with no issues but perhaps something changed with the software stack since then.

prckent commented 5 years ago

Thanks. I'll be happy to disable TDDFT if that is possible. Just trying to get a working build.

elbriggs commented 5 years ago

This is a bit odd, but I got a successful build by just running make again after getting an error the first time. I did have to load the hdf5 module, so I'll update the Summit instructions with that. Why the second make works after the first one fails is still unclear.

P.S. The build directory does not need to be located inside the source directory, as long as the path to the directory containing CMakeLists.txt is passed to cmake.
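
As an illustration of the P.S. above, here is a minimal sketch of an out-of-source configure step. The function name `configure_out_of_source`, the `CMAKE` override variable, and the example paths are hypothetical and not part of the RMG instructions; only the cmake option is taken from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of RMG): configure in a build directory
# that lives outside the source tree.
#   $1 = source directory containing CMakeLists.txt
#   $2 = build directory (created if missing)
#   remaining arguments are passed to cmake unchanged
configure_out_of_source() {
    src_dir=$1
    build_dir=$2
    shift 2
    # cmake needs the directory that holds CMakeLists.txt, wherever it is.
    if [ ! -f "$src_dir/CMakeLists.txt" ]; then
        echo "no CMakeLists.txt in $src_dir" >&2
        return 1
    fi
    mkdir -p "$build_dir" || return 1
    # Resolve the source path first so a relative path survives the cd below.
    src_abs=$(cd "$src_dir" && pwd) || return 1
    # CMAKE is an assumed override hook; it defaults to the cmake on PATH.
    ( cd "$build_dir" && "${CMAKE:-cmake}" "$@" "$src_abs" )
}

# Example call (paths are illustrative only):
#   configure_out_of_source "$HOME/rmgdft" "$HOME/build_rmg_gpu" -DRMG_GPU_ENABLED=1
```

Keeping the build tree separate means the clone never has to be deleted to get a clean build; removing the build directory is enough.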

elbriggs commented 5 years ago

I noticed this error message, which occurred during the cmake phase, before the message about eldyn_mod.dir:

[elbriggs@login1.summit build_rmg_gpu]$ cmake -DRMG_GPU_ENABLED=1 -DBLAS_blas_LIBRARY=/sw/summit/essl/6.1.0-2/essl/6.1/lib64/libessl.so ../rmgdft/
fatal: Not a git repository (or any parent up to mount point /autofs/nccs-svm1_home1)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

The CMakeLists.txt file uses git to extract version/patch level information to be compiled into the binary, but apparently this does not work reliably on Summit. It's not clear whether this is the root cause that makes running make twice necessary, but it is clearly a problem because the correct version/patch information won't get included in the executable.
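
For illustration only, the sketch below shows one way to query version information so that it does not depend on the directory the command is run from, with a fallback when git cannot find a repository. It assumes the failure comes from git being run outside the clone, which this thread does not confirm. The function name `source_version` and the `unknown` fallback string are hypothetical; this is not the logic in RMG's CMakeLists.txt.

```shell
#!/bin/sh
# Hypothetical helper (not RMG's actual version logic).
#   $1 = path to the source checkout
source_version() {
    src_dir=$1
    # "git -C DIR" runs git as if it had been started in DIR, so the result
    # does not depend on where the build directory lives.
    # If git fails (no repository found, or git not installed), print a
    # placeholder instead of aborting.
    git -C "$src_dir" describe --always --dirty 2>/dev/null || echo "unknown"
}

# Example call (path is illustrative only):
#   source_version "$HOME/rmgdft"
```

A placeholder keeps the configure step going, but the binary then carries no real version, so the fallback is worth reporting during configuration.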

elbriggs commented 5 years ago

The TDDFT error on the initial make goes away if you reduce the number of cores used in the parallel make. This is definitely a bug, although it's not clear where it's coming from. Using make -j8 instead of make -j32 seems to be more reliable.
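
As a stopgap while the underlying bug is open, the retry can be scripted. This is a workaround sketch, not a fix: the function name `build_with_fallback` and the `MAKE` override variable are hypothetical, and the job counts are just the ones mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical workaround helper: try each job count in turn and stop at
# the first make invocation that succeeds.
#   arguments = job counts to try, in order
build_with_fallback() {
    for n in "$@"; do
        echo "trying make -j$n" >&2
        # MAKE is an assumed override hook; it defaults to the make on PATH.
        if "${MAKE:-make}" -j"$n"; then
            return 0
        fi
    done
    # Every attempt failed.
    return 1
}

# Example call from inside the build directory:
#   build_with_fallback 32 8 1
```

Because make keeps the objects that were built before a failure, each retry only redoes the remaining work.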

prckent commented 5 years ago

The sensitivity to the -j setting indicates a missing dependency in the build.
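
A build whose dependencies are fully declared should succeed at any -j value, so failures that appear only at high parallelism point to a target being built before something it needs. One way to check for this is to rebuild from clean several times and count the failures. The sketch below assumes the build provides a `clean` target; the function name `count_parallel_failures` and the `MAKE` override variable are hypothetical.

```shell
#!/bin/sh
# Hypothetical stress test: rebuild from clean several times at a given
# parallelism and print how many of the builds failed.
#   $1 = number of clean rebuilds to attempt
#   $2 = value passed to make -j
count_parallel_failures() {
    runs=$1
    njobs=$2
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        i=$((i + 1))
        # Start from a clean tree so every run exercises the whole build
        # graph. Assumes a "clean" target exists; its failure is ignored.
        "${MAKE:-make}" clean >/dev/null 2>&1 || true
        if ! "${MAKE:-make}" -j"$njobs" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
    done
    echo "$failures"
}

# Example call from inside the build directory:
#   count_parallel_failures 5 32
```

A nonzero count at -j32 together with a zero count at -j1 would be consistent with a missing dependency, although intermittent filesystem errors could produce the same pattern.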

elbriggs commented 5 years ago

Yep. Fixed now in the latest commit. I guess it's been there for a while, but I never actually saw it before.