RMGDFT / rmgdft

RMG is an Open Source code for electronic structure calculations and modeling of materials and molecules. It is based on density functional theory and uses a real space basis and pseudopotentials.
GNU General Public License v2.0

How to configure builtin scalapack compilation #41

Closed: ketancmaheshwari closed this issue 2 years ago

ketancmaheshwari commented 5 years ago

I am trying to build the latest beta release here at CADES ORNL, but I get undefined symbol errors when make does the final link of rmg-cpu, and they all seem to come from the bundled scalapack. It looks like scalapack is being built with a different set of parameters than the rest of the code.

I am looking for a way to either pass the right compilation parameters to scalapack OR skip building the bundled scalapack and use the system-built one instead, but I could not find a suitable option among the cmake variables.
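Before pointing the build at a system-built scalapack, one way to check that the library actually exports the routines the link step is missing is to list its defined symbols with nm. This is a minimal sketch, not part of RMG's build: the function name `lib_defines_symbol` is hypothetical, and it assumes GNU binutils nm and gfortran-style names with a trailing underscore (e.g. pdgemm_).

```shell
# Hypothetical helper: succeed if library $1 defines symbol $2.
# Shared libraries are read with -D (dynamic symbol table);
# static archives (.a) are read with plain nm.
lib_defines_symbol() {
    local lib=$1 sym=$2 opts
    [ -r "$lib" ] || return 1
    case $lib in
        *.a) opts="--defined-only" ;;
        *)   opts="-D --defined-only" ;;
    esac
    # The symbol name is the last field of each nm output line.
    # $opts is intentionally unquoted so it splits into separate flags.
    nm $opts "$lib" 2>/dev/null | awk '{ print $NF }' | grep -qx "$sym"
}
```

For example, `lib_defines_symbol /path/to/libscalapack.so pdgemm_ && echo found`, with the path replaced by wherever your scalapack module installs the library.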

WenchangLu commented 5 years ago

We have had a few issues on CADES. You can contact David Lingerfelt (lingerfeltdb@ornl.gov), who is using our code on CADES; he may be able to help you build it. I believe the problem is due to the linked libraries.

ketancmaheshwari commented 5 years ago

I received the following from David:

The RMG devel team was visiting at the time; I probably would have given up if not for having them around for support. I have some notes from that day. Modules loaded:

1) env/cades-cnms
2) cmake/3.11.0
3) gcc/5.3.0
4) xalt/0.7.6
5) PE-gnu/1.0
6) fftw/3.3.5
7) scalapack/2.0.2
8) boost/1.67.0
9) openmpi/3.0.0
10) openBLAS/0.2.19

Export the CC, CXX, and FC variables for gcc 5.3. Add the path to the scalapack dir in the CMakeFind.inc in ../. Run cmake .. in the build dir. Then run ccmake .., hit t (toggle advanced mode), copy in the path to the scalapack dir, press c to reconfigure, then g to generate. Finally, make -j16 rmg-cpu.
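After the ccmake step it can help to confirm which scalapack path actually ended up in the CMake cache. CMake stores cache entries in CMakeCache.txt in the build directory as `NAME:TYPE=value` lines, with help text on `//` lines and header comments on `#` lines. The thread does not say which cache variable RMG uses for scalapack, so this sketch (the function name is hypothetical) simply shows every entry mentioning it.

```shell
# Hypothetical helper: print CMake cache entries that mention scalapack.
# Comment lines ('#') and help-text lines ('//') are filtered out.
show_scalapack_cache() {
    local build_dir=${1:-.}
    local cache="$build_dir/CMakeCache.txt"
    if [ ! -f "$cache" ]; then
        echo "no CMakeCache.txt in $build_dir (run cmake first)" >&2
        return 1
    fi
    grep -i 'scalapack' "$cache" | grep -v -e '^#' -e '^//'
}
```

Run it with the build directory as its argument, or with no argument from inside the build directory.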

I was unable to produce the executable with these instructions, as I am on a different CADES system. I would appreciate a more structured approach to resolving the issue.

elbriggs commented 5 years ago

Hi Ketan,

We switched to using an included scalapack rather than the system-built one because it cleared up build problems on many platforms. This is the first case we've seen where it does not work. Could you include the link errors you see, as well as some information about the system you are on (e.g., which Linux distribution it is based on)?

Thanks,
Emil
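To report the link errors without pasting a very long log, a saved build log can be reduced to a count of each undefined symbol. A minimal sketch, assuming GNU ld's usual message format (undefined reference to `symbol'); the function name is hypothetical.

```shell
# Hypothetical helper: list undefined symbols from a build log, most frequent first.
# Assumes GNU ld messages of the form:  undefined reference to `symbol'
summarize_undefined() {
    sed -n "s/.*undefined reference to [\`']\([^']*\)'.*/\1/p" "$1" |
        sort | uniq -c | sort -rn
}
```

For example, `summarize_undefined build.log | head -n 20` shows the twenty most frequent missing symbols.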

jcklasseter commented 5 years ago

Hi,

I had similar issues compiling on CADES too. I used the following settings (in a script) to build:

#!/usr/bin/env bash

module load env/cades-cnms cmake/3.11.0 gcc/5.3.0 xalt/0.7.6 PE-gnu/1.0 fftw/3.3.5 scalapack/2.0.2 boost/1.67.0 openmpi/3.0.0 openBLAS/0.2.19

# Point the build at the gcc 5.3.0 compilers
export CXX=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/gcc/5.3.0/centos7.2_gcc4.8.5/bin/g++
export CC=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/gcc/5.3.0/centos7.2_gcc4.8.5/bin/gcc
export FC=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/gcc/5.3.0/centos7.2_gcc4.8.5/bin/gfortran

# Record the toolchain used for this build
echo "BUILD LOG" > build.vars
echo "CC: $CC" >> build.vars
echo "CXX: $CXX" >> build.vars
echo "FC: $FC" >> build.vars
echo "cmake: $(which cmake)" >> build.vars

git clone --branch v4.0.0-beta.2 https://github.com/RMGDFT/rmgdft.git rmgdft4-beta2

# Alternative: clone the default branch instead of the v4.0.0-beta.2 tag
# git clone https://github.com/RMGDFT/rmgdft.git

cd rmgdft4-beta2
mkdir build
cd build

# Run make only if cmake itself (not tee) succeeded
cmake -Wno-dev .. 2>&1 | tee build.log && ((!${PIPESTATUS[0]})) && make rmg-cpu 2>&1 | tee -a build.log
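The three exports in the script hardcode the full install path of gcc 5.3.0, which only works on that one system. A sketch of a more portable alternative, assuming `module load gcc/5.3.0` has already put the intended compilers first on PATH: resolve each one with `command -v` and fail loudly if one is missing. The helper name `export_compiler` is hypothetical.

```shell
# Hypothetical helper: export variable $1 as the full path of program $2,
# as found on the current PATH (e.g. after `module load gcc/5.3.0`).
export_compiler() {
    local var=$1 prog=$2 found
    found=$(command -v "$prog") || {
        echo "error: $prog not found on PATH" >&2
        return 1
    }
    export "$var=$found"
}
```

Usage: `export_compiler CC gcc && export_compiler CXX g++ && export_compiler FC gfortran`, in place of the three hardcoded exports.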

I kept getting output reporting compiler errors. I eventually traced them to a g++ memory-allocation error message, so I tried reducing the number of parallel make jobs. In the end, only make with no -j option (one job, one target at a time) produced a working executable. I was originally using something like make -j 14, as the others did. I have build logs I can provide, but they're quite long, so I left them out of this post.
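Since the failures went away as the number of parallel jobs dropped, one common workaround is to size make -j from available memory as well as the core count. A minimal sketch: the 2048 MB-per-job figure in the usage line is an assumption, not something measured in this thread, and the function name is hypothetical.

```shell
# Hypothetical helper: print a make -j value limited by both memory and cores.
#   $1 = available memory in kB (as reported in /proc/meminfo)
#   $2 = assumed memory per compile job in MB (a guess; tune for your system)
#   $3 = number of cores
safe_jobs() {
    local avail_kb=$1 per_job_mb=$2 cores=$3 by_mem n
    by_mem=$(( avail_kb / 1024 / per_job_mb ))
    n=$(( by_mem < cores ? by_mem : cores ))
    # Never go below one job (a serial build).
    [ "$n" -lt 1 ] && n=1
    echo "$n"
}
```

On Linux kernels that report MemAvailable, usage could look like `make -j"$(safe_jobs "$(awk '/^MemAvailable:/ { print $2 }' /proc/meminfo)" 2048 "$(nproc)")" rmg-cpu`.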

I ran several tests, running the above script in the background and watching system resources with top. With the original make -j 14 command, CPU utilization never went above ~15% and RAM usage never above ~10%, so I think it might be a bug in the compiler. (Or maybe the compiler was built as 32-bit? That's just speculation: the memory-allocation errors appeared when top reported total node usage approaching 4 GB, the most a 32-bit process can address, but honestly I have no idea.)
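The 32-bit guess can be checked directly. The fifth byte of an ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects, and `g++ -print-prog-name=cc1plus` prints the path of the compiler proper that the g++ driver runs. This sketch reads that byte with od; the function name is hypothetical.

```shell
# Hypothetical helper: report whether file $1 is a 32- or 64-bit ELF binary
# by reading the ELF magic (first 4 bytes) and EI_CLASS (the 5th byte).
elf_class() {
    local magic class
    magic=$(od -An -t x1 -N4 "$1" 2>/dev/null | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    class=$(od -An -t u1 -j4 -N1 "$1" | tr -d ' \n')
    case $class in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

Usage: `elf_class "$(g++ -print-prog-name=cc1plus)"`. If it reports 64, `ulimit -v` in the build shell is another place a per-process memory cap could come from.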

I tested the executable on the C60 example. The total energy in the reference log is -343.838529 Ha, and my executable also gave -343.838529 Ha, which seems great. (I did check, and the TOTAL ENERGY at the second step differed between the reference and my local run.) It finished the example in about 17 seconds. I needed the following in my PBS job script to get things working:
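Comparing the two total energies by eye works, but a small tolerance check is easier to repeat across examples. A minimal sketch using awk for the floating-point arithmetic; the function name and the tolerance in the usage line are illustrative, and extracting the final energy from each log is left out because the log format isn't shown in this thread.

```shell
# Hypothetical helper: succeed if |$1 - $2| <= $3 (all in Ha).
# awk does the floating-point arithmetic that plain sh cannot.
energies_match() {
    awk -v a="$1" -v b="$2" -v tol="$3" 'BEGIN {
        d = a - b
        if (d < 0) d = -d
        exit !(d <= tol)
    }'
}
```

Usage: `energies_match -343.838529 -343.838529 1e-6 && echo "C60 total energy matches"`.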

module load env/cades-cnms cmake/3.11.0 gcc/5.3.0 xalt/0.7.6 PE-gnu/1.0 fftw/3.3.5 scalapack/2.0.2 boost/1.67.0 openmpi/3.0.0 openBLAS/0.2.19

cd "$PBS_O_WORKDIR"
# rmg-cpu was moved from build to home
mpirun -np 32 ~/rmg-cpu "$PBS_O_WORKDIR/input"

elbriggs commented 5 years ago

I ran into a similar issue on Summit recently, where the build would fail if you used too large a value of x in make -jx, but we did not have to go down to a serial build, which I imagine would take rather a long time.

jcklasseter commented 5 years ago

Oh yeah, it took ~30-40 minutes to build serially. I tried make -j2 before I tried a serial build, but anything other than serial was broken for me.
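Because make only rebuilds what is still missing, a failed parallel build does not have to be restarted from scratch at -j1. A sketch of a retry loop that halves the job count after each failure until it reaches a serial build; `try_build`, `build_with_backoff`, and `BUILT_JOBS` are hypothetical names, and `rmg-cpu` is the target used in this thread.

```shell
# One build attempt with $1 parallel jobs (target name from this thread).
try_build() {
    make -j"$1" rmg-cpu
}

# Retry with half as many jobs after each failure, ending with a serial build.
# Objects that already compiled are usually kept, so later attempts redo less.
# On success, BUILT_JOBS holds the job count that worked.
build_with_backoff() {
    local n=${1:-16}
    while :; do
        echo "building with -j$n" >&2
        if try_build "$n"; then
            BUILT_JOBS=$n
            return 0
        fi
        [ "$n" -le 1 ] && return 1
        n=$(( n / 2 ))
    done
}
```

Usage: `build_with_backoff 14` from the build directory tries -j14, -j7, -j3, and -j1 in turn.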

elbriggs commented 2 years ago

The scalapack configuration has been reworked in v4.3.0 and above. The default behavior is to search for a system-provided scalapack library and use it if found; if none is found, the internal version is built. Use of the internal version can also be forced by passing the option -DUSE_INTERNAL_SCALAPACK=1 when running cmake.
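For v4.3.0 and above, a configure step that either lets cmake search for a system scalapack (the default) or forces the bundled one might look like this. Only -DUSE_INTERNAL_SCALAPACK=1 comes from this thread; the wrapper name `rmg_configure` and its arguments are hypothetical.

```shell
# Hypothetical wrapper: configure an RMG checkout in a separate build dir.
#   $1 = source dir, $2 = build dir, $3 = 1 to force the internal scalapack
rmg_configure() {
    local src build internal
    src=$(cd "$1" && pwd) || return 1   # absolute path, since we cd below
    build=$2
    internal=${3:-0}
    mkdir -p "$build" || return 1
    if [ "$internal" = 1 ]; then
        # Skip the search for a system library and build the bundled copy.
        (cd "$build" && cmake -DUSE_INTERNAL_SCALAPACK=1 "$src")
    else
        # Default: use a system scalapack if cmake finds one.
        (cd "$build" && cmake "$src")
    fi
}
```

Usage: `rmg_configure rmgdft rmgdft/build 1 && make -C rmgdft/build rmg-cpu`.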