Closed ketancmaheshwari closed 2 years ago
We have a few issues on CADES. You can contact David Lingerfelt (lingerfeltdb@ornl.gov), who is using our code on CADES; he may be able to help you build the code. I believe the problem is due to the linked libraries.
I received the following from David:
The RMG devel team was visiting at the time; probably would have given up if not for having them around for support. I have some notes from that day:
1) env/cades-cnms
2) cmake/3.11.0
3) gcc/5.3.0
4) xalt/0.7.6
5) PE-gnu/1.0
6) fftw/3.3.5
7) scalapack/2.0.2
8) boost/1.67.0
9) openmpi/3.0.0
10) openBLAS/0.2.19
export the CC, CXX, and FC variable for 5.3
add the path to the scalapack dir in the CMakeFind.inc in ../
run cmake .. in the build dir
then, ccmake .. hit t, and copy the path to the scalapack dir, c to reconfig, then g to generate.
make -j16 rmg-cpu
I was unable to produce the executable with these instructions, as I am on a different CADES system. I would appreciate a more structured approach to resolving the issue.
Hi Ketan,
We switched to using an included scalapack rather than the system-built one because it cleaned up build problems on many platforms. This case is the first one we've seen where it is not working. Could you include the link errors that you see, as well as some information about the system you are on (e.g. what Linux distribution it is based on)?
Thanks Emil
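The information requested above could be gathered with something like the following. This is only a minimal sketch, not something provided by the RMG developers: the function names system_info and link_errors are made up for illustration, and it assumes the build output was saved to a log file such as build.log.

```bash
#!/usr/bin/env bash
# Sketch: collect details for a link-failure report.
# system_info and link_errors are hypothetical helper names.

# Print the distribution name/version (when available) and kernel details.
system_info() {
    if [ -r /etc/os-release ]; then
        grep -E '^(PRETTY_NAME|VERSION_ID)=' /etc/os-release
    fi
    uname -srm
}

# Print each distinct undefined-symbol line found in a saved build log.
# $1: path to the log file (for example build.log).
link_errors() {
    grep -E 'undefined (reference|symbol)' "$1" | sort -u
}
```

Running system_info and then link_errors build.log in the build directory would give a short report that can be pasted into the issue instead of the full log.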
Hi,
I had similar issues compiling on CADES too. I used the settings below (in a script) to build:
#!/usr/bin/env bash
module load env/cades-cnms cmake/3.11.0 gcc/5.3.0 xalt/0.7.6 PE-gnu/1.0 fftw/3.3.5 scalapack/2.0.2 boost/1.67.0 openmpi/3.0.0 openBLAS/0.2.19
export CXX=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/gcc/5.3.0/centos7.2_gcc4.8.5/bin/g++
export CC=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/gcc/5.3.0/centos7.2_gcc4.8.5/bin/gcc
export FC=/software/dev_tools/swtree/cs400_centos7.2_pe2016-08/gcc/5.3.0/centos7.2_gcc4.8.5/bin/gfortran
echo "BUILD LOG" > build.vars
echo "CC: $CC" >> build.vars
echo "CXX: $CXX" >> build.vars
echo "FC: $FC" >> build.vars
echo "cmake: $(which cmake)" >> build.vars
git clone --branch v4.0.0-beta.2 https://github.com/RMGDFT/rmgdft.git rmgdft4-beta2
git clone https://github.com/RMGDFT/rmgdft.git
cd rmgdft4-beta2
mkdir build
cd build
cmake -Wno-dev .. 2>&1 | tee build.log && ((!${PIPESTATUS[0]})) && make rmg-cpu 2>&1 | tee -a build.log
I kept getting output telling me I had compiler errors.
I eventually tracked it down to a g++ memory allocation error message and tried reducing the number of parallel make jobs. Eventually, make with no -j option (so 1 core, 1 target at a time) finally produced a working executable. I was originally trying something like make -j 14, like the others. I have build logs that I can provide, but they're quite long, so I excluded them from this post.
I ran several tests by running the above script in the background and watching the system resources with top. With the original make -j 14 command, CPU utilization only ever hit ~15% and RAM usage only ever hit ~10%, so I think it might be a bug in the compiler (or maybe it was compiled as 32-bit? Just speculation, since the memory allocation errors popped up as top reported total node usage approaching 4 GB, but honestly I have no idea).
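One way to probe the speculation above is to look at the per-process limits in the build shell and at the compiler's target, since a low ulimit can make allocations fail while top still shows plenty of free memory on the node. This is a rough sketch only; build_env_report is a hypothetical helper name, and nothing here is confirmed as the cause on CADES.

```bash
#!/usr/bin/env bash
# Sketch: report limits and compiler target for the current shell.
# build_env_report is a hypothetical helper name.
build_env_report() {
    # Limits are reported in kB, or "unlimited" when none is set.
    echo "virtual memory limit (kB): $(ulimit -v)"
    echo "max resident set (kB): $(ulimit -m)"
    # Word size of the OS userland (32 or 64).
    echo "OS word size (bits): $(getconf LONG_BIT)"
    # Target triple the compiler was built for, if g++ is on PATH.
    if command -v g++ >/dev/null 2>&1; then
        echo "g++ target: $(g++ -dumpmachine)"
    else
        echo "g++ target: g++ not found on PATH"
    fi
}
```

Calling build_env_report after the module load line, in the same shell that runs make, shows the limits the compiler processes would inherit.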
I tested the executable on the C60 example. The total energy in the reference log is -343.838529 Ha, and the total energy from my executable was -343.838529 Ha, which seems great (I did check and the second step's TOTAL ENERGY differed between the reference and local). It finished the example in about 17 seconds. I had to have this in my PBS job script to get things working:
module load env/cades-cnms cmake/3.11.0 gcc/5.3.0 xalt/0.7.6 PE-gnu/1.0 fftw/3.3.5 scalapack/2.0.2 boost/1.67.0 openmpi/3.0.0 openBLAS/0.2.19
cd $PBS_O_WORKDIR
# rmg-cpu was moved from build to home
mpirun -np 32 ~/rmg-cpu $PBS_O_WORKDIR/input
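The comparison of the final total energy against the reference log can be scripted instead of checked by eye. Below is a minimal sketch; energies_match is a hypothetical helper name, the default tolerance of 1e-6 Ha is an arbitrary choice rather than an RMG recommendation, and the two energies are assumed to have been read from the logs by hand.

```bash
#!/usr/bin/env bash
# Sketch: succeed when two energies (in Ha) agree within a tolerance.
# energies_match is a hypothetical helper name.
# $1: reference energy, $2: computed energy, $3: tolerance (default 1e-6, arbitrary).
energies_match() {
    awk -v a="$1" -v b="$2" -v tol="${3:-1e-6}" 'BEGIN {
        d = a - b
        if (d < 0) d = -d          # absolute difference
        if (d <= tol + 0) exit 0   # within tolerance
        exit 1
    }'
}
```

For the values quoted above, energies_match -343.838529 -343.838529 returns success, so it can be used in a job script as a quick sanity check after the run.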
I ran into a similar issue on Summit recently, where the build would fail if you used too large a value of x in make -jx, but we did not have to go down to a serial build, which I imagine would take a rather long time.
Oh yeah, it took roughly 30 to 40 minutes to build it serially. I tried make -j2 before I tried serial, but anything besides serial was broken for me.
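The workaround described in the comments above, lowering the -j value until the build goes through, can be automated. The following is a sketch under assumptions: build_with_fallback is a hypothetical helper name, halving the job count is just one possible strategy, and it is meant to be run from the build directory after cmake has finished.

```bash
#!/usr/bin/env bash
# Sketch: retry a build with fewer parallel jobs until it succeeds.
# build_with_fallback is a hypothetical helper name.
# $1: starting job count; remaining args: build command (default: make rmg-cpu).
build_with_fallback() {
    local jobs_n="$1"
    shift
    [ "$#" -gt 0 ] || set -- make rmg-cpu
    while [ "$jobs_n" -ge 1 ]; do
        echo "trying: $* -j$jobs_n"
        if "$@" "-j$jobs_n"; then
            echo "build succeeded with -j$jobs_n"
            return 0
        fi
        # Halve the job count; integer division ends at 1 (serial), then 0 stops the loop.
        jobs_n=$((jobs_n / 2))
    done
    echo "build failed even with -j1" >&2
    return 1
}
```

For example, build_with_fallback 14 would try -j14, -j7, -j3 and finally -j1. Since make picks up from the objects already built, a failed parallel attempt does not throw away the work done so far.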
The scalapack configuration has been reworked in v4.3.0 and above. Default behavior is to search for a system provided scalapack library and use that if found. If not found the internal version is built. Use of the internal version can also be forced by using the option -DUSE_INTERNAL_SCALAPACK=1 when running cmake.
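As an illustration of the option mentioned above, a configure step that forces the bundled scalapack might look like the sketch below. The -DUSE_INTERNAL_SCALAPACK=1 option and -Wno-dev come from this thread; the function name configure_internal_scalapack and the RUN variable are hypothetical, and the option is only expected to apply to v4.3.0 and above.

```bash
#!/usr/bin/env bash
# Sketch: print (or run) a cmake configure line that forces the bundled scalapack.
# configure_internal_scalapack and RUN are hypothetical names.
# $1: path to the source tree (default: .., i.e. called from the build dir).
# Set RUN=1 to execute the command instead of only printing it.
configure_internal_scalapack() {
    local src_dir="${1:-..}"
    set -- cmake -Wno-dev -DUSE_INTERNAL_SCALAPACK=1 "$src_dir"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

Without RUN=1 the function only prints the command line, which makes it easy to check what would be run before touching an existing build directory.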
I am trying to build the latest beta release here at CADES ORNL, but I am getting undefined symbol errors when make attempts the final linking of rmg-cpu. They all seem to come from the bundled scalapack. It looks like scalapack is being built with a different parameter set than the rest of the code.
I am looking for a way to either pass the right compilation parameters to scalapack or skip building the bundled scalapack and use the system-built one instead, but I could not find a suitable option among the cmake variables.