LLNL / qball

Qball (also known as qb@ll) is a first-principles molecular dynamics code that is used to compute the electronic structure of atoms, molecules, solids, and liquids within the Density Functional Theory (DFT) formalism. It is a fork of the Qbox code by Francois Gygi.
GNU General Public License v3.0

Ground state calculations failing with multithreading on cab #19

Open xorJane opened 7 years ago

xorJane commented 7 years ago

Calculated energies seem to explode (<etotal> grows exponentially) whenever the number of threads per task is greater than 1, at least on cab. Setting OMP_NUM_THREADS to 1 via export OMP_NUM_THREADS=1 avoids the issue.

The input script we were using was:

set ecut 45

set cell 15.0 0.0 0.0 0.0 15.0 0.0 0.0 0.0 15.0
species H H_HSCV_PBE-1.0.xml
atom H1 H 1.508729 0.441434 0.259102

set force_complex_wf ON
set xc PBE
set ecutprec 8.0

randomize_wf
run 0 500

quit
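
To compare runs with different thread counts, it can help to pull the successive <etotal> values out of each output file and check whether their magnitude runs away. The following is a minimal sketch, not part of Qball: the function names, the default factor of 1000, and the assumed output line format "<etotal> VALUE </etotal>" are all assumptions and may need adjusting to the actual output.

```shell
# Hypothetical helpers; the "<etotal> VALUE </etotal>" line format is assumed.

# Print every <etotal> value found in the given output file, one per line.
etotal_values() {
    sed -n 's/.*<etotal>[[:space:]]*\([^[:space:]<]*\)[[:space:]]*<\/etotal>.*/\1/p' "$1"
}

# Crude blow-up check: succeeds (exit 0) when the last |etotal| is more than
# FACTOR times the first one. FACTOR defaults to 1000, an arbitrary choice.
etotal_blew_up() {
    etotal_values "$1" | awk -v factor="${2:-1000}" '
        { v = ($1 < 0) ? -$1 : $1 }   # magnitude of the current value
        NR == 1 { first = v }
        { last = v }
        END { exit !(NR > 1 && last > factor * first) }'
}
```

For example, with the output of a multithreaded run saved to a file (the file name here is a placeholder), `etotal_blew_up threads4.out && echo "energy diverged"` would flag the behaviour described above, while the same check on the OMP_NUM_THREADS=1 run should stay silent.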

alfC commented 7 years ago

Is the total charge (normalization) also exploding? This reminds me of the unresolved (?) bug in which Norm is not preserved in a ground state calculation.

xorJane commented 7 years ago

Yes, the norm is also exploding!

alfC commented 7 years ago

The good news is that you found the fix for the old bug: don't use threads. :)

alfC commented 7 years ago

More seriously, I think the problem is the routine that orthonormalizes the wavefunctions, and the explosion of the energy (or eigenvalues) is a side effect.

draeger1 commented 7 years ago

Are you building the code from source? If so, what are your configuration / link options? If not, which executable are you using? My guess is this is due to the wrong libraries being used at link time.

xorJane commented 7 years ago

Yep, I'm building the code from source! The configuration script I'm using on cab is

export LIBDIR=$HOME/Qball/lib/surface-libs
export LIBS_BLAS="-L/usr/local/tools/mkl-10.3.1/lib -lmkl_core -lmkl_intel_thread -lmkl_intel_lp64 -lifcore"
export LIBS_BLACS="/usr/local/tools/mkl-10.3.1/lib/libmkl_blacs_intelmpi_lp64.a"
export LIBS_SCALAPACK=$LIBDIR/libscalapack.a

export CXX=/usr/local/bin/mpiicpc
export CC=/usr/local/bin/mpiicc
export CXXFLAGS=" -g -openmp -O3"
export CFLAGS=" -g -openmp -O3"
export LIBS=-lcurl

../../configure --prefix=$HOME --with-xerces-prefix=$HOME/Qball/

draeger1 commented 7 years ago

I'm guessing there's a mismatch between libscalapack.a and the MKL Blacs library. Try removing the LIBS_BLACS variable (or setting it to nothing) and seeing if that resolves the problem. If that doesn't work, we should probably try building ScaLAPACK from source on cab and linking to that.

xorJane commented 7 years ago

Thanks for the idea! Unfortunately, commenting out the LIBS_BLACS variable didn't resolve the issue, so rebuilding ScaLAPACK might be something to try next.