QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

Deterministic test for Jastrow (backflow) optimization fails with several coefficients. #3189

Open Hyeondeok-Shin opened 3 years ago

Hyeondeok-Shin commented 3 years ago

Describe the bug

I've been working on a deterministic test for Jastrow (backflow) optimization and found that the test fails once the Jastrow has several coefficients: the optimized values differ between machines.

Tests were run on a workstation, Theta, and Summit (CPU) using the develop branch at its last commit (f46901f01736950c75203f37ec701aa44aa78841) on May 19, 2021. Only J1 is used in the test.

With 3 coefficients, the optimized J1 is identical on all three machines. (det_qmc_short_opt_3.in.xml)

Workstation : 1.021369511 0.916486567 0.8654997938
Theta       : 1.021369511 0.916486567 0.8654997938
Summit      : 1.021369511 0.916486567 0.8654997938

But when the number of coefficients is increased to 5, every machine gives a different result. (det_qmc_short_opt_5.in.xml)

Workstation : 2.95646542 -2.478173825 -4.360350896 -3.72100857 -2.081630639
Theta       : -1.215235277 -2.132996212 -1.642853768 -1.823485461 -0.9572376071
Summit      : Fatal Error. Aborting at Invalid Matrix Diagonalization Function!
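For reference, the cross-machine comparison can be sketched as a small script. This is only an illustration, not part of the QMCPACK test suite: the helper name `machines_agree` and the relative tolerance of 1e-8 are assumptions chosen for this sketch, and the coefficient values are copied from the results above.

```python
import math

# Optimized J1 coefficients copied from this report.
# None marks a run that aborted before producing coefficients.
results_3 = {
    "Workstation": [1.021369511, 0.916486567, 0.8654997938],
    "Theta": [1.021369511, 0.916486567, 0.8654997938],
    "Summit": [1.021369511, 0.916486567, 0.8654997938],
}
results_5 = {
    "Workstation": [2.95646542, -2.478173825, -4.360350896,
                    -3.72100857, -2.081630639],
    "Theta": [-1.215235277, -2.132996212, -1.642853768,
              -1.823485461, -0.9572376071],
    "Summit": None,  # aborted: Invalid Matrix Diagonalization Function
}


def machines_agree(results, rel_tol=1e-8):
    """Return True if every machine matches the first machine's coefficients.

    rel_tol is a hypothetical tolerance picked for this sketch.
    """
    vectors = list(results.values())
    if any(v is None for v in vectors):
        return False  # at least one run produced no coefficients
    reference = vectors[0]
    return all(
        len(v) == len(reference)
        and all(math.isclose(a, b, rel_tol=rel_tol)
                for a, b in zip(v, reference))
        for v in vectors[1:]
    )


print("3 coefficients agree:", machines_agree(results_3))
print("5 coefficients agree:", machines_agree(results_5))
```

The 3-coefficient case passes this check and the 5-coefficient case does not, both because the Workstation and Theta values differ and because the Summit run aborted.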

Depending on the random seed, different errors appear, which may be related to this issue.

  1. Fatal Error. Aborting at Invalid Matrix Diagonalization Function! (Summit : det_qmc_short_opt_5.in.xml)
  2. Fatal Error. Aborting at QMCHamiltonian::evaluate component NonLocalECP returns NaN (Theta : det_qmc_short_opt_5_err.in.xml)
  3. ERROR Safeguard failure: checkConfigurations variance out of [0.5, 2.0] * reference! Please report this bug. (Theta : det_qmc_short_opt_5_err_2.in.xml)

To Reproduce

Run with a single MPI task and a single thread.

archive.zip

Expected behavior

The deterministic test passes.

System: Workstation (Intel Xeon), Theta, Summit


prckent commented 3 years ago

Any chance we are lucky and it works with one MPI task and one thread? I.e., is this a parallel logic error or an optimization plumbing error? If the latter, perhaps backflow only, with no Jastrow, works?