QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org

Hybrid method kinetic energy crash #2980

Open jtkrogel opened 3 years ago

jtkrogel commented 3 years ago

Describe the bug: A DMC population explosion was reported to me by Dan Staros. The typical culprit for these is a "stuck walker" (constant rejection in an area of low potential energy within the core region of a non-local pseudopotential). In this case, however, the potential energy remains nearly constant. Instead, the kinetic energy of a single walker suddenly falls from about 1000 Ha to 200 Ha, resulting in a population explosion in a small part of the phase space.

This points to issues in the calculation of the kinetic energy of the trial wavefunction. To my knowledge, this has not been seen before for pure B-spline based Slater-Jastrow wavefunctions, so the most likely culprit is the lesser-used hybrid atomic orbital code that was used in this case.
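The link between a kinetic energy drop and a population explosion can be sketched with the simplest form of the DMC branching factor, exp(-tau * (E_L - E_T)). This is only an illustration: the timestep below is an assumed value (the thread does not state it), the function name is hypothetical, and production codes typically apply additional weight or energy limits that are ignored here.

```python
import math

def branching_weight(e_local, e_trial, tau):
    """Simplest DMC branching factor exp(-tau * (E_L - E_T))."""
    return math.exp(-tau * (e_local - e_trial))

tau = 0.01      # ASSUMED timestep in Ha^-1, for illustration only
e_trial = 0.0   # measure energies relative to the trial energy
drop = -800.0   # shift in local energy if the kinetic energy falls from
                # ~1000 Ha to ~200 Ha while the potential energy stays put

print(branching_weight(0.0, e_trial, tau))   # walker sitting at the trial energy
print(branching_weight(drop, e_trial, tau))  # walker after the kinetic energy drop
```

With these assumed numbers the second walker would be replicated on the order of a few thousand times per step, which is consistent with an explosion originating from a single walker.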

To Reproduce: It is not yet known whether the behavior is easily reproduced. If it is, I expect that a checkpoint written at the end of the first DMC series that samples this part of the phase space could be used to isolate a particular walker configuration in this region. The issue could then be debugged locally on a workstation by comparing the trial wavefunction Laplacian at this coordinate with and without the hybrid method.
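The comparison described above can be sketched on a toy problem. The snippet below is not QMCPACK code: it evaluates the local kinetic energy -1/2 (∇²ψ)/ψ by central finite differences for two hypothetical stand-ins, a reference orbital and the same orbital with a deliberately injected local defect, at a single coordinate. All function names and parameters are assumptions made for illustration.

```python
import numpy as np

def local_kinetic_energy(psi, r, h=1e-3):
    """Return -1/2 * (laplacian psi) / psi at r via central finite differences."""
    r = np.asarray(r, dtype=float)
    psi0 = psi(r)
    lap = 0.0
    for i in range(r.size):
        step = np.zeros_like(r)
        step[i] = h
        lap += psi(r + step) - 2.0 * psi0 + psi(r - step)
    return -0.5 * lap / (h * h) / psi0

def psi_reference(r):
    # Hypothetical reference representation: hydrogen-like 1s orbital
    return np.exp(-np.linalg.norm(r))

def psi_defective(r, center=(0.6, 0.8, 0.0), amp=0.05, width=0.2):
    # Same orbital with an artificial 5% Gaussian bump near `center`,
    # standing in for a representation that is locally inaccurate
    d2 = np.sum((np.asarray(r, dtype=float) - np.asarray(center)) ** 2)
    return psi_reference(r) * (1.0 + amp * np.exp(-d2 / width**2))

coord = np.array([0.6, 0.8, 0.0])  # coordinate under suspicion
t_ref = local_kinetic_energy(psi_reference, coord)
t_def = local_kinetic_energy(psi_defective, coord)
print(f"reference: {t_ref:.4f} Ha, defective: {t_def:.4f} Ha, diff: {t_def - t_ref:.4f} Ha")
```

The point of the toy is that a small, localized change in the wavefunction value can produce a large change in the local kinetic energy, so a pointwise comparison at the offending coordinate is a sensitive test. For a real many-electron walker the same comparison would sum over all electron coordinates.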

The full dataset (all inputs/outputs) from the run in question is available on Summit at:

/gpfs/alpine/proj-shared/mat151/tiihonej/dmc_CrI3_fs_72

Expected behavior: The local kinetic energy value should remain in the vicinity of 1000 Ha rather than 200 Ha.

System: Cori at NERSC.

jtkrogel commented 3 years ago

Files from the run. The crash occurs for twist 008. varianceExpl.zip

jtkrogel commented 3 years ago

Behavior of the kinetic energy:

kinetic_energy_drop

jtkrogel commented 3 years ago

Location of the trial energy drop in the first DMC series (zoomed in, see the downward slanting line): trial_energy_drop_zoom

prckent commented 3 years ago

Has an underconverged basis set been considered? For example, is the problem reproducible with doubled grids, or have the same hybrid parameters been used successfully with a similar electronic structure?

jtkrogel commented 3 years ago

Dan is trying a run with a larger meshfactor. The same hybrid rep parameters are used for each of the nine twists. The variance/energy ratio indicates respectable quality for all the twists, so I don't think the problem is broadly based (i.e. it is not due to the general mesh, and is more likely confined to a small region of phase space):

>qmca -q ev *s000*scalar*
                            LocalEnergy               Variance           ratio 
dmc.g000  series 0  -2176.465364 +/- 0.025357   55.713636 +/- 1.058126   0.0256 
dmc.g001  series 0  -2176.454010 +/- 0.019833   54.858046 +/- 0.310613   0.0252 
dmc.g002  series 0  -2176.429570 +/- 0.015292   54.685930 +/- 0.332021   0.0251 
dmc.g003  series 0  -2176.446316 +/- 0.015674   54.932212 +/- 0.490942   0.0252 
dmc.g004  series 0  -2176.445535 +/- 0.021688   53.899520 +/- 0.279747   0.0248 
dmc.g005  series 0  -2176.419633 +/- 0.019214   54.384172 +/- 0.246866   0.0250 
dmc.g006  series 0  -2176.464741 +/- 0.020904   55.483628 +/- 0.592998   0.0255 
dmc.g007  series 0  -2176.434707 +/- 0.017068   54.000493 +/- 0.287997   0.0248 
dmc.g008  series 0  -2176.393043 +/- 0.016223   54.034666 +/- 0.341088   0.0248 
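The ratio column above appears consistent with the variance divided by the absolute value of the local energy. That reading is inferred from the numbers in the table rather than from the qmca source, so the short check below should be taken as an assumption. It uses three of the rows as listed.

```python
# (LocalEnergy, Variance) mean values copied from the table above, in Ha and Ha^2
rows = {
    "dmc.g000": (-2176.465364, 55.713636),
    "dmc.g004": (-2176.445535, 53.899520),
    "dmc.g008": (-2176.393043, 54.034666),
}

def variance_energy_ratio(energy, variance):
    """ASSUMED definition of the ratio column: variance / |energy|."""
    return variance / abs(energy)

for name, (energy, variance) in rows.items():
    print(f"{name}  {variance_energy_ratio(energy, variance):.4f}")
```

The twist that crashes (008) is not an outlier by this measure, which supports the view that the problem is local in phase space.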

jtkrogel commented 3 years ago

Thankfully, we have the coordinates of walkers that have sampled this portion of the phase space. With these, it should be possible to isolate the bug rather quickly on a workstation.

ye-luo commented 3 years ago

I think the smoothing scheme used between the atomic and interstitial regions is not robust. Has the cutoff_radius been tuned?

jtkrogel commented 3 years ago

Please note that the full fileset for the run (including wavefunction and checkpoint files) is now available at OLCF; see the issue header.

What process do you mean when you say "tuned"?