QMCPACK / qmcpack

Main repository for QMCPACK, an open-source, production-level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance-portable GPU support
http://www.qmcpack.org

DMC equilibration algorithms/implementation are buggy and need to be updated #3133

Open prckent opened 3 years ago

prckent commented 3 years ago

Describe the bug

Jaron's (@jtkrogel) validation work for the new batched DMC driver (#3082) has unearthed problems in DMC equilibration that are also present in the older CPU code. This indicates that the problem is not the result of a recent change, although the batched driver also has new problems of its own. For some runs the DMC population is not properly controlled, and the length of the equilibration period influences the results.

One puzzle is that only a subset of DMC runs appears to exhibit this problem. Even so, I think this is serious enough that a new release will be warranted once it is understood and fixed.

To Reproduce

See #3082 for reproducers

Expected behavior

DMC population should be stable. Trial energy and population should vary smoothly. Changes between blocks and sections should be modest. The choice of warmupsteps (equilibration period) should have no impact on the long-time DMC energy.

System:

Any.

Additional context

(Created this separate issue so that it is not lost in the batched validation effort. The problem was originally discovered via a CUDA out-of-memory error.)

ye-luo commented 3 years ago

See https://github.com/QMCPACK/qmcpack/issues/3082#issuecomment-826352080. As long as the warmupsteps value given in the user input is large enough, a population blow-up like the one in #3082 will not happen. I think this is not a bug, but it calls for a scheme to determine warmupsteps automatically.
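One possible ingredient for such a scheme, sketched here as an assumption rather than anything QMCPACK implements, is the MSER (marginal standard error rule) truncation heuristic from simulation output analysis. It picks how many leading blocks of an energy or population trace to discard as transient by minimizing the squared standard error of the remaining tail. The function name `mser_truncation`, the `max_frac` cap, and the sample energies are illustrative only.

```python
def mser_truncation(series, max_frac=0.5):
    """Return how many leading entries to discard (MSER heuristic).

    For each candidate cut d, the remaining tail of length m is scored by
    sum((x - mean)^2) / m^2; the minimizing d balances dropping biased
    early data against keeping enough samples for a small error bar.
    """
    n = len(series)
    if n == 0:
        raise ValueError("series must be non-empty")
    # Only consider cuts in the first max_frac of the trace (commonly half),
    # and always keep at least one sample in the tail.
    max_d = min(int(n * max_frac), n - 1)
    best_d, best_score = 0, float("inf")
    for d in range(max_d + 1):
        tail = series[d:]
        m = len(tail)
        mean = sum(tail) / m
        score = sum((x - mean) ** 2 for x in tail) / m ** 2
        if score < best_score:  # strict: ties keep the smaller cut
            best_d, best_score = d, score
    return best_d


# Hypothetical block-averaged energies (Ha): a transient, then a plateau.
energies = [-10.0, -10.5, -10.8] + [-11.0, -11.1, -10.9, -11.0] * 5
print(mser_truncation(energies))  # -> 3
```

Applied after a run, this estimates how many blocks should have been treated as equilibration; a run-time scheme could re-apply the same test periodically and end warmup once the suggested cut stops moving. The heuristic assumes the trace eventually becomes stationary, so it cannot by itself rescue a population that never stabilizes.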

ye-luo commented 2 years ago

Further data on #3082 suggest that the key issue is in the population control. A sub-optimal warmup further amplifies the instability caused by weak population control.
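To make "weak population control" concrete, here is a generic textbook form of trial-energy feedback. It is an assumption, not QMCPACK's exact update, and how its strength maps onto QMCPACK's own feedback parameter is not asserted here. With total walker weight W, target weight W_target, time step τ, dimensionless feedback strength κ, and each walker's per-step weight factor approximated by its average exp[-τ(E_est - E_T)], the weight after one step, W', satisfies:

```latex
\begin{aligned}
E_T &= E_{\text{est}} - \frac{\kappa}{\tau}\,\ln\frac{W}{W_{\text{target}}},\\
\frac{W'}{W} &\approx e^{-\tau\,(E_{\text{est}} - E_T)}
  = \left(\frac{W}{W_{\text{target}}}\right)^{-\kappa},\\
\ln\frac{W'}{W_{\text{target}}} &\approx (1-\kappa)\,\ln\frac{W}{W_{\text{target}}}.
\end{aligned}
```

Under this approximation the logarithmic population deviation shrinks by a factor (1 - κ) per step, so it is corrected over roughly 1/κ steps. A small κ means that a population excursion, such as one left over from a too-short warmup, is pulled back only slowly.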

prckent commented 2 years ago

I tend to think that we should simply strengthen the population control when the population exceeds a certain maximal deviation. However, this introduces one more piece of state that we would need to store, pass between QMC sections, or put in restart files if we want perfect restartability, e.g., the increased feedback parameter, which we would review every block.
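A minimal sketch of this idea, assuming the generic trial-energy form E_T = E_est - (κ/τ) ln(W/W_target) rather than QMCPACK's actual update. The class `AdaptiveFeedback`, its threshold, the doubling/halving rule, and the default values are all hypothetical illustrations, not a proposed implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class AdaptiveFeedback:
    """Hypothetical per-block adaptive population-control feedback (sketch).

    Assumes the generic update E_T = E_est - (kappa / tau) * ln(W / W_target).
    kappa is boosted when the block-averaged weight strays too far from the
    target and relaxed back toward its base value otherwise.
    """
    target_weight: float
    tau: float
    base_feedback: float = 0.1      # kappa when the population is healthy
    max_rel_deviation: float = 0.2  # hypothetical trigger threshold
    boost: float = 2.0              # multiplicative strengthening factor
    max_feedback: float = 1.0       # kappa > 1 would overshoot the target
    feedback: float = field(init=False, default=0.0)

    def __post_init__(self):
        self.feedback = self.base_feedback

    def review_block(self, block_mean_weight):
        """Adjust kappa once per block from the block-averaged total weight."""
        deviation = abs(block_mean_weight / self.target_weight - 1.0)
        if deviation > self.max_rel_deviation:
            self.feedback = min(self.feedback * self.boost, self.max_feedback)
        else:
            self.feedback = max(self.feedback / self.boost, self.base_feedback)
        return self.feedback

    def trial_energy(self, e_est, total_weight):
        """Trial energy that pulls the total weight back toward the target."""
        return e_est - (self.feedback / self.tau) * math.log(
            total_weight / self.target_weight)


ctrl = AdaptiveFeedback(target_weight=1000.0, tau=0.01)
ctrl.review_block(1500.0)  # 50% over target: kappa 0.1 -> 0.2
ctrl.review_block(1020.0)  # back within 20%: kappa relaxes to 0.1
```

The `feedback` attribute is exactly the extra piece of state described above: for perfect restartability it would have to be passed between QMC sections or written to restart files alongside the walkers.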

It is also interesting to think about what would happen during reconfiguration.

jtkrogel commented 2 years ago

I've got more data on this that I need to process and post. Should I post it here, or reopen #3082 with a title change and post the data there alongside all the prior similar plots?

ye-luo commented 2 years ago

Better to post new plots here. #3082 is affected by multiple issues, and some of its spikes are eliminated by #3135. So it is better to spin off the population control issue and close the old one. "Closed" does not mean "resolved".

If you think some plots are valuable for continuing the discussion here, you can click edit on the old message and copy the contents here.