Open prckent opened 3 years ago
See https://github.com/QMCPACK/qmcpack/issues/3082#issuecomment-826352080. As long as the warmupsteps value given in the user input is large enough, a population blow-up like the one in #3082 will not happen. I think this is not a bug, but it calls for a scheme to determine warmupsteps automatically.
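One possible shape for such a scheme, as a minimal sketch only: scan candidate cut points in a recorded energy trace and accept the first one where the two halves of the remaining data agree within a tolerance. The function name `estimate_warmup` and the parameters `step` and `tol` are hypothetical and not part of QMCPACK. The error estimate ignores serial correlation, so it is too optimistic for real DMC data.

```python
import math
import statistics


def estimate_warmup(energies, step=10, tol=2.0):
    """Return the smallest cut (in steps) after which the trace looks stationary.

    Hypothetical heuristic: drop the first `cut` samples, split the rest in
    two halves, and accept the cut when the half means differ by no more than
    `tol` naive standard errors. Returns None if no cut in the first half of
    the trace is accepted.
    """
    n = len(energies)
    for cut in range(0, n // 2, step):
        tail = energies[cut:]
        half = len(tail) // 2
        if half < 2:
            break  # too little data to estimate a variance
        first, second = tail[:half], tail[half:]
        # Naive standard error of the difference of means (assumes
        # uncorrelated samples, which DMC data are not).
        err = math.sqrt(
            statistics.variance(first) / len(first)
            + statistics.variance(second) / len(second)
        )
        if abs(statistics.fmean(first) - statistics.fmean(second)) <= tol * err:
            return cut
    return None
```

A production version would need an autocorrelation-aware error bar and would have to run on the fly rather than on a stored trace.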
Further data on #3082 suggest that the key issue is in the population control. Sub-optimal warmup further amplifies the instability from weak population control.
I tend to think that we should simply strengthen the population control when the population exceeds a certain maximal deviation. However, this introduces one more item of information that we'll need to store or pass between QMC sections, or put in restart files if we want perfect restartability: for example, the increased feedback parameter that we review every block.
Also interesting to think about what would happen during reconfiguration.
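To make the idea of strengthening the population control concrete, here is a minimal sketch. It assumes a textbook-style logarithmic feedback on the trial energy, which is not necessarily the form QMCPACK uses. The names `trial_energy` and `adapt_feedback` and the parameters `max_dev`, `boost`, and `base` are hypothetical.

```python
import math


def trial_energy(e_est, pop, target_pop, feedback):
    """Textbook-style update (assumed form): E_T = E_est - feedback * ln(N / N_target)."""
    return e_est - feedback * math.log(pop / target_pop)


def adapt_feedback(feedback, pop, target_pop, max_dev=0.2, boost=2.0, base=1.0):
    """Review the feedback strength once per block.

    If the relative population deviation exceeds `max_dev`, multiply the
    current feedback by `boost`; otherwise fall back to the `base` value.
    """
    deviation = abs(pop / target_pop - 1.0)
    if deviation > max_dev:
        return feedback * boost
    return base
```

The value returned by `adapt_feedback` is exactly the extra piece of state mentioned above: it would have to be carried between QMC sections and written to restart files for a restarted run to reproduce the original one.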
I've got more data on this that I need to process and post. Post it here, or reopen #3082 w/ a title change and post the data there with all the prior similar plots?
Better to post new plots here. #3082 is exposed to multiple issues. Some of the spikes are eliminated by #3135. So it is better to spin off the population control issue and close the old one. "close" is not equal to "resolved".
If you think some plots are valuable for continuing the discussion here, you can click edit on the old message and copy the contents here.
Describe the bug
Jaron's ( @jtkrogel ) validation work #3082 for the new batched DMC driver has unearthed problems in DMC equilibration that are also present in the older CPU code. This indicates that the problem is not the result of a recent change, although there are also new problems found in the batched driver. For some runs the DMC population is not properly controlled and the equilibration period influences the results.
One puzzle is that only a subset of DMC runs appear to exhibit this problem. Even so, I think this is serious enough that a new release will be warranted once the problem is understood and fixed.
To Reproduce
See #3082 for reproducers
Expected behavior
DMC population should be stable. Trial energy and population should vary smoothly. Changes between blocks and sections should be modest. The choice of warmupsteps (equilibration period) should have no impact on the DMC energy at long times.
System:
Any.
Additional context
(Created this additional issue so that it is not lost in the batched validation effort. Originally discovered through a CUDA out-of-memory error.)