prckent opened this issue 2 years ago
I would consider changing the current behavior. Each subjob should run as if it is running standalone.
This was my expectation and what I think we should change the code to do as well.
By default, each group should use a different seed (the current behavior already satisfies this; it should remain true even if the user provides a seed, as this is the intended production use).
AFAIK, a given combination of (seed,#mpi) is deterministically reproducible. This clears the minimum bar for deterministic tests IMO (and also adds some entropy which deterministic tests are already short on).
If we want to be able to have each run produce identical results, I request that a new input flag be added (identical_ensemble?) to support this, since the use case is really only for testing and it would mess up production runs for people who have been providing a seed.
If we want each MPI group to be reproducible independent of ensemble size (i.e. g000 always produces the same results regardless of ensemble size), I would recommend the following: 1) have group 0 always use "seed" as provided, 2) have group 0 generate a list of random seeds for groups 1:N-1 and distribute them, 3) have all groups reset their respective seeds and then proceed with the run. This way, two runs with M and N groups, M<=N, would always produce matching results for groups 0:M-1.
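The scheme above can be sketched in a few lines. This is only an illustration of the proposed seed bookkeeping, not QMCPACK code: the function name `assign_group_seeds` and the use of Python's `random.Random` as the stand-in generator are assumptions for the example.

```python
import random

def assign_group_seeds(user_seed, num_groups):
    """Sketch of the proposed scheme: group 0 uses the user-provided seed
    as-is and deterministically derives seeds for groups 1..N-1."""
    rng = random.Random(user_seed)  # group 0's generator, seeded as provided
    # Group 0 draws one seed per remaining group, in group order, so the
    # seed list for M groups is a prefix of the list for any N >= M groups.
    return [user_seed] + [rng.randrange(2**31) for _ in range(num_groups - 1)]

# Runs with M and N groups (M <= N) agree on the first M group seeds:
seeds_4 = assign_group_seeds(42, 4)
seeds_8 = assign_group_seeds(42, 8)
assert seeds_8[:4] == seeds_4
```

The prefix property is what makes g000 (and every group below M) reproducible regardless of how large the ensemble is.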
When the seed is not given, how to initialize the seed for each group can be discussed separately. When the seed is given in the input file, it should be respected by each group, not only the first group.
Considering that some groups may have a seed in their input while others don't, I don't see a good reason to assume that all the groups can collectively decide how to arrange seeds.
The use cases are either twist averaging or an ensemble of similar molecules. In either case, a different seed per process is desired in production runs (priority).
Reproducibility with a provided seed is also sometimes desired in production, and this is also covered by current functionality.
The only benefit I see from making changes is to make testing easier. This can be done without messing other things up that already work (i.e. by not adding statistical correlation in the production ensemble by default).
Making the entire user base provide distinct seeds in the "ensemble" input files (required to get reproducible and statistically correct production results) just to make writing a handful of tests easier is not a good move.
Our efforts would be better spent trying to merge down to a single input file for ensemble runs rather than increasing the divergence between the current multiple input files in use.
As far as testing and the above comments go, I think we can reasonably add some deterministic and statistical tests without any major work and without changing the C++. This would be enough to verify that the feature is nominally working, e.g. checking that the expected number of samples is obtained is easily done deterministically. And we should document the current algorithm to reduce surprises.
=> We don't have to make any changes to the algorithm now. Jaron's comments do remind me that the random number initialization in non-ensemble runs is more of a problem. If we had a better algorithm for this, the concern about introduced correlations would be lessened. (e.g. https://numpy.org/doc/stable/reference/random/parallel.html lists multiple approaches)
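For reference, one of the approaches on that NumPy page is `SeedSequence.spawn`, which derives statistically independent child seeds from a single user-provided seed. The seed value 12345 and the mapping of one child per MPI group are illustrative assumptions, not anything from the QMCPACK code:

```python
import numpy as np

# SeedSequence.spawn derives independent child seed sequences from one
# root seed, avoiding hand-rolled per-group seed arithmetic.
root = np.random.SeedSequence(12345)   # 12345: arbitrary example seed
children = root.spawn(4)               # e.g. one child per MPI group
streams = [np.random.default_rng(s) for s in children]

# Each stream is reproducible from (root seed, group index) alone.
samples = [rng.random() for rng in streams]
```

Repeating the construction with the same root seed reproduces the same per-group streams, which is the property the deterministic tests would rely on.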
The purpose of this issue is to discuss if the following behavior is to be considered as a bug or a feature, and then what to do/not do about it.
While setting up minimal tests for the ensemble / batched run functionality (#4091, #4093), I noticed that the treatment of random number seeds is different in ensemble runs. The merged PRs currently check only for a crash and the statistical results are not yet verified.
The key difference is that in an ensemble run the seeding and/or random number use is different, so that with a fixed seed and the same inputs, every run in the ensemble will do a distinct QMC run. This applies even to the first input, which will give different results from when it is run independently. The results also depend on the size of the ensemble.
This historical choice has the consequence that none of the deterministic tests can be used to check ensemble runs, and more generally that someone using fixed seeds for reproducibility and only using ensembles for HPC throughput reasons will not get the results that they expect.
Hopefully no one has been caught out by this. Clearly the behavior needs to be documented. The question is then whether we should change the behavior and what behavior(s) would best suit different workflows.
The following illustrates the behavior with different ensemble sizes. If the seeds were treated consistently, every energy would be -10.528057.