QMCPACK / qmcpack

Main repository for QMCPACK, an open-source production level many-body ab initio Quantum Monte Carlo code for computing the electronic structure of atoms, molecules, and solids with full performance portable GPU support
http://www.qmcpack.org
Other
297 stars 139 forks source link

qmca doesn't do weighted average #3857

Open ye-luo opened 2 years ago

ye-luo commented 2 years ago
$ cat qmc_short.s000.scalar.dat
#   index    LocalEnergy         LocalEnergy_sq      LocalPotential      Kinetic             ElecElec            IonIon              LocalECP            NonLocalECP         BlockWeight         BlockCPU            AcceptRatio         
         0   -2.1497437216e+01    4.6296784906e+02   -4.1747444816e+01    2.0250007600e+01   -5.8015375457e+00   -2.5551326901e+01   -1.1200609672e+01    8.0602930218e-01    3.2000000000e+01    1.6943663359e-03    7.9785156250e-01
         1   -2.1533011570e+01    4.6433605720e+02   -4.0404599724e+01    1.8871588154e+01   -5.9175158025e+00   -2.5551326901e+01   -1.0918284006e+01    1.9825269859e+00    1.2000000000e+01    1.7340630293e-03    7.7929687500e-01
$ qmca -q ev -e 0 qmc_short.s000.scalar.dat
                            LocalEnergy               Variance           ratio 
qmc_short  series 0  -21.515224 +/- 0.012577   0.746756 +/- 0.057478   0.0347 

qmca only does (-2.1497437216-2.1533011570)/2 = -2.1515224393 but to me a weighted average is more appropriate.

I expect (-2.1497437216 * 32 -2.1533011570 * 12)/(32+12) = -2.1507139312

jtkrogel commented 2 years ago

Propose a weighted autocorrelation algorithm and we're in business.

ye-luo commented 2 years ago

Why auto-correlation is a prerequisite? I should be able to access the weighted average and standard deviation of the statistics and use the auto-correlation option at my will.

jtkrogel commented 2 years ago

I would like to move to unbiased weighted statistics in a consistent way throughout qmca

ye-luo commented 2 years ago

Isn't it even worse now that everything is not weighted (biased) out of qmca?

jtkrogel commented 2 years ago

Not really. The practical use cases (VMC, production DMC) have no or very little weight fluctuation and very little bias (like 1/100 of the statistical errorbar) when analyzed in practice.

There is large bias in the low walker, high weight fluctuation limit you are looking at, but this is not the practical case.

Therefore, I think we can afford to wait until we have a consistent solution for handling weights across all statistics we compute.

I agree this needs to be done, but there is some work we need to do first to find or derive unbiased equations for all quantities with weighted time series.

ye-luo commented 2 years ago

Another point. Technically, we will have reweighed VMC, so it is not fully immune.

prckent commented 2 years ago

We don't have working one now, but a fixed population/stochastic reconfiguration like DMC algorithm will be more likely to need the weighting. We'll also have to get used to plotting weight fluctuations and not population fluctuations.

jtkrogel commented 2 years ago

Yes, we definitely need to account for weights. We just don't have the formulae to account for them yet, therefore this is the first task.

ye-luo commented 2 years ago

FYI. scalar.data only contain BlockWeight without population. The population is reflected as weight in the fluctuating population scheme. qmca just ignores that column.

jtkrogel commented 2 years ago

Formulae for weighted mean and equal time covariance:

https://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_covariance

See "Weighted sample covariance" and "Frequency weights".

jtkrogel commented 2 years ago

What is missing is autocorrelation time estimation with weighted time series. Probably a biased formula for this can be derived in the discrete (weighted delta function) limit from formulae for autocorrelation time from continuous distributions.

Deriving the analogous Bessel's correction (https://en.wikipedia.org/wiki/Bessel%27s_correction) for the autocorrelation time bias will take more thought.

ye-luo commented 2 years ago

I tried to understand how the auto-correlation is handled. In our manual, I only found how to use qmca but I cannot find how numbers are handled inside qmca. How the auto-correlation affects the error bar being reported. How is the equilibrium is determined. The manual may not be the best place to record such info, qmca source or help can be better places but we do need to document those options defaulted as auto by qmca.

jtkrogel commented 2 years ago

See "Computing autocorrelation times" here: https://dfm.io/posts/autocorr/

This is basically what we do in qmca and elsewhere.

jtkrogel commented 2 years ago

See also Slide 8 in https://github.com/QMCPACK/qmc_workshop_2021/blob/master/week3_stats_and_nexus/week3_stats_nexus_vfinal.pdf

The truncation (expressed in the variable M) is done when the sample autocorrelation function drops below zero.