Anton-Le / PhysicsBasedBayesianInference

Implementation of ensemble-based HMC for multiple architectures
MIT License

Why is the weighted average of the parameters the parameter estimate? #128

Open ThomasWarford opened 2 years ago

ThomasWarford commented 2 years ago

Are there any resources I can read to answer this question? Are there any caveats or conditions under which this isn't the case? Does this depend on symmetry/approximations?

Thanks

ThomasWarford commented 2 years ago

I think reading this page may have partially answered this for me.

Anton-Le commented 2 years ago

The source for this would be most introductory books on probability theory and statistics: "Probability for Physicists" by Simon Sirca [Springer GTP] or "Applied Stochastic Processes" by Mario Lefebvre [Springer Universitext] would be proper references. You may also want to have a look at the MIT course on probability for EE students.

The reasoning is identical to what one is taught in QM: $\langle x \rangle = \int \psi^*(x) \hat{x} \psi(x)\, dx$ or, in essence, $\langle x \rangle = \int x\, P(x)\, dx$. The latter states that we sum all positions weighted according to the probability of being within an infinitesimal region around that position. Hence only those positions with a significant value of $P(x)$ (i.e. high probability) contribute a non-vanishing part to the final average. In our case $P$ is discrete and given for each particle by $P(q_i, p_i) = e^{-H(q_i, p_i)/k_B T} / Z$, and we are computing $\langle q \rangle = \sum_i q_i P(q_i)$. Note that in the last expression I have deliberately omitted the momentum part. In general $P(q_i) = \int P(q_i, p)\, dp$, but since the joint distribution is discrete and the momenta are distinct for all particles, the integration yields only one non-vanishing contribution.
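As a minimal sketch of that discrete weighted average (this is an illustration, not the repository's code; the harmonic Hamiltonian and all names below are assumptions made for the example):

```python
# Sketch: Boltzmann-weighted ensemble average <q> = sum_i q_i P(q_i, p_i),
# with P(q, p) = exp(-H(q, p) / k_B T) / Z over a discrete set of particles.
import numpy as np

def hamiltonian(q, p):
    # Hypothetical harmonic Hamiltonian H = p^2/2 + q^2/2, for illustration only.
    return 0.5 * p**2 + 0.5 * q**2

def weighted_position_average(q, p, kT=1.0):
    """Weighted average of positions over a discrete ensemble of particles."""
    H = hamiltonian(q, p)
    # Subtract the minimum energy before exponentiating, for numerical stability.
    w = np.exp(-(H - H.min()) / kT)
    w /= w.sum()          # normalisation plays the role of 1/Z
    return np.sum(q * w)  # <q> = sum_i q_i P(q_i, p_i)

rng = np.random.default_rng(0)
q = rng.normal(size=1000)
p = rng.normal(size=1000)
print(weighted_position_average(q, p))
```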

ThomasWarford commented 2 years ago

In my mind, since $q_i$ is already generated according to our target distribution, a simple mean should give us the expectation value of $q$. Following this line of thinking, the extra $e^{-H/k_B T}$ factor would only be needed if our samples were drawn uniformly.
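A small sketch of that distinction, assuming a standard-normal target for $q$ (not the model actually sampled in this repository): samples already drawn from the target need only a simple mean, while uniform samples need the Boltzmann reweighting to recover the same expectation.

```python
# Sketch: simple mean of target-distributed samples vs. reweighted mean of
# uniform samples, for an assumed 1-D target proportional to exp(-q^2 / 2).
import numpy as np

rng = np.random.default_rng(1)
kT = 1.0

# Case 1: samples already follow the target (as an ideal HMC chain would).
q_target = rng.normal(size=100_000)
print("simple mean of target samples:", q_target.mean())  # ~ 0

# Case 2: uniform samples over [-5, 5] need Boltzmann reweighting.
q_uniform = rng.uniform(-5.0, 5.0, size=100_000)
w = np.exp(-0.5 * q_uniform**2 / kT)
w /= w.sum()
print("reweighted mean of uniform samples:", np.sum(q_uniform * w))  # ~ 0
```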

ThomasWarford commented 2 years ago

Unrelated: I wonder whether convergence is slowed by correlation between points after a single HMC leapfrog step - perhaps performing multiple leapfrog steps per iteration would reduce this.
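One rough way to check this would be to estimate the lag-$k$ autocorrelation of the chain of $q$ samples. The snippet below is only a sketch: `q_chain` is a placeholder AR(1) series standing in for the sampler's actual output.

```python
# Sketch: lag-k autocorrelation estimate for a chain of scalar samples.
# A slowly decaying autocorrelation means few effective samples per draw.
import numpy as np

def autocorrelation(x, max_lag=20):
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / (len(x) * var)
                     for k in range(1, max_lag + 1)])

rng = np.random.default_rng(2)
# Placeholder chain: an AR(1) process mimicking correlated draws.
q_chain = np.zeros(10_000)
for t in range(1, len(q_chain)):
    q_chain[t] = 0.9 * q_chain[t - 1] + rng.normal()

print(autocorrelation(q_chain, max_lag=5))
```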

ThomasWarford commented 2 years ago

It seems like we sometimes converge towards a value slightly below 0.75. I wonder if this could be because the mode differs from the mean. To clarify, I don't know if this is the case; the mode may well coincide with the mean.
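As a toy illustration of how a skewed distribution separates the two (the lognormal here is only a stand-in, not the distribution actually being sampled in this repository):

```python
# Sketch: for a lognormal(0, sigma), the mode sits below the mean, so an
# estimator converging to the mean would not match the mode.
import numpy as np

sigma = 0.5
mean = np.exp(sigma**2 / 2)   # analytic mean of lognormal(0, sigma)
mode = np.exp(-sigma**2)      # analytic mode of lognormal(0, sigma)
print("mean:", mean, "mode:", mode)  # mean > mode whenever sigma > 0
```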