anderkve / FYS3150

https://anderkve.github.io/FYS3150

Understanding MCMC #79

Closed. AntonBrekke closed this issue 1 year ago.

AntonBrekke commented 1 year ago

Hi!

I have some issues understanding MCMC, especially the code example from the lecture. As far as I've understood:

  1. We assume we know some function p(x) that is proportional to the probability distribution we want to find, P(x).
  2. We do a random walk and propose a new position x_proposal based on x_current. Since p(x) and P(x) are proportional, we get the acceptance probability min(1, p(x_proposal) / p(x_current)). We draw a number R between 0 and 1 from the uniform distribution and check if R < p(x_proposal) / p(x_current) for acceptance.
  3. If we accept, x_current = x_proposal.
  4. We have some derived quantity f(x) (I don't know what this represents), and evaluate it as f(x_current). This is some sort of distribution sampled from our p(x), but I don't know how to interpret this result (see the sketch after this list for how I picture the loop).
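
In code, I picture the loop roughly like this (a minimal sketch of steps 1-4 above; the Gaussian proposal step and the names p and f are my own assumptions, not taken from the lecture code):

```python
import numpy as np

rng = np.random.default_rng()

def metropolis_samples(p, f, x0, n_samples, step_size=1.0):
    """Minimal Metropolis sketch: p only needs to be *proportional*
    to the target distribution, since any constant cancels in the ratio."""
    x_current = x0
    x_samples = np.empty(n_samples)
    f_samples = np.empty(n_samples)
    for i in range(n_samples):
        # Step 2: propose a new position via a symmetric (Gaussian) step
        x_proposal = x_current + step_size * rng.normal()
        # Acceptance check: R ~ Uniform(0,1), accept if R < p(x_prop)/p(x_curr)
        if rng.uniform() < p(x_proposal) / p(x_current):
            x_current = x_proposal   # step 3: accept
        x_samples[i] = x_current     # a rejected proposal repeats the old sample
        f_samples[i] = f(x_current)  # step 4: evaluate the derived quantity
    return x_samples, f_samples
```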

When reading about this on the internet, the terms "prior", "likelihood", and "posterior" show up (from Bayes' theorem), and I'm not quite sure how to map them to the lectures. As far as I understand, the "posterior" distribution is unknown and is what we want to sample (in our case x_current, I think). But I do not understand how this maps to the code example (MCMC_example_demo.py) from the lecture.

However, in project 4 I understand it like this: we want to sample energies from the Boltzmann dist. P(E) = (1/Z) exp(-βE). Since Z is hard to find, we assume (or know) that the dist. is proportional to the Boltzmann factor exp(-βE), and use MCMC on that (since the factor 1/Z cancels in our acceptance). This way we don't need to calculate the partition function Z, and avoid our biggest problem.
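
Explicitly (writing E_new for the proposed energy and E_old for the current one), the ratio of Boltzmann factors in the acceptance check is

[(1/Z) exp(-β E_new)] / [(1/Z) exp(-β E_old)] = exp(-β (E_new - E_old)) = exp(-β ΔE),

so Z drops out and only the energy difference ΔE is ever needed.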

What I'm still pondering: How do I map this to the example code? Why do we not have any "likelihood" in our cases (or, if we do and I failed to register it, where is it)? Do I understand MCMC, or am I stupid? Do I misunderstand any of the key concepts?

I appreciate all answers. Thanks!

anderkve commented 1 year ago

Hi @AntonBrekke!

I'll try to clarify below, but if it's still unclear, feel free to come to the lab session tomorrow (12--14) and ask me there. It might be easier to explain the details in person. :)

> When reading about this on the internet, the terms "prior", "likelihood", and "posterior" show up (from Bayes' theorem), and I'm not quite sure how to map them to the lectures. As far as I understand, the "posterior" distribution is unknown and is what we want to sample (in our case x_current, I think). But I do not understand how this maps to the code example (MCMC_example_demo.py) from the lecture.

As discussed in the lectures, MCMC is fundamentally just an approach one can use to collect a set of samples from some (often high-dimensional) probability distribution. (And we don't need to know the normalization constant of this distribution.) It is not limited to applications within Bayesian statistics, so I would not recommend using Bayesian statistics as the entry point for reading about it. But MCMC is very much used in Bayesian statistics, because the goal of a Bayesian analysis is often to determine an approximation to the so-called posterior probability distribution, which in most cases cannot be determined analytically (but we know it's proportional to likelihood * prior). One powerful way to obtain that approximation numerically is to use MCMC to collect a bunch of samples from the posterior distribution and make histograms.
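
To spell out the mapping (using θ for the parameters and D for the data, just as illustration): Bayes' theorem says

posterior(θ) = likelihood(D|θ) * prior(θ) / evidence(D),

where the evidence is the hard-to-compute normalization. But it cancels in the Metropolis ratio posterior(θ_proposal) / posterior(θ_current), exactly like 1/Z cancels in project 4. And in the demo code there simply is no likelihood or prior, because it is not a Bayesian problem at all: the role of the posterior is played directly by the target density p(x).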

> However, in project 4 I understand it like this: we want to sample energies from the Boltzmann dist. P(E) = (1/Z) exp(-βE).

Note that the Boltzmann dist. is not a probability dist. for the energy E -- it's a probability dist. for the system microstate s: P(s) = (1/Z) exp(-βE(s)). It's just that the s-dependence of this probability dist. is only through the energy expression E(s). I discussed this a bit in this lecture: https://www-int.uio.no/studier/emner/matnat/fys/FYS3150/h22/forelesningsvideoer/bl24-03-storefy-cynap-fys_20221021_081526.mp4?vrtx=view-as-webpage, corresponding to these hand-written notes: https://github.com/anderkve/FYS3150/blob/master/lecture_notes/Lec_16__Oct_21__The_Ising_model.pdf

In project 4 we have an N-dimensional space of discrete states s, and an assumed probability distribution P(s) (the Boltzmann dist.) over this space. We use MCMC to generate sample states s_i from P(s), and for each such sample s_i we compute other quantities of interest, e.g. the energy E_i = E(s_i) and the magnetization M_i = M(s_i). So e.g. the E_i are our energy samples, but they are not samples that we have found by assuming some P(E) and generating E samples from that distribution directly. Rather, they are samples that we have generated by sampling s_i from our assumed P(s) and computing the corresponding E_i = E(s_i). By e.g. histogramming our E_i samples we can then see approximately what P(E) looks like (at a given temperature T, and assuming thermal equilibrium, which is the assumption behind using the Boltzmann dist. for P(s)).
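
As an illustration, here is a minimal sketch of that loop for a 1D Ising ring (my own toy setup with J = k_B = 1, not the 2D model you will implement in project 4):

```python
import numpy as np

rng = np.random.default_rng()

def ising_energy(s, J=1.0):
    """Total energy of a 1D Ising ring: E(s) = -J * sum_i s_i s_{i+1}."""
    return -J * np.sum(s * np.roll(s, 1))

def sample_energies(N=20, T=1.0, n_samples=100_000):
    """Metropolis sampling of microstates s from P(s) = exp(-E(s)/T)/Z,
    returning the derived energy samples E_i = E(s_i)."""
    s = rng.choice([-1, 1], size=N)  # random initial microstate
    E = ising_energy(s)
    E_samples = np.empty(n_samples)
    for i in range(n_samples):
        k = rng.integers(N)          # propose flipping one spin
        # Energy change from flipping spin k (periodic neighbours)
        dE = 2.0 * s[k] * (s[k - 1] + s[(k + 1) % N])
        if rng.uniform() < np.exp(-dE / T):  # P(s')/P(s) = exp(-dE/T)
            s[k] *= -1               # accept the flip
            E += dE
        E_samples[i] = E             # store the derived quantity E_i = E(s_i)
    return E_samples

# Histogramming the E_i samples approximates P(E) at temperature T:
# counts, edges = np.histogram(sample_energies(), bins=50, density=True)
```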

This relation between what we sample (s) and the derived quantities we compute from our samples (e.g. E and M) is discussed a bit in this lecture: https://www-int.uio.no/studier/emner/matnat/fys/FYS3150/h22/forelesningsvideoer/cynap_fys3150_20221021_091544.mp4?vrtx=view-as-webpage

The analogy between Project 4 and the 1D MCMC demo code is this:

s <--> x
E(s) <--> f(x)

In the MCMC demo I assumed some prob. density p(x) (black line in the upper plot) and used MCMC to generate x samples from that prob. density (orange histogram in the upper plot). For each x sample x_i I computed some derived quantity f_i = f(x_i) (as mentioned in that lecture, I just chose some arbitrary functional form for f(x)), and made a histogram of these resulting f_i samples (blue histogram in the lower plot). So that blue histogram is my numerical approximation to the unknown prob. dens. p(f), just like our energy samples in project 4 can be histogrammed to see an approximate prob. dens. p(E).
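
Schematically, the demo boils down to something like this (with arbitrary stand-in choices for p(x) and f(x); these are not the exact functions used in MCMC_example_demo.py):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng()

# Arbitrary example choices -- stand-ins, not the demo's actual functions
p = lambda x: np.exp(-0.5 * x**2) + 0.5 * np.exp(-0.5 * (x - 3.0)**2)
f = lambda x: np.sin(x) * x**2

# Metropolis loop: sample x from (something proportional to) p(x)
n = 200_000
x = 0.0
x_samples = np.empty(n)
for i in range(n):
    x_prop = x + rng.normal()
    if rng.uniform() < p(x_prop) / p(x):
        x = x_prop
    x_samples[i] = x

f_samples = f(x_samples)  # derived quantities f_i = f(x_i)

fig, (ax1, ax2) = plt.subplots(2, 1)
ax1.hist(x_samples, bins=100, density=True)  # approximates p(x)
ax2.hist(f_samples, bins=100, density=True)  # approximates p(f)
plt.show()
```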

Hope this helps! :)