iterative ref: resources for expectation maximization iterative refinement

geoffwoollard commented 2 years ago

Nelson, P. C. (2019). Chapter 12 : Single Particle Reconstruction in Cryo-electron Microscopy. In Physical Models of Living Systems (pp. 305–325).
- https://repository.upenn.edu/cgi/viewcontent.cgi?article=1665&context=physics_papers
"2. Approach" in Scheres, S. H. W. (2012). RELION: Implementation of a Bayesian approach to cryo-EM structure determination. Journal of Structural Biology, 180(3), 519–530. http://doi.org/10.1016/j.jsb.2012.09.006
Fred Sigworth
- https://cryoemprinciples.yale.edu/sites/default/files/files/Chapter6.pdf
- https://cryoemprinciples.yale.edu/video-lectures
- NCCAT SPA short course 2020, Lecture 4: Algorithms and foundational math Part I & 2, Fred Sigworth (1:18:10 - 1:28:46, 10 min) https://www.youtube.com/watch?v=tzv5c5K7MEk&t=4690s

geoffwoollard commented 2 years ago

capstone students can read these papers and ask me about them

thisFreya commented 2 years ago

Hi @geoffwoollard, I've assembled a few questions below. We can schedule a call to handle these or if you can answer them here that's fine too.

1. a. The one-dimensional section of this paper provided particular emphasis on accounting for shifts in the data (sometimes using cross-correlation). Is there an analogue for handling rotations this way in 2-D or are rotations handled differently? b. Equation (8.27) makes use of the fact that the true image A has uniformly distributed rotational angles due to the algorithm not accounting for prior information about A. How would prior information regarding the true image be integrated into a model like this?

a. Equation (1) has the contrast transfer function used as a weighting algorithm for modelled particles. Is there an intuitive way of thinking about the CTF in this context? b. What if noise isn’t gaussian? The law of large numbers states that sufficient noise, if it has an expectation, should tend towards a normal distribution, but is this a reasonable expectation for CryoEM? c. RELION explored validations on each individual step in its algorithm, with distinct focus placed on any new developments. How do you expect reconstructSPI to look in terms of making the algorithms accessible at multiple interim points for validation purposes? d. RELION provides a well-motivated, well-documented implementation. How close do you expect the initial reconstructSPI algorithm will be to RELION? What are some (algorithmic) features that you want to see that RELION does not cover?
a. Do you know more about the Wiener filter?

geoffwoollard commented 2 years ago

a. I wouldn't focus on Eqs 8.2 or 8.3 in Nelson. It's a minor footnote. Shifts can be efficiently searched in Fourier space with a multiplicative factor (phase factor). They can also be done subpixel, meaning 0.5 pixels (or 0.05... any real number). See https://www.dsprelated.com/freebooks/mdft/Shift_Theorem.html b. Think about a prior of A in terms of the data generating process. The forward model of image formation. The physical model. When you simulate data, you have to choose a prior. If we started from an image, and rotated it uniformly, it would be a uniform prior. If we only rotated it uniformly in in a window of +/- 5 degrees, there would be a top hat distribution in that interval.

So think, are the images rotated randomly? Or is there any bias? Or think about having the full joint distribution Prob(all latent variables) = Prob(rotation, shift, ...). Then integrating out everything else except rotation. Would you expect to have a uniform distribution, or something else. Sum_{other variables} Prob(rotation, other variables) = Prob(rotation). In actual datasets with rotation in 3D, the prior is often NOT uniform, because the sample is preferentially interacting with a supporting substrate. In this case there would be some soft of distribution peaked around where it is sticking.

a. Yes, the CTF a point spread function (psf) in Fourier space. The psf is a convolution in real space (turns points into whatever the psf is). The FT[psf]=CTF and by the convolution theorem is a element wise multiplication. So that's why the CTF is multiplying the "clean projection", which in Eq 1 is the projection operator acting on the 3D volume. b. If the noise is not Gaussian, you need another likelihood model, and Eq 7 would take another form. I have worked some things out for Poisson noise. Gaussian noise is convenient because additive white Gaussian noise in real space is still additive white Gaussian in Fourier space (very non intuitive). I have worked out the proof and needed to learn a bit about complex random variables to follow it. I find it quite elegant.

Implement white Gaussian noise first, and then a future enhancement will be coloured Gaussian noise, where the standard deviation changes in Frequency bands, which is easy to implement in a vectorized way with spherical shells in 3D and circular shells in 2D.

c. Have a look at the initial skeleton of code I pushed. You can ask me some specific questions about what I already wrote. And we can talk about good tests for each function.

d. Extremely close to the algorithm in the paper. Do not worry about enhancements as I think there will be no time for them before the end of your capstone. If you're interested in working on this in the summer, we can talk more then. There will be lots to do.

a. See http://www.bio.brandeis.edu/~shaikh/lab/ctf.htm I've played around with this on paper and in code. You can see what happens if the signal is low (beause it was multiplied by the CTF where the CTF was near zero) and then you add noise, and then divide out by the low CTF number. The noise blows up. You want to prevent this by dampening things by using the right epsilon in the denominator: ctf_convoluted_signal_plus_noise / ctf --> ctf_convoluted_signal_plus_noise * ctf / (ctf^2 + epsilon). Try some specific numbers to "see it".

geoffwoollard commented 2 years ago

Some resources are in docstring, so closing

https://github.com/compSPI/reconstructSPI/blob/89fe20e01247d5d4853c0cb066e458f229be7fac/reconstructSPI/iterative_refinement/expectation_maximization.py#L36

compSPI / reconstructSPI

iterative ref: resources for expectation maximization iterative refinement #6