ComputationalCryoEM / ASPIRE-Python

Algorithms for Single Particle Reconstruction
http://spr.math.princeton.edu
GNU General Public License v3.0
49 stars 21 forks source link

Integrate CTF estimate feature #379

Closed garrettwrong closed 3 years ago

garrettwrong commented 3 years ago

Refactor and update ctf branch so it can be integrated into develop branch.

garrettwrong commented 3 years ago

Just adding some notes about the main work I did today. I have an updated and rebased brach for this feature that conforms to our CI system. I am currently numerically validating. This is taking a little longer to validate than I estimated because the feature results are not deterministic. I have locally implemented two changes for determinism and am working through the program beginning to end for an example run to validate.

First, I am force loading a signal variable (aka thon_rings) from a numpy save file for both the legacy and updated branch. I have not identified why that is not deterministic in the original code yet. With updates, It is still nondeterministic in the last digit in the new code, but much less variable than the old branch. Twiddling numpy/scipy versions etc didn't seem to matter much. Strange.

Second, I have identified another root cause as the method applied in linprog. Interior point method is (very much) not deterministic. I added an optional variable that allows me to switch to simplex instead. This is slower but yielding consistent results that I can use to validate the rest of the program.

garrettwrong commented 3 years ago

Noting the simplex method takes approx 50% longer on my test example. I will look into the initial point.

garrettwrong commented 3 years ago

Looks like x0 initial guess point is only applicable to revised-simplex method, not interior-point. Going to rm the repro flag for now. Left note above the call in the code if anyone needs to help debug it later. Will break out the dtype from repro so users may elect what precision they'd like from the command line.

garrettwrong commented 3 years ago

@janden

Noting some of the non-determinism appears to come from pyfftw. FFTW by default is not necessarily using deterministic algorithm. http://www.fftw.org/faq/section3.html#nondeterministic

Setting pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE' is mentioned in their docs as a quick way to get a deterministic result. I have confirmed that settles the original code numerically (nice!, wish I figured that out first :D ).

I will continue along those lines tomorrow in the background. (First verify the fftw activity/results in the new code, then address conversion to ASPIRE's built in configurable fft).

garrettwrong commented 3 years ago

I ran the current state of the ctf estimation code overnight so I could look at the variations. The attached charts capture what I am seeing. This is better than it was, but still not great imo. I tried to dress them up just enough to be self explanatory.

I am hoping to tighten this up today/tomorrow so that the unit test doesn't have to be fuzzy. Figure_1 Figure_2

garrettwrong commented 3 years ago

Today I found several uninitialized (np.empty) arrays were adding random noise to the algorithm. I have changed them to np.zero, and remaining noise looks mostly like typical eigen solver and fft noise. This might mean both code have an incorrect slice, or simply that it was expecting zeros... In doubles we now maintain a small run to run error in the signal/thon_ring arrays (E-14 or so, happy with that).

However, the aggregate defocus values are still not as robust as I hoped. I will re run the experiment loop I ran last night and see where we're at. Maybe I can find a good centroid for the unit test. I could settle for a single cluster....

janden commented 3 years ago

Setting pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE' is mentioned in their docs as a quick way to get a deterministic result. I have confirmed that settles the original code numerically (nice!, wish I figured that out first :D ).

Makes sense. We should probably be running FFTW_ESTIMATE by default anyhow since we're not very good about keeping plans around (and FFTW_MEASURE and FFTW_PATIENT can be quite slow, IIRC).

This might mean both code have an incorrect slice, or simply that it was expecting zeros...

I'm not completely following here. Can you point me to the spot? I can tell you what it's supposed to do.

However, the aggregate defocus values are still not as robust as I hoped.

Ok. It's not too crazy that the defocus values would be a little more unstable since they're found by solving the optimization problem.

garrettwrong commented 3 years ago

Even with the signals now pretty stable run to run, the defocus values would probably need E-2 to get through out test suite reliably. It is possible there are more improvements to find, but I am getting past the amount of time I budgeted to necromance this branch. I think we might consider passing this back to Ayelet soon. Maybe she can comment on her expectations regarding the accuracy etc. (Like if the defocus results are good, maybe I am just too picky).

garrettwrong commented 3 years ago

the E-2 number comes from the results of running in a loop. Last night over 2500 ish runs generated the following. Setting the unit test to the midpoint with a radius of 5E-2 should probably work okay in practice.

defocusU         mu: 1.1142528440e+03        var: 1.0426614441e-04 min: 1.1141977248e+03 max: 1.1142750273e+03 med: 1.1142543796e+03 mid:1.1142363760e+03 radius:3.8651220000e-02
defocusV         mu: 1.0921009265e+03        var: 6.8716408008e-05 min: 1.0920768102e+03 max: 1.0921198301e+03 med: 1.0921012590e+03 mid:1.0920983202e+03 radius:2.1509980000e-02
defocusAngle         mu: -8.3727610604e-03       var: 4.2081506088e-11 min: -8.3762500000e-03 max: -8.3281100000e-03 med: -8.3739700000e-03 mid:-8.3521800000e-03 radius:2.4070000000e-05

Characteristically it is only slightly tighter clustering than previously.

garrettwrong commented 3 years ago

Sparing a few issues (#422, #417, #416 ,#412) the bulk of this feature has been integrated in #380 .