Just adding some notes about the main work I did today. I have an updated and rebased branch for this feature that conforms to our CI system. I am currently validating it numerically. This is taking a little longer than I estimated because the feature's results are not deterministic. I have locally implemented two changes for determinism and am working through the program beginning to end on an example run to validate.
First, I am force loading a signal variable (aka thon_rings) from a numpy save file for both the legacy and updated branches. I have not yet identified why that variable is not deterministic in the original code. With the updates, it is still nondeterministic in the last digit, but much less variable than on the old branch. Twiddling numpy/scipy versions etc. didn't seem to matter much. Strange.
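For reference, a minimal sketch of the force-load, assuming the signal is a plain numpy array; the path and the compute_thon_rings/micrograph names are placeholders, not the actual ctf code:

```python
import numpy as np

SIGNAL_PATH = "thon_rings.npy"  # placeholder path for the pinned signal

try:
    # Force-load the previously saved signal so both branches see identical input.
    thon_rings = np.load(SIGNAL_PATH)
except FileNotFoundError:
    # First run: compute as usual, then save the result for later runs/branches.
    thon_rings = compute_thon_rings(micrograph)  # hypothetical stand-in for the real call
    np.save(SIGNAL_PATH, thon_rings)
```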
Second, I have identified another root cause: the method used by linprog. The interior-point method is (very much) not deterministic. I added an optional variable that allows me to switch to simplex instead. This is slower, but it yields consistent results that I can use to validate the rest of the program.
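Roughly what the switch looks like against scipy.optimize.linprog, shown on a toy LP rather than the real ctf problem (note both of these methods are considered legacy in newer SciPy, where 'highs' is the default):

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP just to illustrate the method switch:
# minimize c @ x subject to A_ub @ x <= b_ub, x >= 0.
c = np.array([1.0, 2.0, 3.0])
A_ub = np.array([[-1.0, -1.0, 0.0],
                 [0.0, -1.0, -1.0]])
b_ub = np.array([-1.0, -1.0])

# 'interior-point' is fast but its last digits can differ run to run;
# 'simplex' is slower but gives repeatable results for validation.
repro = True
method = "simplex" if repro else "interior-point"
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 3, method=method)
print(res.x, res.fun)
```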
Noting the simplex method takes approx 50% longer on my test example. I will look into the initial point.
Looks like the x0 initial guess point is only applicable to the revised simplex method, not interior-point. Going to remove the repro flag for now. I left a note above the call in the code in case anyone needs to debug it later. I will break the dtype out from repro so users may select what precision they'd like from the command line.
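For the record, a tiny example of the x0 behavior in scipy.optimize.linprog (per the SciPy docs, x0 is honored only by 'revised simplex' and must be a basic feasible solution; the LP below is a toy, not the ctf problem):

```python
import numpy as np
from scipy.optimize import linprog

# Toy LP: maximize x1 + 2*x2 subject to x1 + x2 <= 1, x >= 0.
c = np.array([-1.0, -2.0])
A_ub = np.array([[1.0, 1.0]])
b_ub = np.array([1.0])

# x0 must be a basic feasible solution; the vertex (0, 1) qualifies here.
x0 = np.array([0.0, 1.0])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)],
              x0=x0, method="revised simplex")
print(res.x)
```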
@janden
Noting some of the non-determinism appears to come from pyfftw. FFTW by default does not necessarily use a deterministic algorithm: http://www.fftw.org/faq/section3.html#nondeterministic
Setting pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE' is mentioned in their docs as a quick way to get a deterministic result. I have confirmed that settles the original code numerically (nice! wish I had figured that out first :D).
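For reference, the one-liner in question; pyfftw.config.PLANNER_EFFORT is a real module-level setting, and the transform below is just a toy to exercise it:

```python
import numpy as np
import pyfftw
import pyfftw.interfaces.numpy_fft

# Use the estimate-only planner so FFTW picks the same plan every run;
# the measuring planners time candidate algorithms and can choose differently.
pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE'

x = np.random.default_rng(0).standard_normal(64)
X = pyfftw.interfaces.numpy_fft.fft(x)
```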
I will continue along those lines tomorrow in the background. (First verify the fftw activity/results in the new code, then address conversion to ASPIRE's built-in configurable fft.)
I ran the current state of the ctf estimation code overnight so I could look at the variations. The attached charts capture what I am seeing. This is better than it was, but still not great imo. I tried to dress them up just enough to be self-explanatory.
I am hoping to tighten this up today/tomorrow so that the unit test doesn't have to be fuzzy.
Today I found several uninitialized (np.empty) arrays were adding random noise to the algorithm. I have changed them to np.zeros, and the remaining noise looks mostly like typical eigen solver and fft noise. This might mean both codes have an incorrect slice, or simply that the code was expecting zeros... In doubles we now maintain a small run-to-run error in the signal/thon_ring arrays (E-14 or so; happy with that).
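To illustrate the class of bug (a contrived example, not the actual slice in the ctf code):

```python
import numpy as np

n = 8

# Buggy pattern: np.empty returns uninitialized memory, so any entry the code
# never writes holds garbage that varies run to run.
signal = np.empty(n)
signal[: n // 2] = 1.0   # only half the array is written

# Deterministic fix: start from zeros so unwritten entries are well defined.
signal = np.zeros(n)
signal[: n // 2] = 1.0
```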
However, the aggregate defocus values are still not as robust as I hoped. I will re-run the experiment loop I ran last night and see where we're at. Maybe I can find a good centroid for the unit test. I could settle for a single cluster....
> Setting pyfftw.config.PLANNER_EFFORT = 'FFTW_ESTIMATE' is mentioned in their docs as a quick way to get a deterministic result. I have confirmed that settles the original code numerically (nice! wish I had figured that out first :D).
Makes sense. We should probably be running FFTW_ESTIMATE by default anyhow, since we're not very good about keeping plans around (and FFTW_MEASURE and FFTW_PATIENT can be quite slow, IIRC).
> This might mean both codes have an incorrect slice, or simply that it was expecting zeros...
I'm not completely following here. Can you point me to the spot? I can tell you what it's supposed to do.
> However, the aggregate defocus values are still not as robust as I hoped.
Ok. It's not too crazy that the defocus values would be a little more unstable since they're found by solving the optimization problem.
Even with the signals now pretty stable run to run, the defocus values would probably need E-2 to get through our test suite reliably. It is possible there are more improvements to find, but I am getting past the amount of time I budgeted to necromance this branch. I think we might consider passing this back to Ayelet soon. Maybe she can comment on her expectations regarding the accuracy etc. (If the defocus results are good, maybe I am just being too picky.)
The E-2 number comes from the results of running in a loop. Last night, roughly 2500 runs generated the following. Setting the unit test to the midpoint with a radius of 5E-2 should probably work okay in practice.
defocusU     mu: 1.1142528440e+03  var: 1.0426614441e-04  min: 1.1141977248e+03  max: 1.1142750273e+03  med: 1.1142543796e+03  mid: 1.1142363760e+03  radius: 3.8651220000e-02
defocusV     mu: 1.0921009265e+03  var: 6.8716408008e-05  min: 1.0920768102e+03  max: 1.0921198301e+03  med: 1.0921012590e+03  mid: 1.0920983202e+03  radius: 2.1509980000e-02
defocusAngle mu: -8.3727610604e-03  var: 4.2081506088e-11  min: -8.3762500000e-03  max: -8.3281100000e-03  med: -8.3739700000e-03  mid: -8.3521800000e-03  radius: 2.4070000000e-05
Characteristically, the clustering is only slightly tighter than previously.
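A rough sketch of what the fuzzy assertion could look like; the results dict and check_ctf are hypothetical, and the midpoints are just the mid values from the loop output above:

```python
import numpy as np

# Midpoints taken from the loop results above; radius per the note above.
REF_MID = {
    "defocusU": 1.1142363760e+03,
    "defocusV": 1.0920983202e+03,
    "defocusAngle": -8.3521800000e-03,
}
RADIUS = 5e-2

def check_ctf(results):
    """results: dict of estimated values keyed like REF_MID (hypothetical shape)."""
    for key, mid in REF_MID.items():
        assert np.abs(results[key] - mid) <= RADIUS, f"{key} outside tolerance"
```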
Sparing a few issues (#422, #417, #416, #412), the bulk of this feature has been integrated in #380.
Refactor and update the ctf branch so it can be integrated into the develop branch.