So if you assume the grid can just 'drift' with the fitter, then, as you might intuitively (and iirc did) expect, the proper transformation for the regularization is to subtract the mean displacement. That is, if v^t_{i,\mu} is the \mu-th component (\mu \in \{0,1\} in 2d) of the i-th vertex at fitter iteration t (so v^0_{i,\mu} is the initial grid), then the penalty term you add to the log likelihood is (for the l2 norm, assuming you have N vertices):
L_penalty = \frac{\lambda}{2N} \sum_i^N \sum_\mu^2 (v^t_{i,\mu} - v^0_{i,\mu} - b^t_\mu)^2
where \lambda is your relative weighting of the penalty function and
b^t_\mu = \frac{1}{N} \sum_i^N (v^t_{i,\mu} - v^0_{i,\mu})
(i.e. the average displacement. This is quick to show by optimizing L_penalty wrt b^t_\mu.)
So then when you compute the gradient for gradient descent, for each v_{i,\mu} you would use dL_penalty / dv^t_{i,\mu} = \frac{\lambda}{N} (v^t_{i,\mu} - v^0_{i,\mu} - b^t_\mu)
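For concreteness, here is a minimal numpy sketch of that penalty and its gradient (the function name `drift_corrected_penalty` and the `(N, 2)` array layout are just illustrative, not anything in the repo):

```python
import numpy as np

def drift_corrected_penalty(v_t, v_0, lam):
    """L2 penalty on vertex displacements with the mean drift b^t subtracted.

    v_t, v_0 : (N, 2) arrays of current and initial vertex positions
    lam      : regularization strength lambda
    Returns (penalty, grad), where grad has shape (N, 2).
    """
    N = v_t.shape[0]
    d = v_t - v_0                 # per-vertex displacement
    b = d.mean(axis=0)            # b^t_mu: mean displacement of each component
    r = d - b                     # drift-corrected residuals
    penalty = lam / (2.0 * N) * np.sum(r ** 2)
    # dL_penalty / dv^t_{i,mu}; the extra term from db/dv drops out because
    # the residuals r sum to zero over i.
    grad = lam / N * r
    return penalty, grad
```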
It is considerably messier, and requires its own fitting step, to let the grid also shear with the fitter
(that is, you now replace v^0_{i,\mu} in that L_penalty with \sum_\nu A_{\mu,\nu} v^0_{i,\nu}); a rough sketch of that variant follows.
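For what it's worth, here is a rough sketch of that shear-corrected variant, assuming the affine map (A plus an offset) is refit by least squares on every evaluation; the function name and the explicit offset term are my own illustration, not something from the code:

```python
import numpy as np

def shear_corrected_penalty(v_t, v_0, lam):
    """L2 penalty after letting the reference grid shear (and drift) with the fitter.

    A 2x2 matrix A and an offset b are refit by least squares on each call so
    that A @ v^0_i + b best matches v^t_i; the penalty is then taken on the
    residuals from that transformed reference grid. Propagating the gradient
    through this inner fit is the messy part.
    """
    N = v_t.shape[0]
    X = np.hstack([v_0, np.ones((N, 1))])              # (N, 3) design matrix
    coeffs, *_ = np.linalg.lstsq(X, v_t, rcond=None)   # coeffs has shape (3, 2)
    A, b = coeffs[:2].T, coeffs[2]                     # A is (2, 2), b is (2,)
    r = v_t - (v_0 @ A.T + b)                          # residuals after shear + drift
    penalty = lam / (2.0 * N) * np.sum(r ** 2)
    return penalty, A, b, r
```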
Actually, that is a dumb idea. It penalizes things far away from the center. I tried this but found it didn't help: save your initial grid and define the penalty wrt that original grid -- i.e. your regularization penalty is like lambda / 2 * sum((vx - vx_grid)^2 + (vy - vy_grid)^2), so your derivatives are then weighted by lambda * (vx - vx_grid) type terms. It's commented in the fit_flat.py code
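A tiny sketch of that anchored penalty and its derivatives, assuming flat coordinate arrays (the function name is illustrative, not the actual fit_flat.py interface):

```python
import numpy as np

def anchored_penalty(vx, vy, vx_grid, vy_grid, lam):
    """Penalty anchored to the saved initial grid (no drift correction).

    vx, vy           : current vertex coordinates (flattened arrays)
    vx_grid, vy_grid : the saved initial grid coordinates
    """
    penalty = lam / 2.0 * (np.sum((vx - vx_grid) ** 2) + np.sum((vy - vy_grid) ** 2))
    dvx = lam * (vx - vx_grid)    # contribution to dL/dvx
    dvy = lam * (vy - vy_grid)    # contribution to dL/dvy
    return penalty, dvx, dvy
```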
I haven't worked on this in a while, mostly because our current line of thinking is: if the residuals are small enough, it doesn't matter what patterns they have. The key problem I was running into was that once the regularization got into the regime where it was actually killing the residual checkerboard pattern, it was badly affecting the quality of the fit. (see nb on tychonoff branch)
If we want to move forward with this, I think we need to loosen the gaussian prior imposed by the regularization even further: like a gaussian+small uniform prior?
Answer: no
I'm doing this on a new branch, just to keep things clean.