cpadavis / weak_sauce

weak pixel area distortions

Does Tychonoff regularization get rid of checkerboard residuals? #11

Closed mbaumer closed 7 years ago

mbaumer commented 9 years ago

I'm doing this on a new branch, just to keep things clean.

cpadavis commented 9 years ago

so if you assume the grid can just 'drift' with the fitter, then as you might intuitively (and iirc did) expect, the proper transformation for the regularization is to subtract the mean difference. That is, if $v^t_{i,\mu}$ is the $\mu$-th component ($\mu \in \{0, 1\}$ for 2D) of the $i$-th vertex at fitter instance $t$ (so $v^0_{i,\mu}$ is the initial grid), then the penalty term you add to the log likelihood is (for the L2 norm and assuming you have $N$ points):

$$L_\text{penalty} = \frac{\lambda}{2N} \sum_i^N \sum_\mu^2 \left( v^t_{i,\mu} - v^0_{i,\mu} - b^t_\mu \right)^2$$

where $\lambda$ is your relative weighting of the penalty function and

$$b^t_\mu = \frac{1}{N} \sum_i^N \left( v^t_{i,\mu} - v^0_{i,\mu} \right)$$

(ie the average displacement; this is quick to show by optimizing $L_\text{penalty}$ with respect to $b^t_\mu$).

so then when you compute the gradient for gradient descent, for each $v^t_{i,\mu}$ you would use

$$\frac{\partial L_\text{penalty}}{\partial v^t_{i,\mu}} = \frac{\lambda}{N} \left( v^t_{i,\mu} - v^0_{i,\mu} - b^t_\mu \right)$$
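
For concreteness, a minimal numpy sketch of that penalty and its gradient (the function name and array layout are my own, not from the repo):

```python
import numpy as np

def drift_penalty(v_t, v_0, lam):
    """Mean-subtracted L2 penalty and its gradient.

    v_t, v_0 : (N, 2) arrays of current and initial vertex positions
    lam      : relative weight lambda of the penalty term
    """
    N = v_t.shape[0]
    b = (v_t - v_0).mean(axis=0)   # b^t_mu: mean displacement, shape (2,)
    r = v_t - v_0 - b              # drift-subtracted residuals; they sum to zero
    L_pen = lam / (2 * N) * np.sum(r**2)
    # The gradient is exact: the dependence of b on v_t drops out
    # because the residuals sum to zero, leaving just (lambda / N) * r.
    grad = lam / N * r
    return L_pen, grad
```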

cpadavis commented 9 years ago

letting the grid also shear with the fitter is considerably messier and requires its own fitting step

(that is, you now replace $v^0_{i,\mu}$ in $L_\text{penalty}$ with $\sum_\nu A_{\mu\nu} v^0_{i,\nu}$)
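
Something like this sketch, assuming the inner fit ("its own fitting") is a plain least-squares solve for $A$ at each step (my reading, not necessarily what was implemented):

```python
import numpy as np

def shear_penalty(v_t, v_0, lam):
    """Penalty where the reference grid is allowed to shear: first fit a
    2x2 matrix A minimizing sum_i |v^t_i - A v^0_i|^2 (the inner fit),
    then penalize the residual from the sheared reference grid."""
    N = v_t.shape[0]
    # Least squares for A: solves v_0 @ A_T ~= v_t, where A_T is A transposed
    A_T, *_ = np.linalg.lstsq(v_0, v_t, rcond=None)
    r = v_t - v_0 @ A_T
    return lam / (2 * N) * np.sum(r**2)
```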

cpadavis commented 9 years ago

actually that is a dumb idea: it penalizes things far away from the center. What I tried instead (though I found it didn't help) was to save the initial grid and define the penalty wrt that original grid -- ie your regularization penalty is like `lambda / 2 * sum((vx - vx_grid)^2 + (vy - vy_grid)^2)`, so your derivatives are then weighted by `lambda * (vx - vx_grid)`-type terms. It's commented in the fit_flat.py code
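
In code, that simpler penalty is roughly the following (a sketch matching the formula above, not the actual commented-out fit_flat.py code):

```python
import numpy as np

def grid_penalty(vx, vy, vx_grid, vy_grid, lam):
    """L2 penalty tying each vertex back to its original grid position."""
    L = lam / 2 * np.sum((vx - vx_grid)**2 + (vy - vy_grid)**2)
    # the "lambda * (vx - vx_grid)"-type weighting of the derivatives
    dL_dvx = lam * (vx - vx_grid)
    dL_dvy = lam * (vy - vy_grid)
    return L, dL_dvx, dL_dvy
```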

mbaumer commented 9 years ago

I haven't worked on this in a while, mostly because our current line of thinking is: if the residuals are small enough, it doesn't matter what patterns they have. The key problem I was running into was that once the regularization got into the regime where it was actually killing the residual checkerboard pattern, it was badly degrading the quality of the fit (see the notebook on the tychonoff branch).

If we want to move forward with this, I think we need to loosen the Gaussian prior imposed by the regularization even further: maybe a Gaussian plus a small uniform prior?
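
For what it's worth, the penalty from such a mixture prior might look something like this sketch (sigma, eps, and width are placeholder hyperparameters, not values from the repo):

```python
import numpy as np

def gauss_plus_uniform_penalty(r, sigma=1.0, eps=1e-3, width=10.0):
    """Negative log of a (1 - eps) * Gaussian + eps * uniform mixture prior
    on displacements r. Quadratic near zero like a pure Gaussian penalty,
    but it saturates for large |r|, so far-off vertices aren't pulled as
    hard back toward the grid."""
    gauss = (1 - eps) * np.exp(-0.5 * (r / sigma)**2) / (np.sqrt(2 * np.pi) * sigma)
    uniform = eps / width   # uniform density over a window of size 'width'
    return -np.log(gauss + uniform)
```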

mbaumer commented 7 years ago

Answer: no