dswah / sgcrfpy

SGCRFpy: Sparse Gaussian Conditional Random Fields in Python
MIT License

check backtracking line search #18

Closed dswah closed 7 years ago

dswah commented 8 years ago

The equations are still wrong.

On real problems we get an increasing cost.

Also, the optimization is currently very dependent on the regularization values: changing them slightly can produce very different results.
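A minimal diagnostic sketch for catching this (the loss history and the helper name are hypothetical, not part of sgcrfpy): record the objective at every iteration and flag the steps where it rises.

import numpy as np

def find_increasing_steps(losses, tol=1e-10):
    # losses: per-iteration objective values collected by hand from the
    # optimizer (hypothetical logging; sgcrfpy does not expose this)
    diffs = np.diff(np.asarray(losses, dtype=float))
    return np.where(diffs > tol)[0] + 1  # iterations where the cost rose

For example, find_increasing_steps([3.0, 2.5, 2.7, 2.1]) returns array([2]): the cost rose going into iteration 2. An empty result means the cost decreased monotonically, which is what we should be seeing.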

dswah commented 8 years ago

[image]

dswah commented 8 years ago

or this:

[image]

dswah commented 8 years ago

But setting lamT just a little higher makes things OK?

[image]

dswah commented 8 years ago

And setting lamT a little lower looks strange, but legal:

[image]

dswah commented 8 years ago

Since the Theta regularization is the variable here, is the problem in our Theta coordinate descent?
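One generic way to test that (a sketch, not sgcrfpy code; f stands for any smooth piece of the objective, e.g. the negative log-likelihood as a function of Theta): compare the analytic gradient the coordinate descent uses against a central-difference estimate.

import numpy as np

def numeric_grad(f, X, eps=1e-6):
    # central-difference gradient of a scalar function f at matrix X;
    # a mismatch against the analytic gradient would localize a wrong
    # derivative in the Theta updates
    G = np.zeros_like(X, dtype=float)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X, dtype=float)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2.0 * eps)
    return G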

dswah commented 8 years ago

Seems likely, since I tried setting the slack parameter in the backtracking to 1e-9. This means that the Lambda steps are guaranteed to reduce the loss...

dswah commented 8 years ago

Here is the new descent check:

# rhs: the predicted decrease -- the gradient term plus the change in
# the off-diagonal l1 penalty at the full Newton step
rhs = np.trace(np.dot(self.grad_wrt_Lam(fixed, vary), newton_lambda)) + \
      self.lamL * self.l1_norm_off_diag(self.Lam + newton_lambda) - \
      self.lamL * self.l1_norm_off_diag(self.Lam)

# lhs: the actual change in the penalized negative log-likelihood at step alpha
lhs = self.l1_neg_log_likelihood_wrt_Lam(self.Lam + alpha * newton_lambda, fixed, vary) - \
      self.l1_neg_log_likelihood_wrt_Lam(self.Lam, fixed, vary)

# accept the step when the actual decrease is at least a slack
# fraction of the predicted decrease
lhs <= alpha * self.slack * rhs

With slack = 1e-9, we are essentially checking

lhs <= 0

which means

self.l1_neg_log_likelihood_wrt_Lam(self.Lam + alpha * newton_lambda, fixed, vary) <=
self.l1_neg_log_likelihood_wrt_Lam(self.Lam, fixed, vary)
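For comparison, a self-contained sketch of this flavor of backtracking line search (generic, with hypothetical callables f_smooth, g_l1, and grad rather than the sgcrfpy methods):

import numpy as np

def backtrack(f_smooth, g_l1, grad, x, d, slack=0.5, beta=0.5, max_iter=50):
    # sufficient-decrease check for a composite objective F = f + g:
    #     F(x + alpha * d) - F(x) <= alpha * slack * rhs,
    # with rhs = <grad f(x), d> + g(x + d) - g(x), as in the check above
    def F(z):
        return f_smooth(z) + g_l1(z)
    rhs = np.sum(grad(x) * d) + g_l1(x + d) - g_l1(x)
    F0 = F(x)
    alpha = 1.0
    for _ in range(max_iter):
        if F(x + alpha * d) - F0 <= alpha * slack * rhs:
            return alpha
        alpha *= beta  # shrink the step and retry
    return alpha

With slack that small, any step the check accepts is a strict descent step, so an increasing loss has to come from somewhere else.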

dswah commented 7 years ago

On toy problems the optimization works very well for all of the cases posted above. Tried on random cluster graphs of size 50x50.

dswah commented 7 years ago

[image]

dswah commented 7 years ago

This plot shows that even in the problem where our loss increases, the loss always decreases after the Lambda update.

So the problem cannot be in the backtracking logic.
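A sketch of the logging that supports this kind of conclusion (the (stage, value) format is hypothetical; sgcrfpy does not emit it): record the objective after each sub-update and report which update raised it.

def localize_increase(loss_log):
    # loss_log: [('theta', 5.1), ('lam', 4.9), ('theta', 5.3), ...] --
    # the objective recorded after each sub-update (hypothetical log
    # format). Returns the updates that raised the objective.
    bad = []
    for (_, v0), (stage, v1) in zip(loss_log, loss_log[1:]):
        if v1 > v0:
            bad.append((stage, v0, v1))
    return bad

If every offending entry comes from the Theta stage, the Lambda backtracking is cleared, which is what the plot above indicates.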