Or this:

But setting lamT just a little higher makes things OK?

And setting lamT a little lower looks strange, but legal:
Since the Theta regularization is the variable here, is the problem in our Theta coordinate descent?

Seems likely, since I tried setting the slack parameter in the backtracking to 1e-9. This means that any accepted Lambda step is guaranteed to reduce the loss...
Here is the new descent check:
# predicted decrease: linearization of the smooth part along the Newton
# direction, plus the exact change in the off-diagonal l1 penalty
rhs = np.trace(np.dot(self.grad_wrt_Lam(fixed, vary), newton_lambda)) + \
      self.lamL * self.l1_norm_off_diag(self.Lam + newton_lambda) - \
      self.lamL * self.l1_norm_off_diag(self.Lam)

# actual change in the penalized negative log likelihood at step size alpha
lhs = self.l1_neg_log_likelihood_wrt_Lam(self.Lam + alpha * newton_lambda, fixed, vary) - \
      self.l1_neg_log_likelihood_wrt_Lam(self.Lam, fixed, vary)

# accept the step only if the actual change is at most a slack-scaled
# fraction of the predicted decrease
lhs <= alpha * self.slack * rhs
With slack = 1e-9, we are essentially checking

lhs <= 0

which means
self.l1_neg_log_likelihood_wrt_Lam(self.Lam + alpha * newton_lambda, fixed, vary) <= \
    self.l1_neg_log_likelihood_wrt_Lam(self.Lam, fixed, vary)
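For reference, here is a minimal self-contained sketch of the kind of backtracking loop this check lives in. The names (backtrack, f, grad, beta) are stand-ins, not the classes from this repo, and the rhs here uses only the smooth Armijo term, whereas the check above also adds the change in the off-diagonal l1 penalty:

import numpy as np

def backtrack(f, grad, x, direction, slack=1e-9, beta=0.5, max_iter=50):
    # Backtracking line search with a sufficient-decrease check.
    # With slack ~ 1e-9 the condition degenerates to "the step must not
    # increase f", exactly as argued above.
    rhs = np.dot(grad(x), direction)  # predicted change; < 0 for a descent direction
    alpha = 1.0
    for _ in range(max_iter):
        lhs = f(x + alpha * direction) - f(x)  # actual change in the loss
        if lhs <= alpha * slack * rhs:
            return alpha  # accept the step
        alpha *= beta  # shrink the step and retry
    return 0.0  # no acceptable step found

# toy quadratic: f(x) = ||x||^2, so -x is a descent (Newton) direction
f = lambda x: np.dot(x, x)
grad = lambda x: 2 * x
x0 = np.array([3.0, -2.0])
print(backtrack(f, grad, x0, -x0))  # 1.0: the full step already decreases f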
On toy problems the optimization works very well for all of the cases posted above. I tried it on random 50x50 cluster graphs.
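For context, a test problem like that can be generated along the lines below; the block structure, sizes, and diagonal boost are arbitrary choices for illustration, not necessarily what the actual tests used:

import numpy as np

def random_cluster_precision(n_blocks=5, block_size=10, seed=0):
    # Block-diagonal ("cluster graph") precision matrix: dense SPD blocks,
    # zeros between clusters. 5 blocks of size 10 gives a 50x50 matrix.
    rng = np.random.RandomState(seed)
    p = n_blocks * block_size
    theta = np.zeros((p, p))
    for b in range(n_blocks):
        A = rng.randn(block_size, block_size)
        block = np.dot(A, A.T) + block_size * np.eye(block_size)  # make the block SPD
        s = b * block_size
        theta[s:s + block_size, s:s + block_size] = block
    return theta

theta = random_cluster_precision()
print(theta.shape)                                   # (50, 50)
print(bool(np.all(np.linalg.eigvalsh(theta) > 0)))   # True: positive definite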
This plot shows that even in the problem where our overall loss increases, the loss always decreases after each Lambda update.

So the problem cannot be in the backtracking logic.
The equations must still be wrong: on real problems we still get increasing cost.
Also, the optimization is currently very sensitive to the regularization values; changing them slightly can produce very different results.
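One way to quantify that sensitivity is to sweep a penalty on a log grid and count how many edges each setting recovers. Everything below is a toy stand-in (fit_model here just soft-thresholds a regularized empirical inverse covariance; it is not this repo's estimator):

import numpy as np

def fit_model(X, lamL):
    # Toy stand-in for the real estimator: invert a regularized empirical
    # covariance and soft-threshold the result by lamL.
    S = np.cov(X, rowvar=False) + 1e-3 * np.eye(X.shape[1])
    Lam = np.linalg.inv(S)
    return np.sign(Lam) * np.maximum(np.abs(Lam) - lamL, 0.0)

def n_edges(Lam, tol=1e-6):
    # Count nonzero off-diagonal entries (each edge counted once).
    off = Lam - np.diag(np.diag(Lam))
    return int((np.abs(off) > tol).sum() // 2)

rng = np.random.RandomState(0)
X = rng.randn(200, 20)
for lamL in np.logspace(-2, 0, 5):
    print("lamL=%.3g  edges=%d" % (lamL, n_edges(fit_model(X, lamL))))

On real data, large jumps in the recovered support between adjacent grid points would confirm the sensitivity described above.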