kunmingwu closed this pull request 6 years ago
Hi @kenny0805. Thanks for the pull request. However, because the L1 norm is non-smooth, the appropriate way to solve the problem would be through the proximal gradient method, not through the gradient-based methods currently implemented.
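For reference, the proximal gradient idea can be sketched as follows. This is a generic illustration on a least-squares loss (not the ordinal logistic loss in this PR); the essential ingredient is the soft-thresholding proximal operator of the L1 term, which a plain gradient step does not perform:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: coordinate-wise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(A, b, lam, n_iter=1000):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient (ISTA)."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, with L the Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)             # gradient of the smooth least-squares part
        x = soft_threshold(x - step * grad, step * lam)  # gradient step, then prox
    return x
```

The same structure applies to any smooth loss plus an L1 penalty: take a gradient step on the smooth part, then apply soft-thresholding. For the ordinal logistic loss the gradient computation would change, but the prox step would not.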
Following your link and the slides, I have tried to implement the proximal gradient method, but I am not sure I got the idea right.
I appreciate your efforts, but the algorithm is not correct. It's a different algorithm (one that would take a significant amount of effort to implement), not merely a replacement of the gradient definition.
You are right. I was trying to use OWL-QN for L1 via pylbfgs, but then realized the package cannot handle bound constraints on the thetas. I am closing the PR for now and will take another look. Thanks for your comments.
I find L1 regularization useful in ordinal logistic regression for controlling the number of features included in the model.