davidrosenberg / mlcourse

Machine learning course materials.
https://davidrosenberg.github.io/ml2018
569 stars 267 forks

Lecture 3c Slide 17 Comment #19

Closed brett1479 closed 6 years ago

brett1479 commented 7 years ago

I don't think the fact that w is a linear combination of the x_i vectors is a surprising result of the analysis. I believe it follows immediately from the statement of the primal problem. What makes the linear-combination result a bit more interesting is that the coefficients on the x_i have a fixed sign.

brett1479 commented 7 years ago

Another point is that before we wrote w = sum_i a_i y_i x_i, we didn't know how to turn dual solutions into primal solutions (at least for the w part; b comes later).

davidrosenberg commented 7 years ago

That's interesting -- I actually don't know why it's immediate from the primal problem that w is a linear combination of the x_i vectors. Is there a way to show it that's easier than the representer theorem? Do tell!

brett1479 commented 7 years ago

Assuming I have the primal problem correct: write w = w_s + w_p, where w_s is in the span of the x_i's and w_p is in the orthogonal complement. Then w_p has no effect on the decision function and only increases the l2-regularization term.
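Spelled out, the argument is just this (assuming the usual primal with decision function f(x) = w^T x + b and an l2 penalty on w):

```latex
\begin{align*}
  w &= w_{s} + w_{p}, \qquad w_{s} \in \operatorname{span}\{x_1,\dots,x_n\},\quad w_{p}^\top x_i = 0 \ \text{for all } i,\\
  w^\top x_i + b &= w_{s}^\top x_i + b \qquad \text{(the orthogonal part drops out of every constraint and score)},\\
  \|w\|_2^2 &= \|w_{s}\|_2^2 + \|w_{p}\|_2^2 \;\ge\; \|w_{s}\|_2^2.
\end{align*}
```

So replacing w by w_s keeps the decision function unchanged and can only decrease the objective, hence some optimal w lies in the span of the x_i's.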

davidrosenberg commented 7 years ago

And that's the proof of the representer theorem.


brett1479 commented 7 years ago

Yeah, I guess it is. That said, I think it is much more direct than Lagrange duality (and in particular, the strong duality we used).

davidrosenberg commented 7 years ago

For SVM, I think the main payoff of strong duality is this set of results on the relation between the dual solution and the margin. And for the class as a whole, the only additional thing is the equivalence of the norm-penalization and bounded-norm forms of regularization. It'd be nice if we could get more from it, since we're teaching it anyway.

brett1479 commented 7 years ago

Yeah, I agree. I just meant that the fact that w is a linear combination of the x_i's follows from much less than strong duality.

davidrosenberg commented 7 years ago

Yeah, I agree. And once you know w is a linear combination of the x_i's, the fact that you can kernelize follows immediately. Yet for years everybody "derived" kernelization for SVM via the dual.
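To make the "follows immediately" concrete: once w = sum_i a_i y_i x_i, the score w·x + b touches the data only through inner products, so you can swap in any kernel. A minimal sketch (the coefficients, kernel choice, and function names below are illustrative assumptions, not anything from the lecture):

```python
import numpy as np

def rbf_kernel(u, v, gamma=1.0):
    # Gaussian (RBF) kernel; any positive-definite kernel works here.
    return float(np.exp(-gamma * np.sum((u - v) ** 2)))

def decision_function(x, alphas, ys, xs, b, kernel=rbf_kernel):
    # With w = sum_i alphas[i] * ys[i] * xs[i], the score w.x + b uses the
    # training points only via inner products x_i.x, so we replace each
    # inner product with a kernel evaluation k(x_i, x).
    return sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b
```

With the plain inner product as the kernel, this reproduces w·x + b exactly; with `rbf_kernel` it is the kernelized score, no dual derivation required.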