brett1479 closed this 6 years ago.
Another point is that before we wrote w = sum_i a_i y_i x_i, we didn't know how to turn dual solutions into primal solutions (at least for the w part; b comes later).
That's interesting -- I actually don't know why it's immediate from the primal problem that w is a linear combination of the x_i vectors. Is there a way to show it that's easier than the representer theorem? Do tell!
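Assuming I have the primal problem correct, write w = w_x + w_x', where w_x is in the span of the x_i's and w_x' is in the orthogonal complement. Then w_x' has no effect on the decision function at the training points (w^T x_i = w_x^T x_i for every i, so the loss is unchanged) and only increases the l2-regularization term (||w||^2 = ||w_x||^2 + ||w_x'||^2).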
And that's the proof of the representer theorem.
(In reply to brett1479, Jan 6, 2017, 8:28 PM.)
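A minimal numerical check of the decomposition argument, assuming the primal is lam * ||w||^2 plus the average hinge loss with bias b (one common form; the thread does not pin it down). The helper names `project_onto_span` and `svm_objective` and the toy data are made up for illustration; the third x_i is deliberately the sum of the first two, so the span is two-dimensional in R^4.

```python
# Assumed primal (one common form): lam * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * (w . x_i + b)).


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_onto_span(w, xs, tol=1e-10):
    """Orthogonal projection of w onto span{x_1, ..., x_n}, via Gram-Schmidt."""
    basis = []  # orthonormal basis of the span, built incrementally
    for x in xs:
        r = list(x)
        for q in basis:  # remove the components along basis vectors found so far
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > tol:  # skip x_i that are (numerically) in the span of earlier ones
            basis.append([ri / norm for ri in r])
    w_par = [0.0] * len(w)
    for q in basis:
        c = dot(w, q)
        w_par = [p + c * qi for p, qi in zip(w_par, q)]
    return w_par


def svm_objective(w, b, xs, ys, lam):
    hinge = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys))
    return lam * dot(w, w) + hinge / len(xs)


# Toy data (made up): the third x_i is the sum of the first two.
xs = [[1.0, 2.0, 0.0, 1.0], [0.0, 1.0, 1.0, -1.0], [1.0, 3.0, 1.0, 0.0]]
ys = [1, -1, 1]
w = [0.3, -0.5, 2.0, 1.0]

w_x = project_onto_span(w, xs)              # component in the span of the x_i
w_x_perp = [a - c for a, c in zip(w, w_x)]  # component in the orthogonal complement

# w_x_perp is invisible to every training point ...
print([round(dot(w_x_perp, x), 12) for x in xs])
# ... so the loss is unchanged, while ||w||^2 = ||w_x||^2 + ||w_x_perp||^2 is larger.
print(svm_objective(w_x, 0.0, xs, ys, 0.1), svm_objective(w, 0.0, xs, ys, 0.1))
```

Dropping `w_x_perp` strictly lowers the objective whenever it is nonzero, so any minimizer lies in the span of the x_i.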
Yeah, I guess it is. That said, I think this argument is much more direct than going through Lagrange duality (in particular, the strong duality we used).
For SVM, I think the main payoff of strong duality is the set of results relating the dual solution to the margin (via complementary slackness, a_i > 0 only for points on or violating the margin). For the class as a whole, the only additional thing is the equivalence of the norm-penalized and norm-constrained forms of regularization. It'd be nice if we could get more out of it, given that we're teaching it anyway.
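A small illustration of the penalized/constrained equivalence, sketched under loud assumptions: it uses 2-D ridge regression (squared loss, not the SVM hinge loss) on made-up data, chosen only because the penalized form has a closed form. The penalized solution w_pen is computed directly; then the constrained problem with radius ||w_pen|| is solved by projected gradient descent, and the two should coincide. `solve_2x2` is a hypothetical helper.

```python
# Toy 2-D ridge regression (squared loss, made-up data); not the SVM.


def solve_2x2(A, v):
    """Solve A x = v for a 2x2 matrix A by Cramer's rule."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(v[0] * A[1][1] - A[0][1] * v[1]) / det,
            (A[0][0] * v[1] - A[1][0] * v[0]) / det]


X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 2.5]
lam = 1.0

G = [[sum(row[i] * row[j] for row in X) for j in range(2)] for i in range(2)]  # X^T X
Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(2)]              # X^T y

# Penalized form: argmin ||Xw - y||^2 + lam * ||w||^2 = (X^T X + lam I)^{-1} X^T y.
w_pen = solve_2x2([[G[0][0] + lam, G[0][1]], [G[1][0], G[1][1] + lam]], Xty)

# Constrained form: argmin ||Xw - y||^2 subject to ||w|| <= radius, with radius = ||w_pen||,
# solved by projected gradient descent.
radius = (w_pen[0] ** 2 + w_pen[1] ** 2) ** 0.5
step = 1.0 / (2.0 * (G[0][0] + G[1][1]))  # 1 / (2 trace(X^T X)) is at most 1 / Lipschitz constant
w_con = [0.0, 0.0]
for _ in range(500):
    resid = [row[0] * w_con[0] + row[1] * w_con[1] - t for row, t in zip(X, y)]
    grad = [2.0 * sum(row[i] * e for row, e in zip(X, resid)) for i in range(2)]
    w_con = [w_con[i] - step * grad[i] for i in range(2)]
    norm = (w_con[0] ** 2 + w_con[1] ** 2) ** 0.5
    if norm > radius:  # project back onto the ball of radius `radius`
        w_con = [wi * radius / norm for wi in w_con]

print(w_pen)                         # [0.75, 1.25]
print([round(v, 6) for v in w_con])  # [0.75, 1.25]
```

At w_pen the loss gradient equals -2 * lam * w_pen, which is exactly the KKT condition for the ball constraint with multiplier lam; that multiplier is where the duality argument enters.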
Yeah, I agree. I just meant that the fact that w is a linear combination of the x's follows from much less than strong duality.
Yeah, I agree. And once you know w is a linear combination of the x's, the fact that you can kernelize follows immediately, since w^T x and ||w||^2 then depend on the data only through inner products x_i^T x_j. Yet for years everybody "derived" kernelization for SVM via the dual.
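A sketch of kernelizing the primal directly, assuming the same lam * ||w||^2 + average hinge objective: substitute w = sum_j beta_j phi(x_j), and both the decision values and ||w||^2 come out in terms of the Gram matrix K alone. The quadratic kernel (x . z)^2 is used only because its feature map phi (all products x_i * x_j) is easy to write out, so the kernel-only value can be checked against the explicit one. All names and data here are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous quadratic kernel k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x):
    """Explicit feature map for poly2_kernel: phi(x) . phi(z) = (x . z)^2."""
    return [xi * xj for xi in x for xj in x]


def objective_via_kernel(beta, b, xs, ys, lam, k):
    """Primal objective for w = sum_j beta_j phi(x_j), using only kernel evaluations."""
    n = len(xs)
    K = [[k(xi, xj) for xj in xs] for xi in xs]  # Gram matrix
    f = [sum(beta[j] * K[j][i] for j in range(n)) + b for i in range(n)]  # w . phi(x_i) + b
    reg = sum(beta[i] * K[i][j] * beta[j] for i in range(n) for j in range(n))  # ||w||^2
    hinge = sum(max(0.0, 1.0 - y * fi) for y, fi in zip(ys, f))
    return lam * reg + hinge / n


def objective_via_features(beta, b, xs, ys, lam):
    """The same objective, building w explicitly in feature space."""
    feats = [phi(x) for x in xs]
    w = [sum(bj * fj[t] for bj, fj in zip(beta, feats)) for t in range(len(feats[0]))]
    hinge = sum(max(0.0, 1.0 - y * (dot(w, f) + b)) for f, y in zip(feats, ys))
    return lam * dot(w, w) + hinge / len(xs)


xs = [[1.0, 2.0], [-1.0, 0.5], [0.5, -1.5]]
ys = [1, -1, 1]
beta = [0.2, -0.1, 0.3]
kern_val = objective_via_kernel(beta, 0.1, xs, ys, 0.1, poly2_kernel)
feat_val = objective_via_features(beta, 0.1, xs, ys, 0.1)
print(kern_val, feat_val)  # equal up to rounding
```

Any optimizer over (beta, b) can then work with K alone; no dual is involved.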
I don't think the fact that w is a linear combination of the x_i vectors is a surprising fact that we obtained from the analysis; I believe it follows immediately from the statement of the primal problem. What makes the linear combination result a bit more interesting is that the coefficients on the x_i have a fixed sign: the coefficient on x_i is a_i y_i with a_i >= 0, so its sign matches the label y_i.
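A sketch that recovers w from a dual solution and checks both the sign pattern and the margin relation. It swaps in dual coordinate ascent for the bias-free soft-margin SVM, 0.5 * ||w||^2 + C * sum of hinge losses (equivalent to lam * ||w||^2 + average hinge with lam = 1/(2Cn)); dropping b removes the equality constraint sum_i a_i y_i = 0, so each a_i can be updated on its own. This is one simple solver chosen for brevity, and the data are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def dual_coordinate_ascent(xs, ys, C=1.0, epochs=5000):
    """Bias-free soft-margin SVM dual:
        max_a sum_i a_i - 0.5 * sum_ij a_i a_j y_i y_j (x_i . x_j),  0 <= a_i <= C.
    Maintains w = sum_i a_i y_i x_i throughout."""
    n, d = len(xs), len(xs[0])
    alpha = [0.0] * n
    w = [0.0] * d
    for _ in range(epochs):
        for i in range(n):
            q_ii = dot(xs[i], xs[i])
            if q_ii == 0.0:
                continue
            grad = ys[i] * dot(w, xs[i]) - 1.0  # d/da_i of the negated dual objective
            new = min(max(alpha[i] - grad / q_ii, 0.0), C)  # exact 1-D step, clipped to [0, C]
            delta = new - alpha[i]
            if delta != 0.0:
                w = [wj + delta * ys[i] * xj for wj, xj in zip(w, xs[i])]
                alpha[i] = new
    return alpha, w


xs = [[2.0, 1.0], [1.5, 2.0], [0.2, 0.3], [-1.0, -1.5], [-2.0, -0.5], [0.3, -0.1]]
ys = [1, 1, -1, -1, -1, 1]
alpha, w = dual_coordinate_ascent(xs, ys, C=1.0)
coefs = [a * y for a, y in zip(alpha, ys)]                  # coefficient on x_i in w
margins = [y * dot(w, x) for x, y in zip(xs, ys)]           # y_i * f(x_i)
for a, c, m, y in zip(alpha, coefs, margins, ys):
    print(f"y={y:+d}  a={a:.4f}  coef={c:+.4f}  y*f(x)={m:.4f}")
```

The sign of each coefficient is fixed by the label because a_i is confined to [0, C]; the printed margins show a_i > 0 only where y_i f(x_i) <= 1.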