ageron / handson-ml2

A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
Apache License 2.0

[Chapter 5 - SVM]: Linear SVM with Batch gradient descent #200

Open Dseal95 opened 4 years ago

Dseal95 commented 4 years ago

Hi,

In the chapter 5 notebook of Géron's GitHub repo, he includes a linear SVM classifier implemented with batch gradient descent. If possible, could someone confirm my intuition about the following code:

    t = y * 2 - 1  # -1 if t==0, +1 if t==1
    X_t = X * t
    self.Js = []

    # Training
    for epoch in range(self.n_epochs):
        support_vectors_idx = (X_t.dot(w) + t * b < 1).ravel()
        X_t_sv = X_t[support_vectors_idx]
        t_sv = t[support_vectors_idx]

        J = 1/2 * np.sum(w * w) + self.C * (np.sum(1 - X_t_sv.dot(w)) - b * np.sum(t_sv))
        self.Js.append(J)
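For reference, here is my own minimal, self-contained sketch of what the full batch-gradient-descent fit looks like. The function name, learning rate, synthetic usage, and the two gradient lines are my own (the gradients just come from differentiating the cost J above with respect to w and b), not copied from the notebook:

```python
import numpy as np

def fit_linear_svm(X, y, C=1.0, eta=0.01, n_epochs=1000, seed=42):
    """Batch-GD linear SVM (hinge loss + L2 penalty); y must be 0/1 labels."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    t = y * 2 - 1                       # -1 if y==0, +1 if y==1
    X_t = X * t[:, None]                # row i becomes t_i * x_i
    w = rng.standard_normal((n, 1))
    b = 0.0
    Js = []
    for epoch in range(n_epochs):
        # margin violations: t_i * (x_i . w + b) < 1
        sv_idx = (X_t.dot(w) + t[:, None] * b < 1).ravel()
        X_t_sv, t_sv = X_t[sv_idx], t[sv_idx]
        # grouped form of the cost, as in the notebook snippet above
        J = 0.5 * np.sum(w * w) + C * (np.sum(1 - X_t_sv.dot(w)) - b * np.sum(t_sv))
        Js.append(J)
        # gradients of J (only the support vectors contribute to the hinge part)
        w_grad = w - C * np.sum(X_t_sv, axis=0).reshape(-1, 1)
        b_grad = -C * np.sum(t_sv)
        w -= eta * w_grad
        b -= eta * b_grad
    return w, b, Js
```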

In particular I am looking for a better understanding of :

X_t = X * t
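As I understand it, the point of `X_t = X * t` is just NumPy broadcasting: each row x_i is scaled by its label t_i, so the margin quantity t_i * (x_i . w) can later be computed in one matrix product, X_t.dot(w). A quick sanity check (the toy numbers are mine):

```python
import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])  # 3 instances, 2 features
t = np.array([[1.0], [-1.0], [1.0]])                # labels in {-1, +1}, column vector
w = np.array([[0.5], [-0.25]])

X_t = X * t           # row i is t_i * x_i (broadcast across columns)
lhs = X_t.dot(w)      # (t_i * x_i) . w
rhs = t * X.dot(w)    # t_i * (x_i . w)
print(np.allclose(lhs, rhs))  # True
```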

I understand that the cost function for linear SVM is being written as:

J = 1/2 * np.sum(w * w) + self.C * (np.sum(1 - X_t_sv.dot(w)) - b * np.sum(t_sv))

as opposed to:

J = 1/2 * np.sum(w * w) + self.C * np.sum(np.maximum(0, 1 - t * (X.dot(w) + b)))

Is the idea that once the last term is multiplied out you get:

J = 1/2 * np.sum(w * w) + self.C * np.sum(1 - t * X.dot(w) - t * b)

where the max(0, ·) term is nonzero exactly for the support vectors (the instances on or inside the margin), and the reason the code first extracts t_sv and X_t_sv is that the second term of the cost function only involves the support vectors. Hence,

J = 1/2 * np.sum(w * w) + self.C * (np.sum(1 - X_t_sv.dot(w)) - b * np.sum(t_sv)) is just the cost function rewritten so that the hinge term is summed over the support vectors only, with the b term factored out of the sum.
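To convince myself, I checked numerically that the two forms agree once the sum is restricted to the support vectors (the toy data below is my own):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 3
X = rng.standard_normal((m, n))
t = rng.choice([-1.0, 1.0], size=(m, 1))   # labels in {-1, +1}
w = rng.standard_normal((n, 1))
b = 0.3
C = 2.0

X_t = X * t
sv = (X_t.dot(w) + t * b < 1).ravel()      # margin violations / support vectors
X_t_sv, t_sv = X_t[sv], t[sv]

# textbook form: sum over all i of max(0, 1 - t_i * (x_i . w + b))
hinge = np.maximum(0, 1 - t * (X.dot(w) + b))
J_textbook = 0.5 * np.sum(w * w) + C * np.sum(hinge)

# grouped form from the notebook: hinge restricted to SVs, b factored out
J_grouped = 0.5 * np.sum(w * w) + C * (np.sum(1 - X_t_sv.dot(w)) - b * np.sum(t_sv))

print(np.isclose(J_textbook, J_grouped))   # True
```

The non-support-vectors contribute exactly 0 to the hinge sum, which is why dropping them and expanding the remaining terms gives the grouped expression.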

Any clarification would be greatly appreciated!

(see screenshot below for the code from GitHub)

Screenshot 2020-07-07 at 16 28 45
Praful932 commented 4 years ago

Is the equation written as a Generalized Lagrangian like in the appendix?

Dseal95 commented 4 years ago

@Praful932, yes I believe it is. Once written as a generalized Lagrangian, I think it is then multiplied through, with np.sum grouping the terms together. That is how I have interpreted it anyway, and it does make sense if you just multiply out the equation.