Pin-Jiun opened this issue 1 year ago
Here are some numpy functions you should be familiar with for this assignment:
- np.argmax
- np.array_split (look at the axis argument)
- np.concatenate (look at the axis argument)
- np.zeros
- numpy.ndarray.shape
- numpy logic functions, e.g. np.array([[1, 2], [3, 4]]) == 3
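A quick, minimal sketch of these calls (the array values are just illustrative):

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

# np.argmax: index of the largest value (into the flattened array by default)
print(np.argmax(a))                      # 3 (the position of the value 4)

# np.array_split along columns (axis=1) cuts the array into pieces
left, right = np.array_split(a, 2, axis=1)

# np.concatenate along the same axis puts the pieces back together
b = np.concatenate((left, right), axis=1)

# np.zeros and the .shape attribute
z = np.zeros((2, 2))
print(z.shape)                           # (2, 2)

# logic functions: elementwise comparison produces a boolean array
mask = a == 3                            # [[False, False], [True, False]]
```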
Although γ itself does not change, it affects how fast the updates converge, because each update only moves the hypothesis a little.
γ is the distance from the separator to the closest data point; R is the largest distance from any data point to the origin.
Consider a linearly separable dataset with two features:
data = np.array([[200, 800, 200, 800],
                 [0.2, 0.2, 0.8, 0.8]])
labels = np.array([[-1, -1, 1, 1]])
Recall: γ is the distance from the separator to the closest data point, the margin of the data set is the minimum of the per-point margins γ, and R is the largest distance from any data point to the origin.
What is the margin γ of this data set with respect to that separator (up to 3 decimal places)?
What is the theoretical bound on the number of mistakes perceptron will make on this problem?
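The separator the question refers to is not shown in this excerpt. As an illustration only, here is how the margin and the (R/γ)² bound would be computed for a hypothetical separator x₂ = 0.5, i.e. θ = (0, 1), θ₀ = -0.5 (an assumed choice, not necessarily the one in the exercise):

```python
import numpy as np

# Hypothetical separator, chosen here for illustration only
theta = np.array([0.0, 1.0])
theta_0 = -0.5

data = np.array([[200, 800, 200, 800],
                 [0.2, 0.2, 0.8, 0.8]])
labels = np.array([-1, -1, 1, 1])

# Per-point margin: y * (theta . x + theta_0) / ||theta||
margins = labels * (theta @ data + theta_0) / np.linalg.norm(theta)
gamma = margins.min()                 # margin of the data set

# R: largest distance from any data point to the origin
R = np.linalg.norm(data, axis=0).max()

# Theoretical perceptron mistake bound: (R / gamma)^2
bound = (R / gamma) ** 2
print(gamma, R, bound)
```

Note that for the through-origin bound the data would first be augmented with a constant feature, which changes γ and R slightly.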
How many mistakes does the perceptron through the origin have to make in order to find a perfect separator on the data provided above, taken in the order given? (Try it on your computer, not by hand!) Solution: 666696
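One way to try this on your computer: a standard perceptron with offset (equivalent to a through-origin perceptron on data augmented with a constant 1 feature), cycling through the points in the order given and updating whenever y(θ·x + θ₀) ≤ 0. This is a sketch of the usual algorithm, not the course's reference implementation, so the exact count may depend on these conventions:

```python
import numpy as np

def perceptron(data, labels, max_passes=10**7):
    """Perceptron with offset; returns (theta, theta_0, mistake count)."""
    d, n = data.shape
    theta = np.zeros(d)
    theta_0 = 0.0
    mistakes = 0
    for _ in range(max_passes):
        clean_pass = True
        for i in range(n):
            x, y = data[:, i], labels[i]
            if y * (theta @ x + theta_0) <= 0:   # misclassified (or on boundary)
                theta += y * x
                theta_0 += y
                mistakes += 1
                clean_pass = False
        if clean_pass:                           # full pass with no mistakes: done
            break
    return theta, theta_0, mistakes

data = np.array([[200, 800, 200, 800],
                 [0.2, 0.2, 0.8, 0.8]])
labels = np.array([-1, -1, 1, 1])
theta, theta_0, m = perceptron(data, labels)
print(m)   # a very large count; the posted solution is 666696
```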
How would the performance of the perceptron (as predicted by the mistake bound) change?
If we multiplied just the first original feature (the first row of the data) by 0.001 and used our original separator, what would the new margin be?
What would the mistake bound be in this case?
Run the perceptron algorithm on this data; how many mistakes does it make? Solution: 7 Explanation: The speedup in running time is very significant, and it indicates the benefit of choosing features that help you find a classifier for your data.
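Here is a compact sketch of that run on the rescaled data (first feature multiplied by 0.001). With the update rule y(θ·x + θ₀) ≤ 0 and the points taken in the order given, it converges after 7 mistakes:

```python
import numpy as np

data = np.array([[0.2, 0.8, 0.2, 0.8],      # first feature scaled by 0.001
                 [0.2, 0.2, 0.8, 0.8]])
labels = np.array([-1, -1, 1, 1])

theta, theta_0, mistakes = np.zeros(2), 0.0, 0
converged = False
while not converged:
    converged = True
    for i in range(data.shape[1]):           # points in the order given
        x, y = data[:, i], labels[i]
        if y * (theta @ x + theta_0) <= 0:   # mistake (or on the boundary)
            theta += y * x
            theta_0 += y
            mistakes += 1
            converged = False
print(mistakes)  # 7
```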
We can convert any problem involving a linear separator with offset into one with no offset (but of higher dimension)!
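Concretely: append a constant feature of 1 to every point and fold the offset into the weight vector. A minimal sketch of the idea (the numbers are illustrative):

```python
import numpy as np

# Separator with offset: sign(theta . x + theta_0)
theta = np.array([0.0, 1.0])     # illustrative values
theta_0 = -0.5
x = np.array([0.3, 0.9])

# Equivalent through-origin separator in one higher dimension:
theta_aug = np.concatenate([theta, [theta_0]])   # [0, 1, -0.5]
x_aug = np.concatenate([x, [1.0]])               # append the constant 1

# The two formulations agree on every point
assert np.isclose(theta @ x + theta_0, theta_aug @ x_aug)
```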
Convergence theorem
If the training data D_n is linearly separable, then the perceptron algorithm is guaranteed to find a linear separator.
First, we'll define the margin of a labeled point (x, y) with respect to a hyperplane (θ, θ0) as γ(x, y, θ, θ0) = y (θ·x + θ0) / ‖θ‖.
This quantity will be positive if and only if the point x is classified as y by the linear classifier represented by this hyperplane
and the margin of a dataset D_n as the minimum over its points: γ(D_n, θ, θ0) = min_i γ(x_i, y_i, θ, θ0).
The margin is positive if and only if all of the points in the data-set are classified correctly. In that case (only!) it represents the distance from the hyperplane to the closest point.
Reminder: the margin is the distance from the closest data point to the separator, the margin of the data set is the minimum of the per-point margins γ, and R is the largest distance from any data point to the origin.
Proof.
We initialize θ(0) = 0, and let θ(k) denote our hyperplane after the perceptron algorithm has made k mistakes. We are going to think about the angle between the hypothesis we have now, θ(k), and the assumed good separator θ*. Since both pass through the origin, if we can show that the angle between them shrinks usefully with every mistake, then θ(k) gets close to that separator.
Let's think about the cosine of the angle between them, and recall, by the definition of the dot product:

cos(angle) = (θ(k) · θ*) / (‖θ(k)‖ ‖θ*‖)

We can bound the two factors θ(k) · θ* / ‖θ*‖ and 1 / ‖θ(k)‖ separately.
Start by focusing on the first factor, θ(k) · θ* / ‖θ*‖. The k-th mistake was an update on some misclassified point (x_i, y_i), so

θ(k) · θ* / ‖θ*‖ = (θ(k-1) + y_i x_i) · θ* / ‖θ*‖ = θ(k-1) · θ* / ‖θ*‖ + y_i (x_i · θ*) / ‖θ*‖ ≥ θ(k-1) · θ* / ‖θ*‖ + γ,

since every point has margin at least γ with respect to θ*. We can keep unrolling θ(k-1) the same way, picking up at least γ per step, until we reach θ(0) = 0, whose dot product with θ* is 0. Therefore θ(k) · θ* / ‖θ*‖ ≥ kγ.
Now, we'll look at the second factor, ‖θ(k)‖. The key is that (x_i, y_i) was classified incorrectly, so y_i (θ(k-1) · x_i) ≤ 0, and therefore

‖θ(k)‖² = ‖θ(k-1) + y_i x_i‖² = ‖θ(k-1)‖² + 2 y_i (θ(k-1) · x_i) + ‖x_i‖² ≤ ‖θ(k-1)‖² + R².

Unrolling again gives ‖θ(k)‖² ≤ k R², so ‖θ(k)‖ ≤ √k R.
Since the value of the cosine is at most 1, we have

1 ≥ cos(angle) ≥ (kγ) / (√k R) = √k (γ / R),

which gives k ≤ (R / γ)²: the number of mistakes is bounded, so the perceptron converges.
Reminder: the margin is the distance from the closest data point to the separator; R is the largest distance from any data point to the origin.
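As a sanity check of the k ≤ (R/γ)² bound, here is a small synthetic experiment (a sketch; the dataset and separator are made up): build a dataset that is separable through the origin with a known margin, run the perceptron through the origin, and compare the mistake count against the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_star = np.array([1.0, -2.0])            # a known good separator
X = rng.uniform(-1, 1, size=(2, 200))
y = np.sign(theta_star @ X)

# Keep only points with a comfortable margin, so the data are separable
keep = np.abs(theta_star @ X) / np.linalg.norm(theta_star) > 0.1
X, y = X[:, keep], y[keep]

gamma = (y * (theta_star @ X) / np.linalg.norm(theta_star)).min()
R = np.linalg.norm(X, axis=0).max()

# Perceptron through the origin, cycling until a clean pass
theta, mistakes, converged = np.zeros(2), 0, False
while not converged:
    converged = True
    for i in range(X.shape[1]):
        if y[i] * (theta @ X[:, i]) <= 0:     # mistake: update
            theta += y[i] * X[:, i]
            mistakes += 1
            converged = False

print(mistakes, (R / gamma) ** 2)  # mistake count should not exceed the bound
```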