CS231n: Convolutional Neural Networks for Visual Recognition

bluejad commented 7 years ago

Stanford CS231n: Convolutional Neural Networks for Visual Recognition

CS231n course notes translation series

Course instructor: Andrej Karpathy

Image Classification Notes

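So how exactly do we compare two images? In this example, that means comparing two 32x32x3 blocks of pixels. The simplest approach is to compare them pixel by pixel and add up all the differences. In other words, we flatten the two images into vectors I_1 and I_2 and compute their L1 distance, where p runs over all pixels:

$$d_1(I_1, I_2) = \sum_p \left| I_1^p - I_2^p \right|$$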
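A minimal NumPy sketch of this L1 distance, assuming both images are stored as `uint8` arrays (as CIFAR-10 images usually are). The helper name `l1_distance` is hypothetical. The cast to a signed type before subtracting matters: `uint8` subtraction wraps around instead of going negative.

```python
import numpy as np

def l1_distance(img1, img2):
    """L1 distance: sum of absolute pixel differences (hypothetical helper)."""
    # Flatten each image (e.g. 32x32x3) into a vector and cast to int64,
    # so that subtracting uint8 values cannot wrap around.
    v1 = img1.reshape(-1).astype(np.int64)
    v2 = img2.reshape(-1).astype(np.int64)
    return int(np.sum(np.abs(v1 - v2)))

# Two tiny made-up 2x2x3 "images": all zeros vs. all twos.
a = np.zeros((2, 2, 3), dtype=np.uint8)
b = np.full((2, 2, 3), 2, dtype=np.uint8)
print(l1_distance(a, b))  # 12 values x |0 - 2| = 24
```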

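Xtr (of size 50000 x 32 x 32 x 3) holds all the images in the training set, and Ytr is the corresponding 1-D array of length 50000 holding the class labels (0 to 9):

```python
# load_CIFAR10 is a data-loading helper assumed to be defined elsewhere
Xtr, Ytr, Xte, Yte = load_CIFAR10('data/cifar10/')
# flatten every image into a single row
Xtr_rows = Xtr.reshape(Xtr.shape[0], 32 * 32 * 3)  # Xtr_rows becomes 50000 x 3072
Xte_rows = Xte.reshape(Xte.shape[0], 32 * 32 * 3)  # Xte_rows becomes 10000 x 3072
```

We now have all the image data, with every image stretched out into a row vector.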

Train and evaluate a classifier:

```python
import numpy as np

nn = NearestNeighbor()  # create a Nearest Neighbor classifier object
nn.train(Xtr_rows, Ytr)  # train the classifier on the training images and labels
Yte_predict = nn.predict(Xte_rows)  # predict labels on the test images
# and now print the classification accuracy, which is the average number
# of examples that are correctly predicted (i.e. label matches)
print('accuracy: %f' % np.mean(Yte_predict == Yte))
```

The class provides a train(X, y) function that takes the training data and labels and learns from them. Internally, the class should build some kind of model of the labels and of how they can be predicted from the data. There is also a predict(X) function, which predicts the class labels of new input data:

```python
import numpy as np

class NearestNeighbor(object):
  def __init__(self):
    pass

  def train(self, X, y):
    """ X is N x D where each row is an example. y is 1-dimensional of size N """
    # the nearest neighbor classifier simply remembers all the training data
    self.Xtr = X
    self.ytr = y

  def predict(self, X):
    """ X is N x D where each row is an example we wish to predict a label for """
    num_test = X.shape[0]
    # make sure that the output type matches the label type
    Ypred = np.zeros(num_test, dtype=self.ytr.dtype)

    # loop over all test rows
    for i in range(num_test):
      # find the nearest training image to the i'th test image
      # using the L1 distance (sum of absolute value differences)
      distances = np.sum(np.abs(self.Xtr - X[i, :]), axis=1)
      min_index = np.argmin(distances)  # get the index with the smallest distance
      Ypred[i] = self.ytr[min_index]  # predict the label of the nearest example

    return Ypred
```

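Choice of distance: there are many ways to compute the distance between two vectors. Another common choice is the L2 distance, which geometrically is the Euclidean distance between the two vectors. The L2 distance is:

$$d_2(I_1, I_2) = \sqrt{\sum_p \left( I_1^p - I_2^p \right)^2}$$

To use L2 in NumPy, we only need to replace one line of the code above (the distances line in predict):

```python
distances = np.sqrt(np.sum(np.square(self.Xtr - X[i, :]), axis=1))
```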
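The two metrics can disagree about which training example is nearest. A small sketch with made-up 2-D points (not CIFAR-10 data): under L1 the first training point is closer, under L2 the second one is, because L2 punishes one large difference more than several moderate ones.

```python
import numpy as np

# Hypothetical toy data: two training points and one test point in 2-D.
Xtr = np.array([[3.0, 0.0],
                [2.0, 2.0]])
x = np.array([0.0, 0.0])

# L1: sum of absolute differences, one distance per training row.
d1 = np.sum(np.abs(Xtr - x), axis=1)              # [3.0, 4.0]
# L2: square root of the sum of squared differences.
d2 = np.sqrt(np.sum(np.square(Xtr - x), axis=1))  # [3.0, 2.828...]

print(np.argmin(d1), np.argmin(d2))  # 0 1
```

This is one reason the distance function is treated as a choice to tune rather than a fixed part of the classifier.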

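Split the original training set into a smaller training set and a validation set. Use the validation set to tune all hyperparameters. At the very end, run on the test set only once and report the result.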

Never use the test set for tuning.

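Cross-validation: sometimes the training set is small (so the validation set is even smaller), and people use a more sophisticated technique called cross-validation. In the same example, instead of holding out 1000 images as the validation set, we split the training set into 5 equal folds, train on 4 of them, and validate on the remaining one. We then rotate so that each fold serves as the validation fold once, and take the average of the 5 validation results as the algorithm's validation performance.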
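A minimal sketch of the fold rotation described above, assuming the data fit in memory as NumPy arrays. The names `k_fold_indices`, `cross_validate`, and the `score_fn` callback are hypothetical; `score_fn` stands for "train on these rows, return accuracy on those rows" (for example, the nearest neighbor classifier from earlier).

```python
import numpy as np

def k_fold_indices(num_examples, num_folds):
    """Return (train_idx, val_idx) pairs, one per fold (requires num_folds >= 2)."""
    # split indices 0..num_examples-1 into num_folds nearly equal, contiguous folds
    folds = np.array_split(np.arange(num_examples), num_folds)
    splits = []
    for i in range(num_folds):
        val_idx = folds[i]
        # the other num_folds - 1 folds form the training part of this round
        train_idx = np.concatenate([folds[j] for j in range(num_folds) if j != i])
        splits.append((train_idx, val_idx))
    return splits

def cross_validate(score_fn, X, y, num_folds=5):
    """Average validation score over all folds.

    score_fn(X_train, y_train, X_val, y_val) -> validation accuracy (hypothetical callback).
    """
    scores = []
    for train_idx, val_idx in k_fold_indices(len(X), num_folds):
        scores.append(score_fn(X[train_idx], y[train_idx], X[val_idx], y[val_idx]))
    # report the mean of the per-fold validation results
    return float(np.mean(scores))
```

In practice one would shuffle the data once before splitting so that each fold is representative.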

The choice of distance function (L1 norm, L2 norm, dot product, and so on) is called a hyperparameter.

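In practice, people tend to avoid cross-validation, mainly because it is computationally expensive. Typically the original training set is split so that 50% to 90% of it is used for training and the rest for validation. This depends on the situation, though: if there are many hyperparameters, you may want a larger validation set, and if there are too few validation examples, cross-validation is the safer choice. Typical numbers of folds are 3, 5, or 10.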

[Figure: hyperparameter tuning with 5-fold cross-validation]

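The correct way to choose hyperparameters: split the original training set into a training set and a validation set, try different hyperparameter values on the validation set, and keep the one that performs best.
If there is not enough training data, use cross-validation; it reduces the noise when estimating which hyperparameters work best.
Once the best hyperparameters are found, run the algorithm with them on the test set exactly once, and use that result to evaluate the algorithm.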
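A sketch of this procedure for the nearest neighbor classifier, where the hyperparameter is the distance function (L1 or L2). The function names and the tiny data set are made up for illustration; only the validation split is used for the choice, and the test set would be touched once afterwards.

```python
import numpy as np

def nn_predict(Xtr, ytr, X, metric):
    """1-nearest-neighbor prediction with a chosen distance metric ('l1' or 'l2')."""
    preds = np.empty(X.shape[0], dtype=ytr.dtype)
    for i in range(X.shape[0]):
        diff = Xtr - X[i]
        if metric == "l1":
            dist = np.sum(np.abs(diff), axis=1)
        elif metric == "l2":
            dist = np.sqrt(np.sum(np.square(diff), axis=1))
        else:
            raise ValueError("unknown metric: %s" % metric)
        preds[i] = ytr[np.argmin(dist)]
    return preds

def select_metric(Xtr, ytr, Xval, yval, candidates=("l1", "l2")):
    """Try each candidate on the validation set and keep the best one."""
    best_metric, best_acc = None, -1.0
    for metric in candidates:
        acc = float(np.mean(nn_predict(Xtr, ytr, Xval, metric) == yval))
        if acc > best_acc:
            best_metric, best_acc = metric, acc
    return best_metric, best_acc

# Hypothetical toy split: two training points, one validation point.
Xtr = np.array([[3.0, 0.0], [2.0, 2.0]])
ytr = np.array([0, 1])
Xval = np.array([[0.0, 0.0]])
yval = np.array([1])
print(select_metric(Xtr, ytr, Xval, yval))  # ('l2', 1.0)
```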

Principal component analysis (PCA)

Linear Classification Notes

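We will now develop a more powerful approach to image classification that extends naturally to neural networks and convolutional neural networks. The approach has two main components: a score function and a loss function.

Score function: maps the raw image data to class scores.

Loss function: quantifies the agreement between the predicted class scores and the ground-truth labels.

This turns classification into an optimization problem, in which we minimize the loss function by updating the parameters of the score function.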

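W: changing a row of W rotates the corresponding class's line in the input space in different directions.

b: changing b translates (shifts) the lines without rotating them.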

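The classification score function is defined as:

$$f(x_i, W, b) = W x_i + b$$

where x_i is the image flattened into a D x 1 column vector, W is a K x D weight matrix, b is a K x 1 bias vector, and K is the number of classes.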
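A minimal sketch of the linear score function with made-up numbers, assuming K = 3 classes and D = 4 input values (for CIFAR-10 these would be K = 10 and D = 3072). The name `linear_scores` is hypothetical.

```python
import numpy as np

def linear_scores(x, W, b):
    """Linear score function f(x, W, b) = W x + b; returns one score per class."""
    return W.dot(x) + b

# Hypothetical toy sizes: K = 3 classes, D = 4 "pixels".
W = np.array([[0.2, -0.5, 0.1, 2.0],
              [1.5, 1.3, 2.1, 0.0],
              [0.0, 0.25, 0.2, -0.3]])  # K x D
b = np.array([1.1, 3.2, -1.2])          # K
x = np.array([56.0, 231.0, 24.0, 2.0])  # D

s = linear_scores(x, W, b)
print(s)  # the highest entry is the predicted class
```

Each row of W acts as a separate linear classifier (a template) for one class.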

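A trick for computing f(x, W, b) (the bias trick): keeping W and b separate is cumbersome, so the two are usually combined into one matrix. Append b to W as an extra column, and extend every input x_i with an extra dimension that always holds the constant 1. The score function then becomes a single matrix multiplication:

$$f(x_i, W) = W x_i$$

For CIFAR-10, x_i becomes 3073 x 1 instead of 3072 x 1, and W becomes 10 x 3073.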
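A quick numerical check that the bias trick gives exactly the same scores, using random made-up weights (the sizes K = 3, D = 4 are arbitrary assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 3, 4                      # hypothetical sizes
W = rng.standard_normal((K, D))
b = rng.standard_normal(K)
x = rng.standard_normal(D)

# Fold b into W as an extra last column, and append a constant 1 to x.
W_ext = np.hstack([W, b[:, None]])  # K x (D + 1)
x_ext = np.append(x, 1.0)           # (D + 1,)

# Both forms produce the same class scores.
print(np.allclose(W.dot(x) + b, W_ext.dot(x_ext)))  # True
```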

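In machine learning it is standard practice to normalize the input features.
In image classification, each pixel can be regarded as a feature. In practice it is very important to center the data by subtracting the mean from every feature. For images, this means computing a mean image over all training images and subtracting it from every image, so that pixel values lie roughly in [-127, 127]. A further common step is to scale the values so that they lie in [-1, 1]. Zero-mean centering matters a lot; the reason will be explained in detail once we have covered gradient descent.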
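A minimal sketch of this preprocessing, assuming images arrive as an N x H x W x C `uint8` array. The function name `preprocess` is hypothetical, and scaling by the largest absolute centered training value is one assumed way to reach [-1, 1] (dividing by a constant such as 127 is another). The mean image is computed on the training set only and reused for the test set, so test values may fall slightly outside [-1, 1].

```python
import numpy as np

def preprocess(X_train, X_test):
    """Zero-center with the training mean image, then scale into [-1, 1]."""
    X_train = X_train.astype(np.float64)  # float copies, so uint8 cannot wrap around
    X_test = X_test.astype(np.float64)
    mean_image = np.mean(X_train, axis=0)  # one mean value per pixel and channel
    X_train -= mean_image
    X_test -= mean_image                   # reuse the *training* mean
    # assumption: scale by the largest magnitude seen in the training data
    scale = max(float(np.max(np.abs(X_train))), 1e-8)
    return X_train / scale, X_test / scale

rng = np.random.default_rng(0)
Xtr_img = rng.integers(0, 256, size=(20, 4, 4, 3)).astype(np.uint8)  # made-up images
Xte_img = rng.integers(0, 256, size=(5, 4, 4, 3)).astype(np.uint8)
Xtr_p, Xte_p = preprocess(Xtr_img, Xte_img)
```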

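Score function

Its parameter is the weight matrix W. The data (x_i, y_i) are given and fixed; we cannot change them. What we can adjust are the weights W, so that the scores agree with the true classes of the images in the training set, that is, the correct class should receive the highest score.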

Loss function (sometimes also called the cost function or the objective)

The larger the mismatch between the scores and the true labels, the larger the loss; the smaller the mismatch, the smaller the loss.

The SVM loss wants the score of the correct class to always be higher than the scores of the incorrect classes by at least a fixed margin Δ.

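The multiclass SVM loss for the i-th example is defined as:

$$L_i = \sum_{j \neq y_i} \max(0, s_j - s_{y_i} + \Delta)$$

where s_j = f(x_i, W)_j is the score of the j-th class and y_i is the index of the correct class.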
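A worked example with made-up numbers: three classes with scores s = (13, -7, 11), the first class is the correct one (y_i = 0), and the margin is Δ = 10. The sum runs over the two incorrect classes:

$$
\begin{aligned}
L_i &= \max(0, -7 - 13 + 10) + \max(0, 11 - 13 + 10) \\
    &= \max(0, -10) + \max(0, 8) = 0 + 8 = 8
\end{aligned}
$$

The first term is zero because the score -7 is already more than Δ below the correct score 13; the second term contributes because 11 is within Δ of 13.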

We are always somewhat unhappy with how the labels of the training data are predicted, and the loss function quantifies that unhappiness.

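We want to encode a preference for certain weights W over others in order to remove an ambiguity: if some W classifies every training example correctly (all losses zero), then any multiple αW with α > 1 does too, so W is not unique. This can be done by extending the loss function with a regularization penalty R(W). The most common regularization penalty is the L2 norm, which discourages large weights through an elementwise quadratic penalty on all parameters:

$$R(W) = \sum_k \sum_l W_{k,l}^2$$

The full loss is then the average data loss over the N training examples plus the regularization loss weighted by λ:

$$L = \frac{1}{N} \sum_i L_i + \lambda R(W)$$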
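A small sketch of why the L2 penalty prefers some weights over others, using made-up vectors: w1 and w2 give the same score on x, but w2 spreads its weight over all inputs and therefore has a smaller penalty. The helper name `l2_penalty` is hypothetical.

```python
import numpy as np

def l2_penalty(W):
    """L2 regularization R(W): sum of the squares of all entries of W."""
    return float(np.sum(W * W))

x = np.array([1.0, 1.0, 1.0, 1.0])
w1 = np.array([1.0, 0.0, 0.0, 0.0])      # all weight on a single input
w2 = np.array([0.25, 0.25, 0.25, 0.25])  # weight spread over all inputs

print(w1.dot(x), w2.dot(x))            # 1.0 1.0: identical scores
print(l2_penalty(w1), l2_penalty(w2))  # 1.0 0.25: the L2 penalty prefers w2
```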

Benefits of regularization:

It gives SVMs their max-margin property
It helps avoid overfitting

In almost all cases it is safe to set $\Delta = 1.0$.

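Unregularized loss function, unvectorized Python implementation

D = W.shape[0] is the number of classes

y is the integer index of the correct class

```python
def L_i(x, y, W):
    """
    Unvectorized multiclass SVM loss for a single example.
    - x: the image as a column vector (e.g. 3073 x 1 with the bias trick on CIFAR-10)
    - y: integer index of the correct class
    - W: weight matrix (e.g. 10 x 3073)
    """
    delta = 1.0  # the margin
    scores = W.dot(x)  # one score per class
    correct_class_score = scores[y]
    D = W.shape[0]  # number of classes
    loss_i = 0.0
    for j in range(D):  # iterate over all classes
        if j == y:
            continue  # skip the correct class
        # accumulate the loss for the j-th incorrect class
        loss_i += max(0, scores[j] - correct_class_score + delta)
    return loss_i
```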

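Unregularized loss function, half-vectorized Python implementation (vectorized over the classes, but still one example at a time; D = W.shape[0] is the number of classes and y is the integer index of the correct class)

```python
import numpy as np

def L_i_vectorized(x, y, W):
    delta = 1.0
    scores = W.dot(x)
    # compute the margins for all classes in one vector operation
    margins = np.maximum(0, scores - scores[y] + delta)
    # the correct class must not contribute to the loss
    margins[y] = 0
    loss_i = np.sum(margins)
    return loss_i
```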
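Going one step further, the loop over examples can also be removed. A minimal sketch of a fully vectorized version for a whole batch, assuming the examples are stored as rows of an N x D matrix X (so the scores are X W^T) and W keeps the C x D layout used above. The name `svm_loss_vectorized` is hypothetical; regularization is left out, as in the functions above.

```python
import numpy as np

def svm_loss_vectorized(X, y, W, delta=1.0):
    """
    Fully vectorized multiclass SVM data loss (no regularization),
    averaged over a batch.
    - X: N x D, one example per row
    - y: length-N integer labels
    - W: C x D weights (same layout as W.dot(x) above)
    """
    N = X.shape[0]
    scores = X.dot(W.T)                          # N x C
    correct = scores[np.arange(N), y][:, None]   # N x 1, correct-class scores
    margins = np.maximum(0, scores - correct + delta)
    margins[np.arange(N), y] = 0                 # correct class contributes nothing
    return float(np.sum(margins) / N)

# Made-up check: with W = identity, each row of X is directly a score vector.
X = np.array([[13.0, -7.0, 11.0],
              [1.0, 2.0, 3.0]])
y = np.array([0, 2])
print(svm_loss_vectorized(X, y, np.eye(3), delta=10.0))  # (8 + 17) / 2 = 12.5
```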

The two most common classifiers: the SVM classifier and the Softmax classifier

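Softmax function

$$f_j(z) = \frac{e^{z_j}}{\sum_k e^{z_k}}$$

Its input is a vector of arbitrary real-valued scores (the vector z), which it squashes into an output vector whose elements each lie between 0 and 1 and sum to 1.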
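A minimal NumPy sketch of the softmax function. Subtracting the maximum score first is a common safeguard (an addition not stated in the notes above): the shift cancels in the ratio, so the result is unchanged, but np.exp no longer overflows for large scores.

```python
import numpy as np

def softmax(z):
    """Map a vector of real-valued scores to probabilities that sum to 1."""
    # Shifting by the max leaves the result unchanged (it cancels in the ratio)
    # but keeps np.exp from overflowing for large scores.
    shifted = z - np.max(z)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z)

p = softmax(np.array([1.0, 2.0, 3.0]))
print(p, p.sum())  # three values in (0, 1) that sum to 1
```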

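Hinge loss

The hinge loss is named after the max(0, -) function it uses: each term is zero whenever its argument is negative and grows linearly otherwise.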

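In the Softmax classifier, the mapping f(x_i; W) = W x_i stays the same, but the scores are interpreted as unnormalized log probabilities for each class, and the hinge loss is replaced by the cross-entropy loss:

$$L_i = -\log\left(\frac{e^{f_{y_i}}}{\sum_j e^{f_j}}\right) \quad \text{or equivalently} \quad L_i = -f_{y_i} + \log\sum_j e^{f_j}$$

Here f_j denotes the j-th element of the class score vector f. As before, the full loss over the dataset is the mean of the per-example losses L_i plus the regularization loss λR(W).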
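A minimal sketch of the per-example cross-entropy loss, written in the second (equivalent) form above. The log-sum-exp term is computed after factoring out the largest score, an assumed numerical safeguard that does not change the value. The name `softmax_loss_i` is hypothetical.

```python
import numpy as np

def softmax_loss_i(f, y):
    """Cross-entropy loss L_i = -f[y] + log(sum_j exp(f[j])) for one example."""
    f_max = np.max(f)
    # log-sum-exp computed stably by factoring out the largest score
    log_sum_exp = f_max + np.log(np.sum(np.exp(f - f_max)))
    return float(-f[y] + log_sum_exp)

print(softmax_loss_i(np.zeros(3), 0))  # equal scores: loss = log(3)
```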

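The cross-entropy between a "true" distribution p and an estimated distribution q is defined as:

$$H(p, q) = -\sum_x p(x) \log q(x)$$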
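A small sketch linking this definition to the Softmax loss: when p is one-hot (all mass on the correct class), H(p, q) reduces to -log q of the correct class, which is exactly L_i with q given by the softmax of the scores. The function name and the numbers are made up.

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x) for discrete distributions."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    mask = p > 0  # terms with p(x) = 0 contribute nothing
    return float(-np.sum(p[mask] * np.log(q[mask])))

# One-hot "true" distribution (class 1 is correct) and an estimated q.
print(cross_entropy([0.0, 1.0, 0.0], [0.2, 0.5, 0.3]))  # -log(0.5) = log(2)
```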

The SVM classifier uses the hinge loss, sometimes also called the max-margin loss.
The Softmax classifier uses the cross-entropy loss.

Comparison of SVM and Softmax
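A sketch computing both losses on the same made-up score vector, assuming Δ = 1 for the SVM. The SVM loss only checks that the correct score beats the others by the margin, while the Softmax loss is computed from normalized probabilities and is never exactly zero.

```python
import numpy as np

f = np.array([-2.85, 0.86, 0.28])  # made-up class scores
y = 2                              # index of the correct class

# Multiclass SVM (hinge) loss with delta = 1
margins = np.maximum(0, f - f[y] + 1.0)
margins[y] = 0
svm_loss = float(np.sum(margins))

# Softmax cross-entropy loss (max subtracted for numerical safety)
p = np.exp(f - np.max(f))
p /= np.sum(p)
softmax_loss = float(-np.log(p[y]))

print(svm_loss, softmax_loss)  # about 1.58 and 1.04
```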

Computing the Softmax

Some reported results show the L2-SVM outperforming Softmax.

Linear classification

Optimization Notes

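The full SVM loss being implemented is:

$$L = \frac{1}{N} \sum_i \sum_{j \neq y_i} \max\left(0, f(x_i; W)_j - f(x_i; W)_{y_i} + \Delta\right) + \lambda R(W)$$

with Δ = 1 in practice.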

Optimization is the process of finding the parameters W that minimize the loss function.

The loss function comes in several versions and can be implemented in different ways:

Softmax

SVM

Optimization

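Random local search

Start from a random W, generate a random perturbation δW, and update W only if the loss at W + δW is lower:

```python
import numpy as np

# Assumes the loss function L and the training data Xtr_cross, Ytr are defined elsewhere.
W = np.random.randn(10, 3073) * 0.001  # generate a random starting W
bestloss = float("inf")
for i in range(1000):
    step_size = 0.0001
    Wtry = W + np.random.randn(10, 3073) * step_size  # random perturbation of W
    loss = L(Xtr_cross, Ytr, Wtry)
    if loss < bestloss:  # keep the perturbed W only if it lowers the loss
        W = Wtry
        bestloss = loss
    print('iter %d loss is %f' % (i, bestloss))
```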
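Since L and the CIFAR-10 data are not defined here, a self-contained sketch of the same accept-only-improvements loop on a hypothetical toy loss (squared distance to a fixed target matrix). The function name, the toy loss, and the larger step size are assumptions chosen so the example runs quickly.

```python
import numpy as np

def random_local_search(loss_fn, W0, step_size=0.1, num_iters=1000, seed=0):
    """Try random perturbations of W; keep one only if it lowers the loss."""
    rng = np.random.default_rng(seed)
    W, best_loss = W0.copy(), loss_fn(W0)
    for _ in range(num_iters):
        W_try = W + rng.standard_normal(W.shape) * step_size
        loss = loss_fn(W_try)
        if loss < best_loss:  # accept only improvements
            W, best_loss = W_try, loss
    return W, best_loss

# Hypothetical toy loss: squared distance to a fixed target matrix.
target = np.array([[1.0, -2.0], [0.5, 3.0]])

def toy_loss(W):
    return float(np.sum((W - target) ** 2))

W_best, loss_best = random_local_search(toy_loss, np.zeros((2, 2)))
print(toy_loss(np.zeros((2, 2))), loss_best)  # starting loss vs. best loss found
```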

Following the gradient

Feel the slope of the hill beneath our feet and step in the direction of steepest descent.

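For a one-dimensional function, the slope is the instantaneous rate of change of the function at a point. The gradient generalizes the slope: it is not a single value but a vector. In the input space, the gradient is the vector of slopes (also called derivatives) along each dimension. The derivative of a one-dimensional function is:

$$\frac{df(x)}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

When the function takes several inputs, the derivatives are called partial derivatives, and the gradient is the vector of the partial derivatives along each dimension.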

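There are two ways to compute the gradient: a slow, approximate, but easy-to-implement method (the numerical gradient), and a fast, exact method (the analytic gradient) that requires calculus and is more error-prone to implement.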
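A minimal sketch of the numerical gradient, compared against an analytic gradient. It uses the centered difference (f(x + h) - f(x - h)) / 2h instead of the one-sided formula above, a common choice that is usually more accurate. The function name `eval_numerical_gradient` and the test function f(x) = sum(x^2), whose analytic gradient is 2x, are assumptions for illustration.

```python
import numpy as np

def eval_numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of the gradient of scalar function f at x."""
    x = x.astype(np.float64)  # work on a float copy so the caller's array is untouched
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):  # visit every coordinate
        old_value = x[idx]
        x[idx] = old_value + h
        f_plus = f(x)
        x[idx] = old_value - h
        f_minus = f(x)
        x[idx] = old_value  # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

# Gradient check: compare with the analytic gradient of f(x) = sum(x^2), which is 2x.
x0 = np.array([[1.0, -2.0], [3.0, 0.5]])
num_grad = eval_numerical_gradient(lambda v: np.sum(v ** 2), x0)
print(np.allclose(num_grad, 2 * x0, atol=1e-6))  # True
```

Comparing the two results like this (a gradient check) is the usual way to catch bugs in an analytic gradient.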

bluejad / deeplearning

CS231n: Convolutional Neural Networks for Visual Recognition #2