Calemsy opened this issue 5 years ago
1 - Chain Rule
case 1: $$y = g(x), z = h(y) \longrightarrow \frac{dz}{dx} = \frac{dz}{dy}\frac{dy}{dx}$$
case 2: $$x = g(s), y = h(s), z = k(x, y) \longrightarrow \frac{dz}{ds} = \frac{\partial z}{\partial x}\frac{dx}{ds} + \frac{\partial z}{\partial y}\frac{dy}{ds}$$
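A quick numerical check of case 2 (my own toy example, not part of the original notes): pick concrete $g$, $h$, $k$ and compare the chain-rule value of $\frac{dz}{ds}$ against a finite-difference estimate.

```python
import numpy as np

# Illustrative choices (assumptions for this sketch):
# x = g(s) = s**2, y = h(s) = sin(s), z = k(x, y) = x * y
def g(s): return s ** 2
def h(s): return np.sin(s)
def k(x, y): return x * y

def dz_ds_chain(s):
    x, y = g(s), h(s)
    dz_dx, dz_dy = y, x            # partials of k(x, y) = x * y
    dx_ds, dy_ds = 2 * s, np.cos(s)
    return dz_dx * dx_ds + dz_dy * dy_ds

def dz_ds_numeric(s, eps=1e-6):
    f = lambda t: k(g(t), h(t))
    return (f(s + eps) - f(s - eps)) / (2 * eps)

s = 1.3
print(dz_ds_chain(s), dz_ds_numeric(s))  # the two values should agree closely
```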
2 - BackPropagation
Because the total loss is the sum of the per-sample losses, $L(\theta) = \sum_{n=1}^{N}l^n(\theta)$, the partial derivative of the total loss with respect to a parameter is the sum of the per-sample partial derivatives: $$\frac{\partial L(\theta)}{\partial w}=\sum_{n=1}^{N}\frac{\partial l^n(\theta)}{\partial w}$$
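A minimal sketch of this decomposition, using a hypothetical one-parameter squared-error loss purely for illustration (not from the original notes):

```python
import numpy as np

# Hypothetical per-sample loss l^n(w) = (w * x_n - t_n)^2 for a 1-parameter model,
# used only to show that dL/dw is the sum of the per-sample gradients.
x = np.array([0.5, 1.0, 2.0])
t = np.array([1.0, 0.5, 3.0])
w = 0.7

per_sample_grad = 2 * (w * x - t) * x   # dl^n/dw for each sample n
total_grad = per_sample_grad.sum()      # dL/dw = sum_n dl^n/dw

# Finite-difference check on the total loss L(w) = sum_n l^n(w)
eps = 1e-6
L = lambda w: np.sum((w * x - t) ** 2)
print(total_grad, (L(w + eps) - L(w - eps)) / (2 * eps))
```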
As shown in the figure below: $$\frac{\partial l}{\partial w} = \frac{\partial l}{\partial z}\frac{\partial z}{\partial w}$$ where
2.1 - Forward pass
$$\frac{\partial z}{\partial w_1} = x_1, \frac{\partial z}{\partial w_2} = x_2$$
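In code, the forward pass only needs to record the layer inputs: assuming $z = w_1 x_1 + w_2 x_2 + b$ as in the figure, the partials $\frac{\partial z}{\partial w_i}$ are exactly those inputs (a sketch with made-up numbers):

```python
# Forward pass for one neuron: z = x1*w1 + x2*w2 + b (assumed form of z),
# so dz/dw1 = x1 and dz/dw2 = x2 are already available as forward-pass values.
x1, x2 = 2.0, -1.0
w1, w2, b = 0.3, -0.5, 0.1

z = x1 * w1 + x2 * w2 + b
dz_dw1, dz_dw2 = x1, x2   # read off directly from the forward pass
print(z, dz_dw1, dz_dw2)
```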
2.2 - Backward pass
$$\frac{\partial l}{\partial z} = ?$$
$$\frac{\partial l}{\partial z} = \frac{\partial l}{\partial a}\frac{\partial a}{\partial z}$$
$$\frac{\partial l}{\partial a} = \frac{\partial z'}{\partial a}\frac{\partial l}{\partial z'} + \frac{\partial z''}{\partial a}\frac{\partial l}{\partial z''} =w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''}$$
Assume that $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ are already known; then $\frac{\partial l}{\partial z}$ is computed by the formula below (also shown in the following figure):
$$\frac{\partial l}{\partial z} = \frac{\partial a}{\partial z}[w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''}] = \sigma'(z)[w_3 \frac{\partial l}{\partial z'} + w_4 \frac{\partial l}{\partial z''}]$$
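A sketch of this single backward step, assuming the activation $a = \sigma(z)$ is the logistic sigmoid (so $\sigma'(z) = \sigma(z)(1-\sigma(z))$) and that $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ are supplied by the next layer:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Given dl/dz' and dl/dz'' from the next layer (assumed known here):
# dl/dz = sigma'(z) * (w3 * dl/dz' + w4 * dl/dz'')
def backward_dl_dz(z, w3, w4, dl_dz1, dl_dz2):
    a = sigmoid(z)
    sigma_prime = a * (1.0 - a)          # derivative of the sigmoid at z
    return sigma_prime * (w3 * dl_dz1 + w4 * dl_dz2)

print(backward_dl_dz(z=0.4, w3=0.7, w4=-0.2, dl_dz1=0.05, dl_dz2=0.3))
```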
The computation above relies on $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ being known, so how are $\frac{\partial l}{\partial z'}$ and $\frac{\partial l}{\partial z''}$ themselves computed?
case 1: If $z'$ and $z''$ are the inputs to output-layer neurons, the recursion terminates here. $$\frac{\partial l}{\partial z'} = \frac{\partial y_1}{\partial z'} \frac{\partial l}{\partial y_1}, \frac{\partial l}{\partial z''} = \frac{\partial y_2}{\partial z''} \frac{\partial l}{\partial y_2}$$
case 2: If $z'$ and $z''$ are not the inputs to output-layer neurons, compute $\frac{\partial l}{\partial z}$ backwards starting from the last layer, as sketched below.
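Putting the two cases together, here is a minimal end-to-end sketch (my own toy 2-2-2 sigmoid network with squared-error loss, not from the original notes): the output layer gives $\frac{\partial l}{\partial z}$ directly (case 1), the hidden layer obtains it from the layer behind it (case 2), and combining with the forward-pass inputs yields the weight gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=2)            # input
W1 = rng.normal(size=(2, 2))      # hidden-layer weights
W2 = rng.normal(size=(2, 2))      # output-layer weights
t = np.array([0.0, 1.0])          # target

# Forward pass: keep every z and a, they are reused in the backward pass.
z1 = W1 @ x;  a1 = sigmoid(z1)
z2 = W2 @ a1; y  = sigmoid(z2)

# Backward pass with squared-error loss l = 0.5 * ||y - t||^2.
dl_dz2 = (y - t) * y * (1 - y)            # output layer: known directly (case 1)
dl_dz1 = (W2.T @ dl_dz2) * a1 * (1 - a1)  # hidden layer: backward recursion (case 2)

# Parameter gradients combine backward and forward quantities:
# dl/dW = dl/dz * dz/dW, and dz/dW is just the layer's input.
dl_dW2 = np.outer(dl_dz2, a1)
dl_dW1 = np.outer(dl_dz1, x)
print(dl_dW1, dl_dW2)
```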
3 - Summary