4.6.7 doesn't seem right either

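You're right, thanks for the correction. The answer there will be revised as follows. Consider a simple single-layer neural network without a bias term: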

$$ \hat{y}= \phi( \mathbf{W} \mathbf{x} ) $$

where $\hat{y}$ is the output, $\phi$ is the activation function, $\mathbf{W}$ is the weights, and $\mathbf{x}$ is the input.

Define dropout as the function

$$ D(h,p)=\begin{cases}0 & \text{with probability } p \\ \frac{h}{1-p} & \text{otherwise}\end{cases} $$
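Because the kept branch is scaled by $1/(1-p)$, the expected value is unchanged: $\mathbb{E}[D(h,p)] = p\cdot 0 + (1-p)\cdot\frac{h}{1-p} = h$. The following is a minimal plain-Python sketch for a scalar $h$ that checks this by sampling. The function name `dropout`, the seed, and the sample size are illustrative choices, not taken from the book's code:

```python
import random


def dropout(h: float, p: float, rng: random.Random) -> float:
    """Inverted dropout D(h, p): 0 with probability p, otherwise h / (1 - p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    # rng.random() is uniform on [0, 1), so this branch is taken with probability p
    if rng.random() < p:
        return 0.0
    return h / (1.0 - p)


rng = random.Random(0)
h, p, n = 2.0, 0.3, 200_000
samples = [dropout(h, p, rng) for _ in range(n)]
mean = sum(samples) / n
print(f"empirical mean {mean:.3f} vs h = {h}")
```

Every sample is either 0 or $h/(1-p)$, yet their average stays close to $h$. This is why no rescaling is needed at test time.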

Holding the random draw fixed, the derivative of $D$ with respect to its input is

$$ \frac{\partial D(z,p) }{ \partial z}=\begin{cases}0 & \text{with probability } p \\ \frac{1}{1-p} & \text{otherwise}\end{cases} $$
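Once the draw is known, $D$ is linear in its input, so its derivative is either 0 or $1/(1-p)$. Below is a small finite-difference check in plain Python. The boolean `keep` (False for the dropped case) and the helper names are illustrative assumptions:

```python
def dropout_fixed(z: float, p: float, keep: bool) -> float:
    """D(z, p) with the random draw fixed; keep=False is the dropped case."""
    return z / (1.0 - p) if keep else 0.0


def dropout_grad(p: float, keep: bool) -> float:
    """Analytic dD/dz: 0 if dropped, 1 / (1 - p) otherwise."""
    return 1.0 / (1.0 - p) if keep else 0.0


def central_diff(f, z: float, eps: float = 1e-6) -> float:
    """Central finite-difference estimate of f'(z)."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)


p, z = 0.25, 1.7
for keep in (False, True):
    # bind keep as a default argument so the lambda uses this iteration's value
    numeric = central_diff(lambda t, k=keep: dropout_fixed(t, p, k), z)
    print(f"keep={keep}: numeric {numeric:.6f}, analytic {dropout_grad(p, keep):.6f}")
```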

Let $L$ be the loss function, and let $z=\mathbf{W} \mathbf{x}$ and $z^\prime=D(\mathbf{W} \mathbf{x},p)$.

First, consider placing dropout after the activation layer.

Forward pass

$$ \hat{y}=D(\phi(\mathbf{W} \mathbf{x}),p)=\begin{cases}0 & \text{with probability } p \\ \displaystyle\frac{\phi(\mathbf{W} \mathbf{x})}{1-p} & \text{otherwise}\end{cases} $$
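The forward pass above can be sketched for a single scalar output. The sketch assumes a sigmoid activation and treats $\mathbf{W}$ as one row of weights, so that $z=\mathbf{W}\mathbf{x}$ is a scalar. The weights, inputs, and the `keep` flag are hypothetical values for illustration:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward_dropout_after(W, x, p: float, keep: bool) -> float:
    """y_hat = D(phi(W x), p) with the random draw fixed by `keep`."""
    z = sum(w_i * x_i for w_i, x_i in zip(W, x))  # z = W x
    a = sigmoid(z)                                 # phi(z)
    return a / (1.0 - p) if keep else 0.0          # D(phi(z), p)


W, x, p = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0], 0.5
print(forward_dropout_after(W, x, p, keep=False))  # 0.0: the unit is silenced
print(forward_dropout_after(W, x, p, keep=True))   # sigmoid(0.4) / 0.5
```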

Backward pass

$$ \frac{\partial L }{ \partial\mathbf{W}}=\frac{\partial L }{ \partial \hat{y}} \frac{\partial D(\phi(z),p)}{ \partial \phi(z)}\frac{\partial \phi(z)}{ \partial z}\frac{\partial z}{ \partial \mathbf{W}}=\begin{cases}0 & \text{with probability } p \\ \displaystyle\frac{\displaystyle\frac{\partial L }{ \partial \hat{y}} \frac{\partial \phi(z) }{ \partial z}\mathbf{x}}{1-p} & \text{otherwise}\end{cases} $$
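To check this backward formula, the sketch below compares it against central finite differences. It assumes a sigmoid activation, so $\partial\phi/\partial z=\phi(z)(1-\phi(z))$, and a squared-error loss $L=\tfrac12(\hat{y}-y)^2$, so $\partial L/\partial\hat{y}=\hat{y}-y$. Neither choice comes from the original answer, and all names and values are illustrative:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_after(W, x, y, p, keep):
    """L = 0.5 * (y_hat - y)^2 with y_hat = D(phi(W x), p)."""
    z = sum(w * xi for w, xi in zip(W, x))
    y_hat = sigmoid(z) / (1.0 - p) if keep else 0.0
    return 0.5 * (y_hat - y) ** 2


def grad_after(W, x, y, p, keep):
    """Analytic dL/dW: 0 if dropped, else dL/dy_hat * phi'(z) * x / (1 - p)."""
    if not keep:
        return [0.0] * len(W)
    z = sum(w * xi for w, xi in zip(W, x))
    s = sigmoid(z)
    dL_dyhat = s / (1.0 - p) - y   # y_hat - y
    dphi_dz = s * (1.0 - s)        # sigmoid'(z)
    return [dL_dyhat * dphi_dz * xi / (1.0 - p) for xi in x]


def numeric_grad(f, W, eps=1e-6):
    """Central finite differences of f with respect to each weight."""
    grad = []
    for i in range(len(W)):
        W_plus, W_minus = list(W), list(W)
        W_plus[i] += eps
        W_minus[i] -= eps
        grad.append((f(W_plus) - f(W_minus)) / (2.0 * eps))
    return grad


W, x, y, p = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0], 1.0, 0.5
for keep in (False, True):
    num = numeric_grad(lambda V, k=keep: loss_after(V, x, y, p, k), W)
    ana = grad_after(W, x, y, p, keep)
    print(keep, [round(g, 6) for g in num], [round(g, 6) for g in ana])
```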

Next, consider the case where dropout is placed before the activation layer.

Forward pass

$$ \hat{y}=\phi(D(\mathbf{W} \mathbf{x},p))=\begin{cases}\phi(0) & \text{with probability } p \\ \displaystyle\phi\left(\frac{\mathbf{W} \mathbf{x}}{1-p}\right) & \text{otherwise}\end{cases} $$
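Here is a sketch of this forward pass for a scalar output. It assumes a sigmoid activation, and the weights, inputs, and `keep` flag are hypothetical. The dropped case returns $\phi(0)$, which for the sigmoid is 0.5 rather than 0:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forward_dropout_before(W, x, p: float, keep: bool) -> float:
    """y_hat = phi(D(W x, p)) with the random draw fixed by `keep`."""
    z = sum(w_i * x_i for w_i, x_i in zip(W, x))  # z = W x
    z_prime = z / (1.0 - p) if keep else 0.0       # z' = D(W x, p)
    return sigmoid(z_prime)                        # phi(z')


W, x, p = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0], 0.5
print(forward_dropout_before(W, x, p, keep=False))  # 0.5 = sigmoid(0)
print(forward_dropout_before(W, x, p, keep=True))   # sigmoid(0.4 / 0.5)
```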

Backward pass

$$ \frac{\partial L }{ \partial\mathbf{W}}=\frac{\partial L }{ \partial \hat{y}} \frac{\partial \phi(z^\prime)}{ \partial z^\prime}\frac{\partial D(z,p)}{ \partial z}\frac{\partial z}{ \partial \mathbf{W}}=\begin{cases}0 & \text{with probability } p \\ \displaystyle\frac{\displaystyle\frac{\partial L }{ \partial \hat{y}} \frac{\partial \phi(z^\prime)}{ \partial z^\prime}\mathbf{x}}{1-p} & \text{otherwise}\end{cases} $$
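Below is a finite-difference check of this formula. It assumes a sigmoid activation and a squared-error loss $L=\tfrac12(\hat{y}-y)^2$; neither comes from the original answer, and the names and values are hypothetical. In the dropped case $\hat{y}=\phi(0)$ does not depend on $\mathbf{W}$, so the gradient is exactly 0 even though the output is not:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss_before(W, x, y, p, keep):
    """L = 0.5 * (y_hat - y)^2 with y_hat = phi(D(W x, p))."""
    z = sum(w * xi for w, xi in zip(W, x))
    z_prime = z / (1.0 - p) if keep else 0.0
    return 0.5 * (sigmoid(z_prime) - y) ** 2


def grad_before(W, x, y, p, keep):
    """Analytic dL/dW: 0 if dropped, else dL/dy_hat * phi'(z') * x / (1 - p)."""
    if not keep:
        return [0.0] * len(W)
    z = sum(w * xi for w, xi in zip(W, x))
    s = sigmoid(z / (1.0 - p))   # phi(z')
    dL_dyhat = s - y             # y_hat - y
    dphi_dz = s * (1.0 - s)      # sigmoid'(z')
    return [dL_dyhat * dphi_dz * xi / (1.0 - p) for xi in x]


def numeric_grad(f, W, eps=1e-6):
    """Central finite differences of f with respect to each weight."""
    grad = []
    for i in range(len(W)):
        W_plus, W_minus = list(W), list(W)
        W_plus[i] += eps
        W_minus[i] -= eps
        grad.append((f(W_plus) - f(W_minus)) / (2.0 * eps))
    return grad


W, x, y, p = [0.5, -0.2, 0.1], [1.0, 2.0, 3.0], 1.0, 0.5
for keep in (False, True):
    num = numeric_grad(lambda V, k=keep: loss_before(V, x, y, p, k), W)
    ana = grad_before(W, x, y, p, keep)
    print(keep, [round(g, 6) for g in num], [round(g, 6) for g in ana])
```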

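Since $\phi(0)$ is not necessarily 0 (the sigmoid, for example, gives $\phi(0)=0.5$), placing dropout before the activation means a neuron that has been zeroed out can still contribute the value $\phi(0)$ in the forward pass, even though its gradient with respect to $\mathbf{W}$ is 0 in both placements.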
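The sketch below makes this concrete. It compares the two placements on a single pre-activation $z$ for the dropped and kept cases, using sigmoid ($\phi(0)=0.5$) and ReLU ($\phi(0)=0$) as example activations. The function names and test values are illustrative assumptions:

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def relu(z: float) -> float:
    return max(0.0, z)


def dropout_after(phi, z: float, p: float, keep: bool) -> float:
    """D(phi(z), p): dropout applied to the activation output."""
    return phi(z) / (1.0 - p) if keep else 0.0


def dropout_before(phi, z: float, p: float, keep: bool) -> float:
    """phi(D(z, p)): dropout applied to the pre-activation."""
    return phi(z / (1.0 - p) if keep else 0.0)


p = 0.5
for name, phi in (("sigmoid", sigmoid), ("relu", relu)):
    for z in (-1.0, 0.4, 2.0):
        for keep in (False, True):
            a = dropout_after(phi, z, p, keep)
            b = dropout_before(phi, z, p, keep)
            print(f"{name:7s} z={z:+.1f} keep={keep!s:5s} after={a:.4f} before={b:.4f}")
```

The ReLU rows agree because $\phi(0)=0$ and $\max(0, z/(1-p)) = \max(0, z)/(1-p)$ for $0 \le p < 1$. The sigmoid rows differ in both the dropped and the kept cases.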

datawhalechina / d2l-ai-solutions-manual

4.6.7 doesn't seem right either #50