Description of the cost calculation in the DBN, which calls the RBM functions in `models.rbm.RBM` to calculate the pre-training cost.
DBN (RBM) Cost Calculation
The cost is calculated by `models.rbm.RBM.getReconstructionCost` (which is called by `models.rbm.RBM.getCostUpdates` when the argument `persistent=None` is passed), given as,
```python
def getReconstructionCost(self, updates, pre_sigmoid_nv):
    cross_entropy = T.mean(
        T.sum(
            self.inputs * T.log(T.nnet.sigmoid(pre_sigmoid_nv)) +
            (1 - self.inputs) * T.log(1 - T.nnet.sigmoid(pre_sigmoid_nv)),
            axis=1
        )
    )
    return cross_entropy
```
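For reference, the same computation can be reproduced outside Theano. This is a minimal numpy sketch; the toy `x` and `z_pre` arrays are invented purely for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruction_cost(inputs, pre_sigmoid_nv):
    # numpy equivalent of getReconstructionCost: sum over the visible
    # units (axis=1), then average over the minibatch
    z = sigmoid(pre_sigmoid_nv)
    per_example = np.sum(inputs * np.log(z) + (1 - inputs) * np.log(1 - z), axis=1)
    return np.mean(per_example)

# toy minibatch: n=2 examples, d=3 binary visible units
x = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=float)
z_pre = np.array([[2.0, -2.0, 1.5],
                  [-1.0, 0.5, 2.5]])
print(reconstruction_cost(x, z_pre))  # a single negative scalar
```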
with the `cross_entropy` described here and mathematically expressed as,
\begin{equation} L(\mathbf{x}, \mathbf{z}) = -\sum^d_{k=1}\left[ \mathbf{x}_k \ln{\mathbf{z}_k} + (1-\mathbf{x}_k)\ln{(1-\mathbf{z}_k)} \right] \end{equation}
where $\mathbf{x} \in \mathbb{Z}_2^d$ are the binary inputs to the RBM and $\mathbf{z}' \in \mathbb{R}^d$ are the pre-sigmoid activation nodes. The activation (post-sigmoid) nodes are $\mathbf{z} = s(\mathbf{z}')$ for some function $s$, taken here to be the sigmoid activation function given by `T.nnet.sigmoid`, so that $\mathbf{z} \in (0,1)^d$ can be seen as a prediction of $\mathbf{x}$. The hidden nodes $\mathbf{y}$, their weight matrix $\mathbf{w}$ and the visible bias $\mathbf{b}$ define the pre-sigmoid activation nodes as $\mathbf{z}' = \mathbf{w}\mathbf{y} + \mathbf{b}$, as described here. These are generated in the `scan` operation of `models.rbm.RBM.getCostUpdates`, which calls `models.rbm.RBM.propDown` to calculate $\mathbf{z}'$, and hence $\mathbf{z}$.
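As a sketch of that down-pass for a single sample (the shapes, sizes and variable names here are assumptions for illustration, not taken from `models.rbm`):

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 6, 4                      # visible and hidden layer sizes (invented)
w = rng.normal(size=(d, h))      # weight matrix, shape (visible, hidden)
b = rng.normal(size=d)           # visible bias
y = rng.integers(0, 2, size=h)   # a binary hidden state

z_pre = w @ y + b                 # pre-sigmoid activations z'
z = 1.0 / (1.0 + np.exp(-z_pre))  # post-sigmoid activations, z in (0, 1)^d
```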
This means that for a minibatch of size $n$, the $L_i(\mathbf{x}, \mathbf{z})$ form a vector of length $n$, indexed by $i$. The cross-entropy cost of the minibatch is then the average over this vector,
\begin{equation} \text{minibatch cost} = \langle L_i(\mathbf{x}, \mathbf{z}) \rangle \end{equation}
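Note that `getReconstructionCost` sums the bracketed term of the equation above without the leading minus sign, so the value it returns is $\langle -L_i(\mathbf{x}, \mathbf{z}) \rangle$; since each $L_i \geq 0$ for $\mathbf{z} \in (0,1)^d$, the reported minibatch cost is always $\leq 0$. A toy check (values invented):

```python
import numpy as np

x = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=float)   # binary inputs
z = np.array([[0.9, 0.2, 0.8],
              [0.4, 0.7, 0.9]])          # post-sigmoid reconstructions

L_i = -np.sum(x * np.log(z) + (1 - x) * np.log(1 - z), axis=1)
print(L_i)            # per-example cross-entropies, all >= 0
print(np.mean(-L_i))  # what getReconstructionCost returns: always <= 0
```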
Checked for sign errors - see #3. `models.rbm.RBM.freeEnergy` was also checked for errors.
The MNIST tests `test_dbn.py` and `test_mlp.py` demonstrate negative costs decreasing as expected.
This seems to be a phenomenon specific to `models.rbm` (analysis of the code moved to #1).
The `rbm` shows a cost that increases towards zero. The `dbn` cost increases towards zero during pre-training of the RBMs and decreases during fine-tuning.
Issue
The cost should be decreasing with training epochs; instead it increases.
Affected
Reproduction
Runlog (open in an OS X terminal with `cat runlog.txt` to enable full colour output)