flipdazed / weather-modelling

Deep Architectures for Weather Modelling

Cost increases with training in RBMs #2

Open flipdazed opened 7 years ago

flipdazed commented 7 years ago

Issue

The cost should decrease with training epochs; instead, it increases.

Affected

train_dbn.py
train_mlp.py

Reproduction

python train_mlp.py
python train_dbn.py

Runlog (open in OS X terminal with cat runlog.txt to enable full colour output)

flipdazed commented 7 years ago

Description of the cost calculation in the DBN, which calls the RBM functions in models.rbm.RBM to calculate the pre-training cost.

DBN (RBM) Cost Calculation

The cost is calculated by models.rbm.RBM.getReconstructionCost (called by models.rbm.RBM.getCostUpdates when the argument persistent=None), given as,

def getReconstructionCost(self, updates, pre_sigmoid_nv):
    # cross-entropy between the inputs and their reconstructions,
    # summed over the visible units (axis=1) and averaged over the minibatch
    cross_entropy = T.mean(
        T.sum(
            self.inputs * T.log(T.nnet.sigmoid(pre_sigmoid_nv)) +
            (1 - self.inputs) * T.log(1 - T.nnet.sigmoid(pre_sigmoid_nv)),
            axis=1
        )
    )
    return cross_entropy

with the cross_entropy described here and mathematically expressed as,

\begin{equation} L(\mathbf{x}, \mathbf{z}) = -\sum^d_{k=1}\left[ \mathbf{x}_k \ln{\mathbf{z}_k} + (1-\mathbf{x}_k)\ln{(1-\mathbf{z}_k)} \right] \end{equation}

where $\mathbf{x} \in \mathbb{Z}_2^d$ are the binary inputs to the RBM and $\mathbf{z}' \in \mathbb{R}^d$ are the pre-sigmoid activation nodes. The activation (post-sigmoid) nodes are $\mathbf{z} = s(\mathbf{z}')$ for some function $s$, here the sigmoid activation function given by T.nnet.sigmoid, so that $\mathbf{z} \in (0,1)^d$ can be seen as a prediction of $\mathbf{x}$. The pre-sigmoid activation nodes are defined by the product of the hidden nodes $\mathbf{y}$ with their respective weights $\mathbf{w}$ plus the bias $\mathbf{b}$, i.e. $\mathbf{z}' = \mathbf{w}\mathbf{y} + \mathbf{b}$, as described here; $\mathbf{z}'$ (and hence $\mathbf{z}$) is generated in the scan operation of models.rbm.RBM.getCostUpdates, which calls models.rbm.RBM.propDown.
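As a concrete illustration of this down-pass, here is a minimal NumPy sketch; the names w, b, y and the sizes are illustrative only and are not taken from models.rbm.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.RandomState(0)
d, n_hidden = 6, 4                               # illustrative sizes only
w = rng.normal(scale=0.1, size=(d, n_hidden))    # weights connecting hidden -> visible
b = np.zeros(d)                                  # visible bias
y = rng.binomial(1, 0.5, size=n_hidden)          # a sample of the hidden nodes

z_pre = w @ y + b    # pre-sigmoid activations  z' = wy + b
z = sigmoid(z_pre)   # post-sigmoid activations z = s(z'), each component in (0, 1)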

This means that, for a minibatch of size $n$, $L_i(\mathbf{x}, \mathbf{z})$ is a vector of length $n$, indexed by $i$. The cross-entropy cost of the minibatch is then the average over this vector,

\begin{equation} \text{minibatch cost} = \langle L_i(\mathbf{x}, \mathbf{z}) \rangle \end{equation}
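To make the minibatch average concrete, below is a minimal NumPy sketch of the same calculation; the array names x, z and the sizes are illustrative, not taken from the repository. Note that getReconstructionCost, as quoted above, omits the leading minus sign of $L(\mathbf{x}, \mathbf{z})$, so the value it returns is the negative of this minibatch cost, which is consistent with the negative costs mentioned below.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.RandomState(1)
# illustrative minibatch of n = 3 binary inputs over d = 4 visible units
x = np.array([[1., 0., 1., 1.],
              [0., 0., 1., 0.],
              [1., 1., 0., 1.]])
z = sigmoid(rng.normal(size=x.shape))    # stand-in reconstructions, each entry in (0, 1)

# per-example cross-entropy L_i: sum over the d visible units (axis=1)
L_i = -np.sum(x * np.log(z) + (1. - x) * np.log(1. - z), axis=1)   # length-n vector, entries >= 0

minibatch_cost = np.mean(L_i)    # average over the minibatch, as in the equation above
as_returned = -minibatch_cost    # what getReconstructionCost returns (no leading minus), <= 0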

flipdazed commented 7 years ago

Checked for sign errors - see #3. models.rbm.RBM.freeEnergy was also checked for errors.

flipdazed commented 7 years ago

The MNIST tests

test_dbn.py
test_mlp.py 

demonstrate negative costs decreasing as expected

flipdazed commented 7 years ago

This seems to be a phenomenon specific to models.rbm - moved the code analysis to #1

The RBM shows a cost that increases towards zero. The DBN cost increases towards zero during pre-training of the RBM and decreases during fine-tuning.