Because the forward pass for batch b runs simultaneously with the backward pass for batch b - 1, we should run an extra empty forward pass after the last batch so that every backward pass gets calculated. For large classification tasks this probably doesn't matter, but for e.g. the original implementation of the pattern recognition regression task, which has 1 batch per epoch, the missing backward pass prevents learning.
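
To make the off-by-one concrete, here is a minimal sketch of the schedule described above. It is not the actual implementation: `run_epoch`, `flush`, and the use of `None` as the "empty" forward pass are hypothetical names chosen for illustration, and the sketch assumes the pending batch is not carried over between epochs.

```python
def run_epoch(batches, flush=True):
    """Simulate a schedule where step b runs forward(b) together with backward(b - 1).

    Returns the batches whose backward pass was actually calculated.
    """
    backward_done = []
    pending = None  # batch whose forward pass has run but whose backward pass has not

    # Hypothetical convention: None stands in for the extra empty forward pass.
    schedule = list(batches) + ([None] if flush else [])

    for batch in schedule:
        if pending is not None:
            # Backward pass for the previous batch, run alongside this forward pass.
            backward_done.append(pending)
        # Forward pass for this batch (nothing to do when batch is None).
        pending = batch

    return backward_done


if __name__ == "__main__":
    # One batch per epoch: without the flush step no backward pass ever runs.
    print(run_epoch(["batch0"], flush=False))
    print(run_epoch(["batch0"], flush=True))
```

With many batches per epoch, skipping the flush only drops the last batch's backward pass. With a single batch per epoch it drops every backward pass, which matches the failure described for the regression task.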