The batch optimizers are created from scratch for each minibatch iteration. This results in the minibatch trainers being very slow compared to the batch ones.
A possible solution would be to reuse the optimizer:
GD - no problem, as the current iteration does not depend on the previous one
CGD - drop the previous state when the minibatch changes, or perhaps reinitialize it from the current one
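A minimal sketch of the idea, with hypothetical names (the `GD`, `CGD`, `reset`, and `train` identifiers below are illustrative assumptions, not the actual API of this library): a stateless optimizer like plain gradient descent can simply be reused, while a stateful one like CGD only needs its carried-over state cleared between minibatches instead of being reconstructed from scratch.

```python
import numpy as np


class GD:
    """Hypothetical plain gradient-descent optimizer.

    Stateless: each update uses only the current gradient, so the same
    instance can be reused across minibatches with no extra work."""

    def __init__(self, lr=0.01):
        self.lr = lr

    def step(self, params, grads):
        return params - self.lr * grads


class CGD:
    """Hypothetical conjugate-gradient-descent optimizer.

    Carries state (previous gradient and search direction) between
    iterations, so that state must be dropped when the minibatch changes."""

    def __init__(self, lr=0.01):
        self.lr = lr
        self.prev_grad = None
        self.prev_dir = None

    def reset(self):
        # Drop the previous state so the next minibatch starts fresh,
        # as if the optimizer had just been constructed.
        self.prev_grad = None
        self.prev_dir = None

    def step(self, params, grads):
        if self.prev_dir is None:
            direction = -grads
        else:
            # Fletcher-Reeves coefficient (illustrative, not necessarily
            # the exact formula used by the library).
            beta = (grads @ grads) / max(self.prev_grad @ self.prev_grad, 1e-12)
            direction = -grads + beta * self.prev_dir
        self.prev_grad, self.prev_dir = grads, direction
        return params + self.lr * direction


def train(minibatches, params, optimizer, steps_per_batch=10):
    # Reuse one optimizer instance for every minibatch instead of
    # constructing a new one per minibatch; reset its state if it has any.
    for grad_fn in minibatches:
        if hasattr(optimizer, "reset"):
            optimizer.reset()
        for _ in range(steps_per_batch):
            params = optimizer.step(params, grad_fn(params))
    return params
```

The point of the sketch is only that the per-minibatch cost drops to a state reset (or nothing at all for GD), rather than a full construction of the optimizer.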