bwnjnOEI opened this issue 4 years ago (status: Open)
Hi there!
The loss computed on the training data and used to update the parameters (the `loss` variable in train_mnist.py) is reset at every batch. Depending on the gradient mode (full or stochastic), the parameters are updated either after each batch or only once a full epoch has been processed (in the full-gradient case I may have made a mistake and forgotten a scalar factor corresponding to averaging over the batches).
The other variable, `loss_train`, is an average over the batches, and it is not rigorous in the case of a stochastic parameter update (since the parameters evolve while that average is being accumulated).
I'm not sure what you are referring to as the final model; what do you mean by that?
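To make the distinction concrete, here is a minimal sketch (hypothetical names, not the actual train_mnist.py code) contrasting the per-batch `loss` with the epoch-level averaged `loss_train` under the two gradient modes, using a toy 1-D least-squares model:

```python
def batch_loss(w, xs, ys):
    """Mean squared error of the linear model y = w*x on one batch."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def batch_grad(w, xs, ys):
    """Gradient of the batch MSE with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def train_epoch(w, batches, lr=0.01, mode="stochastic"):
    """One epoch; returns updated w and the mean of the per-batch losses.

    In "stochastic" mode, w is updated after every batch, so the averaged
    per-batch losses are measured at *different* parameter values (the
    "not rigorous" loss_train). In "full" mode, w is updated once at the
    end of the epoch, so every batch loss is measured at the same w.
    """
    grads, losses = [], []
    for xs, ys in batches:
        losses.append(batch_loss(w, xs, ys))
        g = batch_grad(w, xs, ys)
        if mode == "stochastic":
            w -= lr * g          # parameters move between batches
        else:
            grads.append(g)      # accumulate, update once per epoch
    if mode == "full":
        w -= lr * sum(grads) / len(grads)  # mean of the batch gradients
    return w, sum(losses) / len(losses)
```

With equal-size batches, the full-gradient epoch average equals the loss over the whole training set at the pre-update parameters; in stochastic mode it generally does not, which is the sense in which `loss_train` is not rigorous.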
Looking at Figure 4 in "Reconciling modern machine learning practice and the bias-variance trade-off": to keep it simple, how is the training loss obtained in the number-of-parameters vs. training-loss figure? And how does that training loss differ from the training loss in the epochs vs. training-loss figure?
Hi, yes. In that figure, multiple networks are trained, and once trained, the loss is computed over all of the training samples (resp. testing samples). Each point on the curve corresponds to a fully trained network (evaluated at the last epoch of its training).
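The procedure above can be sketched as follows; all names here are illustrative placeholders, not taken from the repository. Each point on the parameters-vs-loss curve comes from training one model to completion and then evaluating the loss once over the entire training set with the final, frozen parameters:

```python
def final_training_loss(params, training_set, loss_fn):
    """Mean loss over the *whole* training set at fixed (final) parameters."""
    return sum(loss_fn(params, x, y) for x, y in training_set) / len(training_set)

def parameters_vs_loss_curve(model_sizes, train_fn, training_set, loss_fn):
    """One (n_params, training loss) point per independently trained model."""
    points = []
    for n in model_sizes:
        params = train_fn(n, training_set)  # parameters at the last epoch
        points.append((n, final_training_loss(params, training_set, loss_fn)))
    return points
```

This is also the contrast with the epochs vs. training-loss figure: there, a single model's loss is recorded as training progresses, whereas here each curve point is a separate, finished training run.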
Hi, I'm confused about the training loss of the final (i.e., fully trained) model.
In other words, how is the training loss defined for neural networks?