brechetp / double-descent

Double descent effect reproduction

Problem about training loss #1

Open bwnjnOEI opened 4 years ago

bwnjnOEI commented 4 years ago

Hi, I'm confused about how the training loss of the final (trained) model is computed:

  1. is it averaged over the training data using the final model, or
  2. averaged over the batches during training (i.e. over the epoch)?

In other words, how is the training loss defined for neural nets?

brechetp commented 4 years ago

Hi there!

The loss computed on the training data and used to update the parameters (the loss variable in train_mnist.py) is reset at every batch. Depending on the gradient mode (full or stochastic), the parameters are updated either after each batch or only once a full epoch has been processed (in the latter case I may have made a mistake and forgotten a scalar scaling corresponding to averaging over the batches).

The other variable, loss_train, is a mean over the batches of an epoch, and is not rigorous in the case of stochastic updates of the parameters (since the parameters evolve while that mean is being accumulated).
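
Roughly, the difference looks like this (a minimal sketch in the stochastic mode, not the actual train_mnist.py code):

```python
# Minimal sketch (not the actual train_mnist.py code): the per-batch `loss`
# used for the update vs. the epoch-average `loss_train`, in stochastic mode.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)                      # stand-in for the real network
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# dummy data split into batches
batches = [(torch.randn(32, 10), torch.randint(0, 2, (32,))) for _ in range(5)]

loss_train = 0.0
for x, y in batches:
    optimizer.zero_grad()
    loss = criterion(model(x), y)   # reset at every batch
    loss.backward()
    optimizer.step()                # stochastic mode: parameters change per batch
    loss_train += loss.item()       # accumulated while the parameters evolve
loss_train /= len(batches)          # mean over batches: each term was computed
                                    # with different parameters, hence "not rigorous"
print(loss_train)
```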

I'm not sure what you refer to as the final model; what do you mean by that?

bwnjnOEI commented 4 years ago

Looking at Figure 4 in "Reconciling modern machine learning practice and the bias-variance trade-off", let me put it simply: how is the training loss obtained in the plot of number of parameters vs. training loss? And how does that training loss differ from the training loss in the plot of epoch vs. training loss?

brechetp commented 4 years ago

Hi, yes. In that figure multiple networks are trained, and once a network is trained its loss is computed over all of the training samples (resp. testing samples). Each point on the curve corresponds to one trained network, evaluated at the last epoch of its training.
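
A minimal sketch of what I mean, assuming a hypothetical train(width) helper that returns a trained network (these names are illustrative, not the repository's actual API):

```python
# Minimal sketch: the loss of a *fixed* trained model over a whole dataset.
# `train(width)` and the loaders below are hypothetical names, not the repo's API.
import torch

@torch.no_grad()
def dataset_loss(model, loader, criterion):
    """Average loss of a trained (frozen) model over every sample in `loader`."""
    model.eval()
    total, n = 0.0, 0
    for x, y in loader:
        total += criterion(model(x), y).item() * len(y)
        n += len(y)
    return total / n

# One point of the parameters-vs-loss curve per trained network, e.g.:
# points = [(width, dataset_loss(train(width), train_loader, criterion))
#           for width in widths]
```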