Is averaging of the training loss in seq2seq correct?

awni / speech

A PyTorch Implementation of End-to-End Models for Speech-to-Text

Apache License 2.0

751 stars 176 forks source link

I don't understand the way the training loss is averaged.

The losses are summed for each minibatch, because of the argument _sizeaverage=False in _crossentropy function. Then, there is a line _loss_val = loss_val / batchsize that could average over all the batches, except that in one batch, there are many letters to decode, so the loss is calculated over more than _batchsize letters. The correct number would be y.shape[0] (all the predictions from all the batches are concatenated to one-dimensional vector). According to that, the line n. 66 in seq2seq.py should be

loss_val = loss_val / y.shape[0]

Am I right, or I'm missing something?

awni / speech

Is averaging of the training loss in seq2seq correct? #18