Currently, the NLL loss computation includes the loss at padding positions instead of excluding it. Those positions need to be ignored so that the loss reflects only real tokens and the model is optimized on meaningful targets.
For now, this doesn't matter since all the sequences are of the same length, meaning no padding is currently being used.
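Below is a minimal sketch of the masking in plain Python. It assumes each position has a list of log-probabilities over classes and that padding positions are marked by a target value `pad_idx`. The name `masked_nll_loss` is hypothetical and not taken from this codebase. Padding positions are skipped in both the sum and the count, so the mean is taken over real tokens only.

```python
import math
from typing import Sequence


def masked_nll_loss(
    log_probs: Sequence[Sequence[float]],
    targets: Sequence[int],
    pad_idx: int,
) -> float:
    """Mean negative log-likelihood over non-padding positions (hypothetical helper).

    log_probs: one vector of log-probabilities per position
    targets:   target class index per position (same length as log_probs)
    pad_idx:   target value that marks padding; these positions are ignored
    """
    total = 0.0
    count = 0
    for lp, t in zip(log_probs, targets):
        if t == pad_idx:
            continue  # padding adds nothing to the loss or to the denominator
        total += -lp[t]
        count += 1
    # An all-padding input has no real tokens; return 0.0 rather than divide by zero.
    return total / count if count else 0.0


if __name__ == "__main__":
    PAD = 2  # assumed padding index for this example
    log_probs = [
        [math.log(0.5), math.log(0.5), -1.0],
        [math.log(0.25), math.log(0.75), -1.0],
        [-5.0, -5.0, -5.0],  # padding position: its values must not matter
    ]
    targets = [1, 1, PAD]
    print(masked_nll_loss(log_probs, targets, PAD))
```

If the project uses PyTorch (an assumption, not stated here), `torch.nn.NLLLoss` and `torch.nn.functional.nll_loss` already provide this through their `ignore_index` argument. With the default `reduction='mean'`, the average is taken over non-ignored targets only.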