In train.py L187-L198, is the loss calculation wrong? `loss` is already the mean over a batch, but the final printed `loss_batch` is divided by `batch_count`. Shouldn't it instead be divided by the number of gradient updates (i.e., the number of times `optimizer.step()` is called)?
```python
loss = loss / args.batch_size
is_fst_loss = True
loss.backward()

if args.clip_grad is not None:
    torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip_grad)

optimizer.step()

# Metrics
loss_batch += loss.item()

print('TRAIN:', '\t Epoch:', epoch, '\t Loss:', loss_batch / batch_count)
```
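To illustrate what I mean, here is a minimal sketch of the averaging I would expect, assuming a hypothetical counter `grad_update_count` (not in the repository) that is incremented once per `optimizer.step()` call:

```python
# Hypothetical sketch: average the printed loss over the number of
# gradient updates rather than over every iteration in batch_count.
loss = loss / args.batch_size
is_fst_loss = True
loss.backward()

if args.clip_grad is not None:
    torch.nn.utils.clip_grad_norm_(model.parameters(), args.clip_grad)

optimizer.step()
grad_update_count += 1  # hypothetical counter: one gradient update performed

# Metrics
loss_batch += loss.item()

# Divide by the number of optimizer steps, since loss_batch only
# accumulates once per gradient update.
print('TRAIN:', '\t Epoch:', epoch, '\t Loss:', loss_batch / grad_update_count)
```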
And I have another question: during training the loss is negative. In that case, what does convergence look like?
Thank you for your reply.