Closed RyanLiut closed 4 years ago
Hope this explanation helps~
Thanks for the reply. I do know we need the labels during testing. But I think we only use the labels for the final calculation, i.e. CrossEntropy(pred, label). Before that, `pred` shouldn't depend on the labels; as in image classification, we just forward the network on the test data without labels. Am I wrong about that?
Exactly, and good suggestion. It might be better to evaluate during val/test under the same setting as actual inference. But unlike image classification, even though the model is trained with a cross-entropy loss, the final captioning performance is compared using automatic metrics, e.g. BLEU. So the loss computed during val/test serves more as a signal for detecting overfitting than as a value for comparison. In that sense it does no harm, while keeping the setting the same as training.
Yes, I see. But is it okay to use the labels to compute `pred` in the test stage, even just for checking overfitting? As far as I know, we pretend not to see the labels, just as in real inference, when producing `pred` (to be consistent with the 'real' inference setting).
You're right, and thanks for pointing this out! Please change to 'sample' mode when calculating the loss.
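To make the suggestion concrete, here is a minimal, hypothetical sketch (not the repository's code) of what 'sample'-mode evaluation means: the decoder feeds its own previous prediction at each step, and the labels are touched only at the very end, in the loss. `step_probs` is a stand-in for one decoder step over a toy 4-word vocabulary.

```python
import math

# Hypothetical stand-in for one decoder step: given the previous token,
# return a probability distribution over a 4-word vocabulary.
def step_probs(prev_token):
    probs = [0.1, 0.1, 0.1, 0.1]
    probs[(prev_token + 1) % 4] = 0.7  # this toy model prefers prev + 1
    return probs

def sample_mode_loss(labels):
    """Decode greedily WITHOUT looking at the labels, then use the
    labels only to score the predictions (cross-entropy)."""
    prev = labels[0]  # <BOS>-like start token, also known at inference time
    nll = 0.0
    for gt in labels[1:]:
        probs = step_probs(prev)
        nll += -math.log(probs[gt])                   # labels used ONLY here
        prev = max(range(4), key=lambda w: probs[w])  # feed own prediction
    return nll / (len(labels) - 1)
```

Here the average negative log-likelihood reflects how well the freely generated sequence matches the reference, which matches the real inference setting.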
Thanks for the reply.
Hi, when you calculate the test loss, you use `pred`, like this:
`pred, gt_seq, gt_logseq, _, _, _ = model(feats, bfeats, labels, labels_svo)`
But in the model's forward function, it uses:
`lan_cont = self.embed(torch.cat((svo_it[:,1:2], it.unsqueeze(1)), 1))`
So why is the ground truth, i.e. `it` (which comes from the labels), used here to get the test loss? Why not feed the previously predicted word, generate the whole predicted sequence, and then compute the loss? After all, in the test phase we shouldn't see the labels until it's time to calculate the loss. Thank you very much!