SydCaption / SAAT

MIT License

Why use the ground truth to calculate the test loss? #9

Closed RyanLiut closed 4 years ago

RyanLiut commented 4 years ago

Hi, when the test loss is calculated, 'pred' is obtained like this: pred, gt_seq, gt_logseq, _, _, _ = model(feats, bfeats, labels, labels_svo). But in the forward function of the model, the code uses: lan_cont = self.embed(torch.cat((svo_it[:,1:2], it.unsqueeze(1)), 1)). So why is the ground truth, i.e. 'it' (which comes from the label), used here to get the test loss? Why not use the previously predicted word to produce the whole predicted result and then compute the loss? After all, in the test phase we should not see the label until it is time to calculate the loss.

Thank you very much!

SydCaption commented 4 years ago
  1. Please distinguish between testing and inference. If you want to calculate a loss, then no matter whether during train/val/test, you need labels.
  2. For the val/test loss calculation, you can use either the ground-truth 'it' (in the forward function, as you point out) or the predicted 'it' (in the sample functions, as follows), since the loss here is just a metric and does not contribute to model training (no backward pass happens).
  3. Note that during testing, the code computes the loss when a label is provided and skips it when none is. https://github.com/SydCaption/SAAT/blob/6cf41781d5860455c17433486f1a84a26b6883de/train_svo.py#L343-L356
  4. What actually participates in the automatic-metric evaluation is the decoded 'sent', which is generated in 'sample' mode by calling either the sample function or the sample_beam function https://github.com/SydCaption/SAAT/blob/6cf41781d5860455c17433486f1a84a26b6883de/model_svo.py#L464 https://github.com/SydCaption/SAAT/blob/6cf41781d5860455c17433486f1a84a26b6883de/model_svo.py#L545 In neither function is a ground-truth label used for inference.
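To make the distinction in points 1-2 concrete, here is a toy sketch (not the SAAT code; 'step', 'decode', and the token values are hypothetical) of the only thing that differs between the forward-function loss and a sample-mode loss: which token is fed back into the decoder at each step.

```python
def decode(step_fn, labels, teacher_forcing):
    """Run a toy decoder for len(labels) steps.
    step_fn(prev_token) -> predicted next token.
    teacher_forcing=True feeds the ground-truth labels[t] back in
    (what the forward function does); False feeds the model's own
    prediction back in (what the sample functions do).
    """
    preds = []
    prev = 0  # <BOS> token
    for t in range(len(labels)):
        pred = step_fn(prev)
        preds.append(pred)
        prev = labels[t] if teacher_forcing else pred
    return preds

# A deliberately biased "model": always predicts prev + 1.
step = lambda prev: prev + 1
labels = [5, 6, 7]

tf_preds   = decode(step, labels, teacher_forcing=True)   # sees labels
free_preds = decode(step, labels, teacher_forcing=False)  # inference-style

print(tf_preds)    # [1, 6, 7] - from step 1 on, conditioned on the labels
print(free_preds)  # [1, 2, 3] - errors compound without the labels
```

Either sequence of predictions can be scored against the labels; only 'free_preds' matches what the model would actually produce at inference time.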

Hope this explanation helps~

RyanLiut commented 4 years ago

Thanks for the reply. I do know we need the label during testing. But I think we should use the label only in the final step, i.e. CrossEntropy(pred, label). Before that, the pred should not depend on the label; as in image classification, we just forward the network on the test data without labels. Am I wrong about that?

SydCaption commented 4 years ago

Exactly, and good suggestion. That might be a better thing to observe during val/test, i.e. using the same setting as in actual inference. But unlike image classification, even though the model is trained with a cross-entropy loss, the final captioning performance is compared using automatic metrics, e.g. BLEU, etc. So I think the loss calculated during val/test serves more as a signal of whether the model is overfitting than as a value for comparison. In that sense, it does no harm while keeping the setting the same as in training.

RyanLiut commented 4 years ago

Yes, I see. But is it okay to use the label to compute the pred at test time, even if only for the sake of monitoring overfitting? As far as I know, we should pretend not to see the labels when producing the pred, just as in real inference (to be consistent with the 'real' setting).

SydCaption commented 4 years ago

You're right, and thanks for pointing this out! Please switch to 'sample' mode when calculating the loss.
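The suggested fix can be sketched as follows. This is a minimal toy illustration, not the repo's code: run the decoder free-running (feeding back its own argmax, as the sample functions do), keep the per-step distributions, and only then score them against the ground truth with cross-entropy. All names here ('step_fn', 'sample_mode_loss') are hypothetical.

```python
import math

def step_fn(prev):
    """Toy decoder step: a probability distribution over 3 tokens
    that depends only on the previously fed token."""
    probs = [0.1, 0.1, 0.1]
    probs[(prev + 1) % 3] = 0.8
    return probs

def sample_mode_loss(step_fn, labels):
    """Free-running decoding; labels are used ONLY to score the
    distributions (cross-entropy), never as decoder input."""
    prev, total = 0, 0.0
    for gold in labels:
        probs = step_fn(prev)
        total += -math.log(probs[gold])  # CE term against the label
        prev = max(range(len(probs)), key=probs.__getitem__)  # feed own argmax
    return total / len(labels)

loss = sample_mode_loss(step_fn, [1, 2, 0])
print(round(loss, 4))  # -log(0.8) ~= 0.2231
```

The ground truth enters only inside the cross-entropy term, which matches the setting the thread converges on: inference-style decoding, loss computed afterwards.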

RyanLiut commented 4 years ago

Thanks for the reply.