Wanggcong / Spatial-Temporal-Re-identification

[AAAI 2019] Spatial Temporal Re-identification

Why is validation loss lower than the training loss? #4

Closed TheBobbyliu closed 3 years ago

TheBobbyliu commented 5 years ago

I haven't dug deep into the code, but I find that validation accuracy is generally higher than training accuracy, like this:

Epoch 23/59

train Loss: 0.3353 Acc: 0.8267 val Loss: 0.2442 Acc: 0.8602

Epoch 24/59

train Loss: 0.3229 Acc: 0.8391 val Loss: 0.2326 Acc: 0.8762

Epoch 25/59

train Loss: 0.3149 Acc: 0.8486 val Loss: 0.2430 Acc: 0.8495

Epoch 26/59

train Loss: 0.3033 Acc: 0.8621 val Loss: 0.1564 Acc: 0.9294

Epoch 27/59

train Loss: 0.2955 Acc: 0.8710 val Loss: 0.1795 Acc: 0.9321

Epoch 28/59

train Loss: 0.2854 Acc: 0.8745 val Loss: 0.1871 Acc: 0.9161

Epoch 29/59

train Loss: 0.2819 Acc: 0.8761 val Loss: 0.1720 Acc: 0.9201

Is it because the process of validation takes spatial-temporal distance into account while training doesn't?

Wanggcong commented 5 years ago

Neither training nor validation takes the spatial-temporal distance into account at this step. Actually, this step only learns the appearance feature representations.

The higher validation accuracy may be attributed to the fact that the validation set is small (each id contains only one image), so the numbers can be unreliable. In addition, when "--train_all" is set, both the training set and the validation set are used for training (a rough sketch of this behaviour is shown below). This setting follows the baseline repo we build on.
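For reference, here is a rough sketch of what "--train_all" typically does in a layumi-style baseline. The directory names and dataset layout are assumptions for illustration, not taken verbatim from this repo: the training ImageFolder is simply pointed at a split that already merges the train and val images, so the val identities are seen during training.

    import os
    from torchvision import datasets, transforms

    data_dir = 'Market-1501/pytorch'   # assumed dataset layout
    train_all = True                   # corresponds to passing --train_all

    transform = transforms.Compose([
        transforms.Resize((256, 128)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])

    # With --train_all, training reads the merged 'train_all' folder
    # (train + val), so the validation images are not held out at all.
    train_dir = 'train_all' if train_all else 'train'
    train_set = datasets.ImageFolder(os.path.join(data_dir, train_dir), transform)
    val_set = datasets.ImageFolder(os.path.join(data_dir, 'val'), transform)

In that setup the reported val accuracy is measured on images the model has already seen during training, which by itself can push it above the training accuracy.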

zhaoqun05 commented 5 years ago

Epoch 37/59

train Loss: 0.1597 Acc: 0.9716 val Loss: 0.0578 Acc: 0.9957

Epoch 38/59

train Loss: 0.1611 Acc: 0.9719 val Loss: 0.0517 Acc: 0.9929

Epoch 39/59

train Loss: 0.1574 Acc: 0.9762 val Loss: 0.0476 Acc: 0.9943

Epoch 40/59

train Loss: 0.0867 Acc: 0.9927 val Loss: 0.0084 Acc: 1.0000

Epoch 41/59

train Loss: 0.0622 Acc: 0.9967 val Loss: 0.0062 Acc: 1.0000

Epoch 42/59

train Loss: 0.0527 Acc: 0.9975 val Loss: 0.0048 Acc: 1.0000

Epoch 43/59

train Loss: 0.0486 Acc: 0.9976 val Loss: 0.0044 Acc: 1.0000

Why is the training accuracy lower than the validation accuracy, and why does the validation accuracy reach 100%?

Wanggcong commented 5 years ago

See above. We follow https://github.com/layumi/Person_reID_baseline_pytorch.

It is observed that in that code, the val phase sets the model to evaluation mode and clears the gradients with optimizer.zero_grad().

I still doubt whether the parameters are updated during validation. Maybe torch.no_grad() should be used instead; a quick way to check is sketched below. I also want to ask the original authors about this.

Anyway, the val set is just used as a hyper-parameter tuning aid. At test time, we confirm that the test set and the training set are non-overlapping.
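A hedged way to settle the doubt about parameter updates (this helper is illustrative and not part of either repo): snapshot the model's state_dict before the val phase and compare it afterwards. With only evaluation mode and optimizer.zero_grad() in the val branch, and no optimizer.step(), the weights should come back identical; torch.no_grad() mainly adds memory savings by not building the autograd graph.

    import copy
    import torch

    def weights_changed(model, run_val_phase):
        """Return True if any parameter or buffer differs after run_val_phase()."""
        before = copy.deepcopy(model.state_dict())   # includes BatchNorm buffers
        run_val_phase()                              # the loop under suspicion
        after = model.state_dict()
        return any(not torch.equal(before[k], after[k]) for k in before)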

Wanggcong commented 5 years ago

I have found that the original authors have updated their code; the relevant part now reads as follows.

    # forward
    if phase == 'val':
        with torch.no_grad():
            outputs = model(inputs)
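For context, in a standard PyTorch train/val loop of this kind the weights only change where loss.backward() and optimizer.step() are called, and the val phase never reaches them; torch.no_grad() additionally prevents a graph from being built during the val forward pass. The sketch below illustrates the pattern; the function name run_epoch and the dataloaders dict are assumptions for illustration, not the exact code of either repo.

    import torch

    def run_epoch(model, dataloaders, criterion, optimizer):
        """Sketch of a typical train/val epoch (illustrative, not the repo's code)."""
        for phase in ['train', 'val']:
            model.train(phase == 'train')   # evaluation mode for the val phase
            running_corrects, total = 0, 0

            for inputs, labels in dataloaders[phase]:
                optimizer.zero_grad()

                # forward
                if phase == 'val':
                    with torch.no_grad():   # no graph is built, so no gradients exist
                        outputs = model(inputs)
                else:
                    outputs = model(inputs)

                loss = criterion(outputs, labels)
                _, preds = torch.max(outputs, 1)

                # weights are only updated here; the val phase never calls step()
                if phase == 'train':
                    loss.backward()
                    optimizer.step()

                running_corrects += torch.sum(preds == labels).item()
                total += labels.size(0)

            print(f'{phase} Acc: {running_corrects / total:.4f}')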

I have not verified whether the accuracy on the val set drops when torch.no_grad() is used.

Just keep in mind that the val set only serves as a hyper-parameter tuning aid; the test set and the training set are non-overlapping.

zhaoqun05 commented 5 years ago

Thanks for your answer!