Open zhongshsh opened 1 year ago
Which teacher did you use? How about the performance of the ID teacher and the OOD teacher on the VQA v2 dataset?
I trained the teacher by following the README. Specifically, I used this command:
CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 5 --output v2_lmh_css --seed 2048
Then I assigned the new answers:
CUDA_VISIBLE_DEVICES=0 python assign_answer.py --dataset v2 --name other --split low --teacher_path logs/v2_lmh_css/model.pth
For VQA v2, I may have trained the teacher with the following command:
CUDA_VISIBLE_DEVICES=0 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 9 --output v2_lmh_css --seed 2048 --epoch 40
You can try again.
I trained the teacher with:
CUDA_VISIBLE_DEVICES=4 python main.py --dataset v2 --mode q_v_debias --debias learned_mixin --topq 1 --topv -1 --qvp 9 --output v2_lmh_css_issue --seed 2048 --epoch 40
The log is below; the scores are higher than before.
epoch 0, time: 240.35
train_loss: 6.22, score: 17.05
eval score: 38.57 (91.72)
yn score: 51.48 other score: 35.36 num score: 13.66
epoch 1, time: 194.00
train_loss: 3.69, score: 34.63
eval score: 40.82 (91.72)
yn score: 40.48 other score: 45.30 num score: 24.97
epoch 2, time: 212.26
train_loss: 3.35, score: 35.37
eval score: 35.37 (91.72)
yn score: 23.41 other score: 48.40 num score: 20.69
epoch 3, time: 206.79
train_loss: 3.19, score: 39.23
eval score: 37.90 (91.72)
yn score: 24.16 other score: 50.52 num score: 29.86
epoch 4, time: 206.55
train_loss: 3.06, score: 41.93
eval score: 42.73 (91.72)
yn score: 35.48 other score: 51.36 num score: 31.08
epoch 5, time: 204.88
train_loss: 2.95, score: 44.03
eval score: 44.77 (91.72)
yn score: 38.17 other score: 52.71 num score: 33.81
epoch 6, time: 207.57
train_loss: 2.86, score: 46.77
eval score: 44.46 (91.72)
yn score: 36.56 other score: 53.07 num score: 34.73
epoch 7, time: 206.11
train_loss: 2.78, score: 49.78
eval score: 48.31 (91.72)
yn score: 46.27 other score: 53.78 num score: 33.59
epoch 8, time: 204.72
train_loss: 2.73, score: 51.48
eval score: 48.91 (91.72)
yn score: 46.82 other score: 53.99 num score: 35.78
epoch 9, time: 209.48
train_loss: 2.67, score: 52.84
eval score: 48.97 (91.72)
yn score: 46.76 other score: 54.21 num score: 35.65
epoch 10, time: 206.95
train_loss: 2.62, score: 54.73
eval score: 47.96 (91.72)
yn score: 44.44 other score: 54.43 num score: 33.68
epoch 11, time: 206.52
train_loss: 2.56, score: 56.71
eval score: 48.97 (91.72)
yn score: 46.31 other score: 54.58 num score: 35.48
epoch 12, time: 206.86
train_loss: 2.51, score: 58.15
eval score: 52.85 (91.72)
yn score: 55.86 other score: 54.87 num score: 36.63
epoch 13, time: 202.63
train_loss: 2.47, score: 60.24
eval score: 52.95 (91.72)
yn score: 55.77 other score: 54.85 num score: 37.70
epoch 14, time: 204.77
train_loss: 2.42, score: 61.68
eval score: 53.30 (91.72)
yn score: 56.29 other score: 54.90 num score: 38.71
epoch 15, time: 207.28
train_loss: 2.38, score: 63.30
eval score: 54.19 (91.72)
yn score: 58.96 other score: 55.12 num score: 37.05
epoch 16, time: 203.41
train_loss: 2.34, score: 64.82
eval score: 54.36 (91.72)
yn score: 59.30 other score: 55.02 num score: 37.79
epoch 17, time: 204.64
train_loss: 2.31, score: 65.78
eval score: 56.17 (91.72)
yn score: 63.46 other score: 55.09 num score: 39.36
epoch 18, time: 209.91
train_loss: 2.29, score: 66.96
eval score: 56.15 (91.72)
yn score: 64.00 other score: 55.18 num score: 37.30
epoch 19, time: 206.39
train_loss: 2.24, score: 68.34
eval score: 56.17 (91.72)
yn score: 63.59 other score: 55.01 num score: 39.25
epoch 20, time: 208.02
train_loss: 2.23, score: 69.14
eval score: 55.12 (91.72)
yn score: 60.96 other score: 55.09 num score: 38.53
epoch 21, time: 209.03
train_loss: 2.20, score: 70.46
eval score: 54.36 (91.72)
yn score: 59.11 other score: 54.77 num score: 39.19
epoch 22, time: 206.20
train_loss: 2.18, score: 70.92
eval score: 56.75 (91.72)
yn score: 64.47 other score: 55.18 num score: 40.55
epoch 23, time: 209.92
train_loss: 2.15, score: 72.00
eval score: 56.62 (91.72)
yn score: 64.29 other score: 55.25 num score: 39.77
epoch 24, time: 208.95
train_loss: 2.12, score: 72.64
eval score: 57.59 (91.72)
yn score: 66.79 other score: 55.21 num score: 40.19
epoch 25, time: 208.71
train_loss: 2.12, score: 73.41
eval score: 57.88 (91.72)
yn score: 67.65 other score: 55.35 num score: 39.39
epoch 26, time: 207.85
train_loss: 2.08, score: 74.34
eval score: 57.71 (91.72)
yn score: 67.62 other score: 55.21 num score: 38.71
epoch 27, time: 209.99
train_loss: 2.07, score: 74.69
eval score: 58.24 (91.72)
yn score: 69.08 other score: 55.11 num score: 38.91
epoch 28, time: 211.65
train_loss: 2.06, score: 75.41
eval score: 58.48 (91.72)
yn score: 69.65 other score: 55.14 num score: 39.01
epoch 29, time: 208.22
train_loss: 2.04, score: 75.85
eval score: 58.56 (91.72)
yn score: 69.85 other score: 55.20 num score: 38.88
epoch 30, time: 206.02
train_loss: 2.01, score: 76.51
eval score: 58.84 (91.72)
yn score: 70.59 other score: 55.00 num score: 39.68
epoch 31, time: 209.33
train_loss: 2.02, score: 76.54
eval score: 58.52 (91.72)
yn score: 69.40 other score: 55.03 num score: 40.53
epoch 32, time: 207.57
train_loss: 1.99, score: 77.05
eval score: 57.85 (91.72)
yn score: 67.83 other score: 55.04 num score: 39.81
epoch 33, time: 211.12
train_loss: 1.98, score: 77.48
eval score: 58.95 (91.72)
yn score: 70.91 other score: 54.99 num score: 39.56
epoch 34, time: 210.15
train_loss: 1.97, score: 77.84
eval score: 58.10 (91.72)
yn score: 68.10 other score: 55.09 num score: 40.79
epoch 35, time: 210.03
train_loss: 1.94, score: 78.30
eval score: 58.96 (91.72)
yn score: 70.89 other score: 54.92 num score: 39.94
epoch 36, time: 209.06
train_loss: 1.91, score: 78.71
eval score: 59.26 (91.72)
yn score: 71.95 other score: 54.96 num score: 39.11
epoch 37, time: 207.74
train_loss: 1.92, score: 78.87
eval score: 57.91 (91.72)
yn score: 67.98 other score: 54.95 num score: 40.17
epoch 38, time: 210.99
train_loss: 1.89, score: 79.22
eval score: 58.88 (91.72)
yn score: 70.60 other score: 54.91 num score: 40.25
epoch 39, time: 210.63
train_loss: 1.87, score: 79.44
eval score: 58.61 (91.72)
yn score: 70.33 other score: 54.78 num score: 39.47
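Since the eval score fluctuates from epoch to epoch, it can help to pull the best epoch out of the log rather than reading only the final one. The following is a minimal sketch that assumes the log format shown above; the function names `parse_log` and `best_epoch` are hypothetical and not part of the repository, and the sample text is an excerpt of three epochs copied from the log.

```python
import re

# Patterns match the log lines shown in this thread (assumed format).
EPOCH_RE = re.compile(r"epoch (\d+), time: ([\d.]+)")
EVAL_RE = re.compile(r"eval score: ([\d.]+)")


def parse_log(text):
    """Return a list of (epoch, eval_score) pairs found in the log text."""
    results = []
    epoch = None
    for line in text.splitlines():
        match = EPOCH_RE.search(line)
        if match:
            epoch = int(match.group(1))
            continue
        match = EVAL_RE.search(line)
        if match and epoch is not None:
            results.append((epoch, float(match.group(1))))
    return results


def best_epoch(results):
    """Return the (epoch, eval_score) pair with the highest eval score."""
    return max(results, key=lambda pair: pair[1])


# Excerpt of epochs 35-37 from the teacher training log above.
sample = """epoch 35, time: 210.03
train_loss: 1.94, score: 78.30
eval score: 58.96 (91.72)
yn score: 70.89 other score: 54.92 num score: 39.94
epoch 36, time: 209.06
train_loss: 1.91, score: 78.71
eval score: 59.26 (91.72)
yn score: 71.95 other score: 54.96 num score: 39.11
epoch 37, time: 207.74
train_loss: 1.92, score: 78.87
eval score: 57.91 (91.72)
yn score: 67.98 other score: 54.95 num score: 40.17
"""

print(best_epoch(parse_log(sample)))
```

On this excerpt the best eval score is 59.26 at epoch 36, while the final epoch of the full run (epoch 39) ends at 58.61.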
But when I update the augmented data with the new answers and fine-tune the backbone, the results still get worse.
CUDA_VISIBLE_DEVICES=0 python aug_main.py --backbone logs/v2_updn/model.pth --aug_name all --dataset v2 --output v2_all_finetune_issue --seed 0
epoch 0, time: 532.43
train_loss: 13.46, score: 83.21
eval score: 62.51 (91.72)
yn score: 79.48 other score: 55.25 num score: 41.24
epoch 1, time: 514.16
train_loss: 12.96, score: 84.34
eval score: 62.44 (91.72)
yn score: 79.40 other score: 55.19 num score: 41.11
epoch 2, time: 539.51
train_loss: 12.80, score: 84.78
eval score: 62.35 (91.72)
yn score: 79.24 other score: 55.16 num score: 41.02
epoch 3, time: 544.58
train_loss: 12.70, score: 85.05
eval score: 62.33 (91.72)
yn score: 79.41 other score: 55.03 num score: 40.89
epoch 4, time: 525.27
train_loss: 12.63, score: 85.26
eval score: 62.21 (91.72)
yn score: 79.13 other score: 54.97 num score: 41.00
epoch 5, time: 524.51
train_loss: 12.58, score: 85.42
eval score: 62.12 (91.72)
yn score: 78.88 other score: 54.97 num score: 40.96
epoch 6, time: 522.15
train_loss: 12.53, score: 85.56
eval score: 62.08 (91.72)
yn score: 78.94 other score: 54.95 num score: 40.60
epoch 7, time: 527.31
train_loss: 12.50, score: 85.67
eval score: 62.03 (91.72)
yn score: 78.78 other score: 54.86 num score: 41.00
epoch 8, time: 517.79
train_loss: 12.46, score: 85.76
eval score: 61.94 (91.72)
yn score: 78.66 other score: 54.82 num score: 40.83
epoch 9, time: 515.08
train_loss: 12.44, score: 85.86
eval score: 61.95 (91.72)
yn score: 78.67 other score: 54.76 num score: 41.09
CUDA_VISIBLE_DEVICES=1 python aug_main.py --backbone logs/v2_updn/model.pth --aug_name total --dataset v2 --output v2_total_finetune_issue --seed 0
epoch 0, time: 91.96
train_loss: 11.77, score: 85.08
eval score: 62.37 (91.72)
yn score: 80.15 other score: 54.47 num score: 41.12
epoch 1, time: 101.57
train_loss: 11.03, score: 86.24
eval score: 62.26 (91.72)
yn score: 79.92 other score: 54.40 num score: 41.19
epoch 2, time: 95.16
train_loss: 10.84, score: 86.65
eval score: 62.11 (91.72)
yn score: 79.81 other score: 54.33 num score: 40.66
epoch 3, time: 94.00
train_loss: 10.73, score: 86.94
eval score: 62.04 (91.72)
yn score: 79.75 other score: 54.23 num score: 40.67
epoch 4, time: 99.22
train_loss: 10.64, score: 87.17
eval score: 61.98 (91.72)
yn score: 79.72 other score: 54.16 num score: 40.63
epoch 5, time: 98.48
train_loss: 10.57, score: 87.32
eval score: 61.97 (91.72)
yn score: 79.77 other score: 54.11 num score: 40.54
epoch 6, time: 93.91
train_loss: 10.52, score: 87.49
eval score: 61.80 (91.72)
yn score: 79.57 other score: 53.94 num score: 40.47
epoch 7, time: 95.05
train_loss: 10.48, score: 87.61
eval score: 61.81 (91.72)
yn score: 79.74 other score: 53.86 num score: 40.37
epoch 8, time: 99.80
train_loss: 10.44, score: 87.73
eval score: 61.65 (91.72)
yn score: 79.43 other score: 53.83 num score: 40.14
epoch 9, time: 97.78
train_loss: 10.41, score: 87.82
eval score: 61.63 (91.72)
yn score: 79.41 other score: 53.73 num score: 40.42
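To quantify how much both fine-tuning runs degrade, the first-epoch and last-epoch scores can be subtracted per category. This is only a sketch: the numbers are copied from epochs 0 and 9 of the two logs above, while the dictionary layout and the `deltas` helper are illustrative assumptions, not code from the repository.

```python
# Epoch 0 ("first") and epoch 9 ("last") scores copied from the two logs above.
runs = {
    "all": {
        "first": {"eval": 62.51, "yn": 79.48, "other": 55.25, "num": 41.24},
        "last": {"eval": 61.95, "yn": 78.67, "other": 54.76, "num": 41.09},
    },
    "total": {
        "first": {"eval": 62.37, "yn": 80.15, "other": 54.47, "num": 41.12},
        "last": {"eval": 61.63, "yn": 79.41, "other": 53.73, "num": 40.42},
    },
}


def deltas(run):
    """Return last-minus-first score per category, rounded to two decimals."""
    return {key: round(run["last"][key] - run["first"][key], 2)
            for key in run["first"]}


for name, run in runs.items():
    print(name, deltas(run))
```

Every delta is negative in both runs, so the drop is not limited to one question type.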
Baseline (updn, v2)
I run the following code:
and get the following log:
Finetune
However, when I run the fine-tuning code, the score is worse.
all_aug_dataset
clip-based filtering aug_dataset