ParadoxZW opened this issue 5 years ago
Sure. Here is the log for the first 10 epochs. It was run with exactly this GitHub version but with slightly different hyperparameters (i.e., 20 epochs + BERT initialization + repeating VQA & MSCOCO to balance the amount of data). As a result, the mask_lm loss in this log might be lower in the first few epochs, but the tendency should be almost the same.
I also observed that the obj_predict and attr_predict losses on the val set go up (i.e., overfit). More discussion here.
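For context, a minimal sketch of what "repeating VQA & MSCOCO to balance the amount of data" could look like; the dataset stand-ins and the repetition factor of 2 are hypothetical, not the repo's actual code:

```python
import torch
from torch.utils.data import ConcatDataset, TensorDataset

# Stand-ins for the real pre-training splits; sizes are illustrative only.
vqa = TensorDataset(torch.zeros(100, 1))     # smaller source
mscoco = TensorDataset(torch.zeros(100, 1))  # smaller source
vg = TensorDataset(torch.zeros(300, 1))      # larger source

# Repeat the smaller sources before concatenating so that each one
# contributes a comparable number of examples per epoch.
balanced = ConcatDataset([vqa] * 2 + [mscoco] * 2 + [vg])
print(len(balanced))  # 700
```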
The training loss for Epoch 1 is 7.3978
The losses are Mask_LM: 1.7000 Matched: 0.3337 Obj: 1.5459 Attr: 0.5252 Feat: 0.2981 QA: 2.9949
Overall Accu 0.2104, gqa Accu 0.2569, visual7w Accu 0.1355, vqa Accu 0.3005,
The valid loss is 5.3834
The losses are Mask_LM: 1.5037 Matched: 0.2868 Obj: 0.6320 Attr: 0.2603 Feat: 0.2448 QA: 2.4559
Overall Accu 0.2418, gqa Accu 0.2821, visual7w Accu 0.1593, vqa Accu 0.3072,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:42:16<00:00, 1.52s/it]
The training loss for Epoch 2 is 4.8495
The losses are Mask_LM: 1.4634 Matched: 0.2475 Obj: 0.4201 Attr: 0.2156 Feat: 0.2419 QA: 2.2610
Overall Accu 0.2501, gqa Accu 0.3072, visual7w Accu 0.1709, vqa Accu 0.3320,
The valid loss is 5.0167
The losses are Mask_LM: 1.4236 Matched: 0.2662 Obj: 0.6116 Attr: 0.2488 Feat: 0.2316 QA: 2.2350
Overall Accu 0.2680, gqa Accu 0.3215, visual7w Accu 0.1744, vqa Accu 0.3357,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:36:37<00:00, 1.51s/it]
The training loss for Epoch 3 is 4.4845
The losses are Mask_LM: 1.3955 Matched: 0.2310 Obj: 0.3448 Attr: 0.1872 Feat: 0.2336 QA: 2.0924
Overall Accu 0.2659, gqa Accu 0.3314, visual7w Accu 0.1826, vqa Accu 0.3433,
The valid loss is 4.8306
The losses are Mask_LM: 1.3758 Matched: 0.2535 Obj: 0.6184 Attr: 0.2410 Feat: 0.2254 QA: 2.1164
Overall Accu 0.2723, gqa Accu 0.3235, visual7w Accu 0.1826, vqa Accu 0.3371,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:37:16<00:00, 1.51s/it]
The training loss for Epoch 4 is 4.3071
The losses are Mask_LM: 1.3568 Matched: 0.2220 Obj: 0.3090 Attr: 0.1727 Feat: 0.2286 QA: 2.0181
Overall Accu 0.2721, gqa Accu 0.3397, visual7w Accu 0.1889, vqa Accu 0.3459,
The valid loss is 4.7712
The losses are Mask_LM: 1.3555 Matched: 0.2502 Obj: 0.6144 Attr: 0.2392 Feat: 0.2209 QA: 2.0910
Overall Accu 0.2703, gqa Accu 0.3283, visual7w Accu 0.1859, vqa Accu 0.3230,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:39:52<00:00, 1.51s/it]
The training loss for Epoch 5 is 4.1769
The losses are Mask_LM: 1.3261 Matched: 0.2149 Obj: 0.2839 Attr: 0.1621 Feat: 0.2252 QA: 1.9646
Overall Accu 0.2761, gqa Accu 0.3448, visual7w Accu 0.1925, vqa Accu 0.3489,
The valid loss is 4.7557
The losses are Mask_LM: 1.3381 Matched: 0.2457 Obj: 0.6339 Attr: 0.2433 Feat: 0.2185 QA: 2.0763
Overall Accu 0.2757, gqa Accu 0.3340, visual7w Accu 0.1867, vqa Accu 0.3338,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:39:06<00:00, 1.51s/it]
The training loss for Epoch 6 is 4.0650
The losses are Mask_LM: 1.2986 Matched: 0.2087 Obj: 0.2629 Attr: 0.1532 Feat: 0.2226 QA: 1.9189
Overall Accu 0.2865, gqa Accu 0.3597, visual7w Accu 0.1946, vqa Accu 0.3698,
The valid loss is 4.6940
The losses are Mask_LM: 1.2990 Matched: 0.2422 Obj: 0.6410 Attr: 0.2432 Feat: 0.2166 QA: 2.0521
Overall Accu 0.2896, gqa Accu 0.3504, visual7w Accu 0.1890, vqa Accu 0.3594,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:31:15<00:00, 1.50s/it]
The training loss for Epoch 7 is 3.9538
The losses are Mask_LM: 1.2719 Matched: 0.2038 Obj: 0.2450 Attr: 0.1455 Feat: 0.2205 QA: 1.8671
Overall Accu 0.2957, gqa Accu 0.3744, visual7w Accu 0.1985, vqa Accu 0.3821,
The valid loss is 4.6838
The losses are Mask_LM: 1.2941 Matched: 0.2368 Obj: 0.6598 Attr: 0.2422 Feat: 0.2148 QA: 2.0361
Overall Accu 0.2903, gqa Accu 0.3524, visual7w Accu 0.1906, vqa Accu 0.3581,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:30:50<00:00, 1.50s/it]
The training loss for Epoch 8 is 3.8464
The losses are Mask_LM: 1.2512 Matched: 0.1990 Obj: 0.2291 Attr: 0.1385 Feat: 0.2188 QA: 1.8097
Overall Accu 0.2987, gqa Accu 0.3780, visual7w Accu 0.2022, vqa Accu 0.3827,
The valid loss is 4.6147
The losses are Mask_LM: 1.2709 Matched: 0.2323 Obj: 0.6559 Attr: 0.2426 Feat: 0.2127 QA: 2.0003
Overall Accu 0.2912, gqa Accu 0.3571, visual7w Accu 0.1896, vqa Accu 0.3582,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:31:25<00:00, 1.50s/it]
The training loss for Epoch 9 is 3.7406
The losses are Mask_LM: 1.2282 Matched: 0.1949 Obj: 0.2152 Attr: 0.1324 Feat: 0.2173 QA: 1.7527
Overall Accu 0.3021, gqa Accu 0.3818, visual7w Accu 0.2068, vqa Accu 0.3823,
The valid loss is 4.5979
The losses are Mask_LM: 1.2550 Matched: 0.2317 Obj: 0.6618 Attr: 0.2406 Feat: 0.2114 QA: 1.9975
Overall Accu 0.2882, gqa Accu 0.3482, visual7w Accu 0.1963, vqa Accu 0.3484,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:32:17<00:00, 1.50s/it]
The training loss for Epoch 10 is 3.6358
The losses are Mask_LM: 1.2092 Matched: 0.1915 Obj: 0.2028 Attr: 0.1267 Feat: 0.2159 QA: 1.6897
Overall Accu 0.3050, gqa Accu 0.3845, visual7w Accu 0.2117, vqa Accu 0.3810,
The valid loss is 4.6232
The losses are Mask_LM: 1.2474 Matched: 0.2315 Obj: 0.6770 Attr: 0.2461 Feat: 0.2104 QA: 2.0108
Overall Accu 0.2840, gqa Accu 0.3445, visual7w Accu 0.1910, vqa Accu 0.3451,
Can you also upload the logs of 1) the last 10 epochs of the above training and 2) the new 12-epoch training?
Sure. Logs for the last 10 epochs are attached; only epochs 11-17 are shown since the experiment is still running.
BTW, after 16 epochs of pre-training, the fine-tuning results (i.e., 69.86 on VQA and 74.49 on NLVR2) match the numbers of my released pre-training snapshot. ~I will release the 20 weights (one per epoch) and the full logs when it finishes.~ The full log was lost when the server logs were cleaned... These 20 snapshots are available here: https://nlp.cs.unc.edu/data/github_pretrain/lxmert20/EpochXX_LXRT.pth, with XX from 01 to 20. Fine-tuning from the last epoch (i.e., XX=20) reaches the same results as the released pre-trained model.
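If useful, a short download sketch (the URL pattern is as above; the local filenames are just a choice):

```python
import urllib.request

# Fetch the 20 per-epoch snapshots; switch the folder to .../lxmert/ and
# the range to range(1, 13) for the 12-epoch run mentioned below.
BASE = "https://nlp.cs.unc.edu/data/github_pretrain/lxmert20/Epoch%02d_LXRT.pth"
for epoch in range(1, 21):
    url = BASE % epoch
    print("fetching", url)
    urllib.request.urlretrieve(url, "Epoch%02d_LXRT.pth" % epoch)
```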
The logs for the new 12-epoch pre-training are unfortunately lost... I forgot to append `| tee` to the running commands, so the console output was not saved. However, I have uploaded the weights after each epoch here: https://nlp.cs.unc.edu/data/github_pretrain/lxmert/EpochXX_LXRT.pth, with XX from 01 to 12.
The validation losses/accuracies can be computed by calling lxmert_pretrain::LXMERT::evaluate_epoch.
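A minimal sketch of that evaluation loop; the LXMERT constructor arguments and the load-path convention are assumptions and may differ in your checkout:

```python
from lxmert_pretrain import LXMERT  # src/pretrain/lxmert_pretrain.py

lxmert = LXMERT(max_seq_length=20)      # assumed to match the pre-training config
for epoch in range(1, 13):              # EpochXX_LXRT.pth, XX from 01 to 12
    # load() is assumed to take a path prefix and append "_LXRT.pth" itself.
    lxmert.load("snap/pretrain/Epoch%02d" % epoch)
    valid_loss = lxmert.evaluate_epoch(iters=-1)  # full pass over the validation set
    print("Epoch %02d valid loss: %.4f" % (epoch, valid_loss))
```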
The training loss for Epoch 11 is 3.5359
The losses are Mask_LM: 1.1908 Matched: 0.1880 Obj: 0.1911 Attr: 0.1214 Feat: 0.2147 QA: 1.6299
Overall Accu 0.3092, gqa Accu 0.3889, visual7w Accu 0.2171, vqa Accu 0.3828,
The valid loss is 4.5564
The losses are Mask_LM: 1.2261 Matched: 0.2304 Obj: 0.6772 Attr: 0.2478 Feat: 0.2092 QA: 1.9657
Overall Accu 0.2876, gqa Accu 0.3433, visual7w Accu 0.1969, vqa Accu 0.3498,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:23:10<00:00, 1.49s/it]
The training loss for Epoch 12 is 3.4354
The losses are Mask_LM: 1.1721 Matched: 0.1845 Obj: 0.1801 Attr: 0.1162 Feat: 0.2135 QA: 1.5689
Overall Accu 0.3148, gqa Accu 0.3965, visual7w Accu 0.2220, vqa Accu 0.3861,
The valid loss is 4.5573
The losses are Mask_LM: 1.2168 Matched: 0.2303 Obj: 0.6866 Attr: 0.2459 Feat: 0.2084 QA: 1.9693
Overall Accu 0.2912, gqa Accu 0.3531, visual7w Accu 0.1959, vqa Accu 0.3537,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:27:58<00:00, 1.50s/it]
The training loss for Epoch 13 is 3.3357
The losses are Mask_LM: 1.1552 Matched: 0.1816 Obj: 0.1696 Attr: 0.1113 Feat: 0.2124 QA: 1.5057
Overall Accu 0.3227, gqa Accu 0.4049, visual7w Accu 0.2283, vqa Accu 0.3970,
The valid loss is 4.5651
The losses are Mask_LM: 1.2140 Matched: 0.2301 Obj: 0.6980 Attr: 0.2469 Feat: 0.2072 QA: 1.9689
Overall Accu 0.2956, gqa Accu 0.3603, visual7w Accu 0.1970, vqa Accu 0.3599,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:30:56<00:00, 1.50s/it]
The training loss for Epoch 14 is 3.2383
The losses are Mask_LM: 1.1386 Matched: 0.1784 Obj: 0.1598 Attr: 0.1065 Feat: 0.2114 QA: 1.4436
Overall Accu 0.3296, gqa Accu 0.4136, visual7w Accu 0.2341, vqa Accu 0.4036,
The valid loss is 4.5303
The losses are Mask_LM: 1.1917 Matched: 0.2248 Obj: 0.7044 Attr: 0.2507 Feat: 0.2065 QA: 1.9522
Overall Accu 0.2913, gqa Accu 0.3502, visual7w Accu 0.2023, vqa Accu 0.3487,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:30:31<00:00, 1.50s/it]
The training loss for Epoch 15 is 3.1418
The losses are Mask_LM: 1.1240 Matched: 0.1756 Obj: 0.1500 Attr: 0.1018 Feat: 0.2104 QA: 1.3800
Overall Accu 0.3384, gqa Accu 0.4245, visual7w Accu 0.2410, vqa Accu 0.4129,
The valid loss is 4.5739
The losses are Mask_LM: 1.1880 Matched: 0.2245 Obj: 0.7120 Attr: 0.2539 Feat: 0.2058 QA: 1.9898
Overall Accu 0.2931, gqa Accu 0.3530, visual7w Accu 0.1995, vqa Accu 0.3551,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:32:03<00:00, 1.50s/it]
The training loss for Epoch 16 is 3.0503
The losses are Mask_LM: 1.1083 Matched: 0.1727 Obj: 0.1410 Attr: 0.0973 Feat: 0.2095 QA: 1.3215
Overall Accu 0.3466, gqa Accu 0.4330, visual7w Accu 0.2473, vqa Accu 0.4248,
The valid loss is 4.5356
The losses are Mask_LM: 1.1832 Matched: 0.2250 Obj: 0.7143 Attr: 0.2524 Feat: 0.2045 QA: 1.9562
Overall Accu 0.3006, gqa Accu 0.3605, visual7w Accu 0.1998, vqa Accu 0.3714,
100%|████████████████████████████████████████████████████████████████| 46742/46742 [19:24:57<00:00, 1.50s/it]
The training loss for Epoch 17 is 2.9553
The losses are Mask_LM: 1.0930 Matched: 0.1697 Obj: 0.1323 Attr: 0.0929 Feat: 0.2087 QA: 1.2587
Overall Accu 0.3558, gqa Accu 0.4454, visual7w Accu 0.2548, vqa Accu 0.4324,
The valid loss is 4.5562
The losses are Mask_LM: 1.1740 Matched: 0.2241 Obj: 0.7294 Attr: 0.2575 Feat: 0.2045 QA: 1.9668
Overall Accu 0.2980, gqa Accu 0.3561, visual7w Accu 0.2016, vqa Accu 0.3650,
94%|██████████████████████████████████████████████████████████▏ | 43899/46742 [18:11:36<1:11:55, 1.52s/it]
Thanks!
Was the pretrained model you provided (at http://nlp.cs.unc.edu/data/model_LXRT.pth) trained for 20 epochs or 12 epochs? Can the 12-epoch pretrained model achieve 79% accuracy on the VQA dataset?
The default model is trained in two stages: 10 epochs without the QA loss, then 10 epochs with the QA loss. The current default 20-epoch single-stage pre-training with the QA loss reaches 69.86% on VQA, almost the same as the two-stage approach.
The result of 12-epoch single-stage pre-training on VQA is around 69.5, slightly lower than 20 epochs.
By the way, I am not sure whether 12-epoch pre-training with a higher learning rate / smaller batch size (and thus more updates) could reach a higher number. The pre-training hyper-parameters are currently under-tuned.
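Illustratively, the two-stage recipe just toggles the QA term in the total pre-training loss. The task names below mirror the losses printed in the logs, but this function and its flag are hypothetical, not the repo's exact code:

```python
def total_loss(losses, use_qa):
    """Sum the pre-training losses; use_qa is off in stage 1, on in stage 2."""
    loss = (losses["Mask_LM"] + losses["Matched"]
            + losses["Obj"] + losses["Attr"] + losses["Feat"])
    if use_qa:
        loss = loss + losses["QA"]
    return loss

# Check against the Epoch 1 training losses from the log above:
epoch1 = {"Mask_LM": 1.7000, "Matched": 0.3337, "Obj": 1.5459,
          "Attr": 0.5252, "Feat": 0.2981, "QA": 2.9949}
print(total_loss(epoch1, use_qa=True))  # ≈ 7.3978, matching the reported total
```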
Was anyone able to reproduce the results by pre-training from scratch? Although I see better accuracy numbers on all three test sets during pre-training, my fine-tuning results are still 1 point lower on VQA and GQA, and 3.3 points lower on NLVR2.
Hi @kritiagg, I tried to pre-train from scratch (12-epoch, single-stage) and my result on VQA is 68.5. Did you manage to reproduce the results eventually?
I am running a reproduction experiment of LXMERT pre-training. I hope you can provide a log file so that I can catch problems in the early epochs.