Open · ydhongHIT opened this issue 1 month ago
The EMA test accuracy at the last epoch of the first training run is different from the EMA test accuracy reported at the beginning of the second (resumed) run.
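For context, the EMA weights lag behind the model weights, so they have to be checkpointed explicitly; they cannot be reconstructed from the model weights alone after a resume. A minimal pure-Python sketch of this (the names `ema_update`, `checkpoint` etc. are hypothetical, not this repo's actual code):

```python
import copy

def ema_update(ema, model, decay=0.9999):
    """One EMA step: ema <- decay * ema + (1 - decay) * model."""
    for k in ema:
        ema[k] = decay * ema[k] + (1.0 - decay) * model[k]
    return ema

# toy "weights"
model = {"w": 1.0}
ema = copy.deepcopy(model)

for step in range(100):
    model["w"] += 0.01          # pretend training moves the weight
    ema_update(ema, model)

# the EMA lags the model, so a checkpoint that stores only `model`
# loses information: the EMA state must be saved alongside it
checkpoint = {"model": copy.deepcopy(model), "model_ema": copy.deepcopy(ema)}
restored_ema = copy.deepcopy(checkpoint["model_ema"])

assert restored_ema == ema                  # exact round trip through the checkpoint
assert abs(ema["w"] - model["w"]) > 1e-4    # EMA != model, so it cannot be rebuilt from model
```

If the round trip is exact (as in this toy case), the resumed EMA evaluation should match; any mismatch then points at how the EMA state is saved, restored, or evaluated.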
Do you have any logs?
[2024-07-27 10:18:11 vssm1_high_tiny_gradacc2] (main.py 440): INFO Full config saved to /usr/classification/vssm1_high_tiny_gradacc2/20240727101802/config.json
[2024-07-27 10:18:11 vssm1_high_tiny_gradacc2] (main.py 443): INFO AMP_ENABLE: true
AMP_OPT_LEVEL: ''
AUG:
  AUTO_AUGMENT: rand-m9-mstd0.5-inc1
  COLOR_JITTER: 0.4
  CUTMIX: 1.0
  CUTMIX_MINMAX: null
  MIXUP: 0.8
  MIXUP_MODE: batch
  MIXUP_PROB: 1.0
  MIXUP_SWITCH_PROB: 0.5
  RECOUNT: 1
  REMODE: pixel
  REPROB: 0.25
BASE:
[2024-07-27 10:18:11 vssm1_high_tiny_gradacc2] (main.py 444): INFO {"cfg": "configs/vssm/vmambav2_tiny_224_high.yaml", "opts": null, "batch_size": 112, "data_path": "/home/hyd/data/ImageNet/", "zip": false, "cache_mode": "part", "pretrained": null, "resume": "/usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth", "accumulation_steps": 2, "use_checkpoint": false, "disable_amp": false, "output": "/usr/classification", "tag": "20240727101802", "eval": false, "throughput": false, "fused_layernorm": false, "optim": null, "model_ema": true, "model_ema_decay": 0.9999, "model_ema_force_cpu": false, "memory_limit_rate": -1}
[2024-07-27 10:18:14 vssm1_high_tiny_gradacc2] (main.py 121): INFO Creating model:vssm/vssm1_high_tiny_gradacc2
[2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (optimizer.py 18): INFO ==============> building optimizer adamw....................
[2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (main.py 178): INFO no checkpoint found in /usr/classification/vssm1_high_tiny_gradacc2/20240727101802, ignoring auto resume
[2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (utils.py 18): INFO ==============> Resuming form /usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth....................
[2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (utils.py 27): INFO resuming model:
[2024-07-27 03:31:23 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2470/2859] eta 0:02:54 lr 0.000872 wd 0.0500 time 0.4278 (0.4490) data time 0.0008 (0.0131) model time 0.4271 (0.4359) loss 2.2814 (1.9959) grad_norm 2.0578 (inf) loss_scale 32768.0000 (34836.7204) mem 16138MB
[2024-07-27 03:31:27 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2480/2859] eta 0:02:50 lr 0.000872 wd 0.0500 time 0.4289 (0.4490) data time 0.0006 (0.0131) model time 0.4282 (0.4359) loss 1.9998 (1.9961) grad_norm 2.7279 (inf) loss_scale 32768.0000 (34828.3821) mem 16138MB
[2024-07-27 03:31:32 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2490/2859] eta 0:02:45 lr 0.000872 wd 0.0500 time 0.4265 (0.4489) data time 0.0007 (0.0130) model time 0.4258 (0.4359) loss 2.1105 (1.9958) grad_norm 1.6062 (inf) loss_scale 32768.0000 (34820.1108) mem 16138MB
[2024-07-27 03:31:36 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2500/2859] eta 0:02:41 lr 0.000872 wd 0.0500 time 0.4280 (0.4489) data time 0.0008 (0.0130) model time 0.4272 (0.4359) loss 2.1443 (1.9960) grad_norm 2.3716 (inf) loss_scale 32768.0000 (34811.9056) mem 16138MB
[2024-07-27 03:31:40 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2510/2859] eta 0:02:36 lr 0.000872 wd 0.0500 time 0.4288 (0.4488) data time 0.0008 (0.0129) model time 0.4280 (0.4359) loss 2.5049 (1.9953) grad_norm 1.9457 (inf) loss_scale 32768.0000 (34803.7658) mem 16138MB
[2024-07-27 03:31:45 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2520/2859] eta 0:02:32 lr 0.000872 wd 0.0500 time 0.4283 (0.4488) data time 0.0007 (0.0129) model time 0.4276 (0.4359) loss 2.3857 (1.9954) grad_norm 2.2037 (inf) loss_scale 32768.0000 (34795.6906) mem 16138MB
[2024-07-27 03:31:49 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2530/2859] eta 0:02:27 lr 0.000872 wd 0.0500 time 0.4266 (0.4487) data time 0.0008 (0.0128) model time 0.4258 (0.4359) loss 1.9954 (1.9962) grad_norm 1.9898 (inf) loss_scale 32768.0000 (34787.6792) mem 16138MB
[2024-07-27 03:31:54 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2540/2859] eta 0:02:23 lr 0.000872 wd 0.0500 time 0.4266 (0.4487) data time 0.0006 (0.0128) model time 0.4260 (0.4359) loss 2.2547 (1.9963) grad_norm 2.9234 (inf) loss_scale 32768.0000 (34779.7308) mem 16138MB
[2024-07-27 03:31:58 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2550/2859] eta 0:02:18 lr 0.000872 wd 0.0500 time 0.4263 (0.4487) data time 0.0008 (0.0127) model time 0.4255 (0.4359) loss 1.6255 (1.9959) grad_norm 2.0983 (inf) loss_scale 32768.0000 (34771.8448) mem 16138MB
[2024-07-27 03:32:02 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2560/2859] eta 0:02:14 lr 0.000872 wd 0.0500 time 0.4264 (0.4486) data time 0.0008 (0.0127) model time 0.4256 (0.4359) loss 1.9472 (1.9962) grad_norm 2.4020 (inf) loss_scale 32768.0000 (34764.0203) mem 16138MB
[2024-07-27 03:32:07 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2570/2859] eta 0:02:09 lr 0.000872 wd 0.0500 time 0.4272 (0.4486) data time 0.0007 (0.0126) model time 0.4266 (0.4359) loss 2.0075 (1.9958) grad_norm 3.1504 (inf) loss_scale 32768.0000 (34756.2567) mem 16138MB
[2024-07-27 03:32:11 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2580/2859] eta 0:02:05 lr 0.000872 wd 0.0500 time 0.4280 (0.4485) data time 0.0008 (0.0126) model time 0.4272 (0.4359) loss 2.1553 (1.9962) grad_norm 2.5283 (inf) loss_scale 32768.0000 (34748.5533) mem 16138MB
[2024-07-27 03:32:15 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2590/2859] eta 0:02:00 lr 0.000872 wd 0.0500 time 0.4268 (0.4485) data time 0.0009 (0.0125) model time 0.4259 (0.4359) loss 1.4707 (1.9957) grad_norm 3.0709 (inf) loss_scale 32768.0000 (34740.9093) mem 16138MB
[2024-07-27 03:32:20 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2600/2859] eta 0:01:56 lr 0.000872 wd 0.0500 time 0.4269 (0.4484) data time 0.0007 (0.0125) model time 0.4263 (0.4359) loss 2.3621 (1.9957) grad_norm 1.5164 (inf) loss_scale 32768.0000 (34733.3241) mem 16138MB
[2024-07-27 03:32:24 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2610/2859] eta 0:01:51 lr 0.000872 wd 0.0500 time 0.4265 (0.4484) data time 0.0007 (0.0125) model time 0.4258 (0.4359) loss 2.0586 (1.9955) grad_norm 4.5267 (inf) loss_scale 32768.0000 (34725.7970) mem 16138MB
[2024-07-27 03:32:28 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2620/2859] eta 0:01:47 lr 0.000872 wd 0.0500 time 0.4277 (0.4483) data time 0.0007 (0.0124) model time 0.4270 (0.4359) loss 2.2128 (1.9959) grad_norm 3.8236 (inf) loss_scale 32768.0000 (34718.3274) mem 16138MB
[2024-07-27 03:32:33 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2630/2859] eta 0:01:42 lr 0.000872 wd 0.0500 time 0.4266 (0.4483) data time 0.0008 (0.0124) model time 0.4258 (0.4359) loss 2.3189 (1.9959) grad_norm 1.8963 (inf) loss_scale 32768.0000 (34710.9145) mem 16138MB
[2024-07-27 03:32:37 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2640/2859] eta 0:01:38 lr 0.000872 wd 0.0500 time 0.4282 (0.4482) data time 0.0009 (0.0123) model time 0.4273 (0.4359) loss 2.3030 (1.9958) grad_norm 3.3634 (inf) loss_scale 32768.0000 (34703.5577) mem 16138MB
[2024-07-27 03:32:42 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2650/2859] eta 0:01:33 lr 0.000872 wd 0.0500 time 0.4261 (0.4482) data time 0.0007 (0.0123) model time 0.4254 (0.4359) loss 2.2200 (1.9965) grad_norm 4.9659 (inf) loss_scale 32768.0000 (34696.2565) mem 16138MB
[2024-07-27 03:32:46 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2660/2859] eta 0:01:29 lr 0.000872 wd 0.0500 time 0.4283 (0.4482) data time 0.0007 (0.0122) model time 0.4276 (0.4359) loss 2.2473 (1.9971) grad_norm 2.1202 (inf) loss_scale 32768.0000 (34689.0101) mem 16138MB
[2024-07-27 03:32:50 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2670/2859] eta 0:01:24 lr 0.000872 wd 0.0500 time 0.4283 (0.4481) data time 0.0008 (0.0122) model time 0.4275 (0.4359) loss 2.0148 (1.9972) grad_norm 4.1605 (inf) loss_scale 32768.0000 (34681.8180) mem 16138MB
[2024-07-27 03:32:55 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2680/2859] eta 0:01:20 lr 0.000872 wd 0.0500 time 0.4272 (0.4481) data time 0.0015 (0.0122) model time 0.4256 (0.4359) loss 2.1019 (1.9973) grad_norm 2.5604 (inf) loss_scale 32768.0000 (34674.6796) mem 16138MB
[2024-07-27 03:32:59 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2690/2859] eta 0:01:15 lr 0.000872 wd 0.0500 time 0.4282 (0.4480) data time 0.0008 (0.0121) model time 0.4274 (0.4359) loss 2.1770 (1.9975) grad_norm 2.4649 (inf) loss_scale 32768.0000 (34667.5942) mem 16138MB
[2024-07-27 03:33:03 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2700/2859] eta 0:01:11 lr 0.000872 wd 0.0500 time 0.4281 (0.4480) data time 0.0008 (0.0121) model time 0.4273 (0.4359) loss 2.2774 (1.9971) grad_norm 1.9458 (inf) loss_scale 32768.0000 (34660.5613) mem 16138MB
[2024-07-27 03:33:08 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2710/2859] eta 0:01:06 lr 0.000872 wd 0.0500 time 0.4279 (0.4479) data time 0.0007 (0.0120) model time 0.4272 (0.4359) loss 2.1965 (1.9969) grad_norm 1.4778 (inf) loss_scale 32768.0000 (34653.5802) mem 16138MB
[2024-07-27 03:33:12 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2720/2859] eta 0:01:02 lr 0.000872 wd 0.0500 time 0.4266 (0.4479) data time 0.0008 (0.0120) model time 0.4258 (0.4359) loss 2.1484 (1.9970) grad_norm 1.9913 (inf) loss_scale 32768.0000 (34646.6505) mem 16138MB
[2024-07-27 03:33:16 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2730/2859] eta 0:00:57 lr 0.000872 wd 0.0500 time 0.4249 (0.4479) data time 0.0008 (0.0120) model time 0.4242 (0.4359) loss 2.3283 (1.9969) grad_norm 1.6794 (inf) loss_scale 32768.0000 (34639.7715) mem 16138MB
[2024-07-27 03:33:21 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2740/2859] eta 0:00:53 lr 0.000872 wd 0.0500 time 0.4240 (0.4478) data time 0.0008 (0.0119) model time 0.4233 (0.4359) loss 2.2862 (1.9970) grad_norm 1.7257 (inf) loss_scale 32768.0000 (34632.9427) mem 16138MB
[2024-07-27 03:33:25 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2750/2859] eta 0:00:48 lr 0.000872 wd 0.0500 time 0.4231 (0.4478) data time 0.0006 (0.0119) model time 0.4225 (0.4359) loss 1.5106 (1.9970) grad_norm 2.8539 (inf) loss_scale 32768.0000 (34626.1636) mem 16138MB
[2024-07-27 03:33:30 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2760/2859] eta 0:00:44 lr 0.000872 wd 0.0500 time 0.4239 (0.4477) data time 0.0008 (0.0118) model time 0.4231 (0.4359) loss 1.8417 (1.9966) grad_norm 4.5991 (inf) loss_scale 32768.0000 (34619.4335) mem 16138MB
[2024-07-27 03:33:34 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2770/2859] eta 0:00:39 lr 0.000872 wd 0.0500 time 0.4238 (0.4477) data time 0.0007 (0.0118) model time 0.4231 (0.4359) loss 1.5233 (1.9963) grad_norm 4.2230 (inf) loss_scale 32768.0000 (34612.7521) mem 16138MB
[2024-07-27 03:33:38 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2780/2859] eta 0:00:35 lr 0.000872 wd 0.0500 time 0.4236 (0.4477) data time 0.0007 (0.0118) model time 0.4229 (0.4359) loss 2.3142 (1.9965) grad_norm 3.2259 (inf) loss_scale 32768.0000 (34606.1187) mem 16138MB
[2024-07-27 03:33:43 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2790/2859] eta 0:00:30 lr 0.000872 wd 0.0500 time 0.4271 (0.4476) data time 0.0007 (0.0117) model time 0.4264 (0.4359) loss 2.1459 (1.9968) grad_norm 2.4108 (inf) loss_scale 32768.0000 (34599.5328) mem 16138MB
[2024-07-27 03:33:47 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2800/2859] eta 0:00:26 lr 0.000872 wd 0.0500 time 0.4294 (0.4476) data time 0.0007 (0.0117) model time 0.4286 (0.4359) loss 1.7322 (1.9964) grad_norm 1.8060 (inf) loss_scale 32768.0000 (34592.9939) mem 16138MB
[2024-07-27 03:33:51 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2810/2859] eta 0:00:21 lr 0.000872 wd 0.0500 time 0.4269 (0.4475) data time 0.0008 (0.0116) model time 0.4261 (0.4359) loss 1.9938 (1.9969) grad_norm 4.6479 (inf) loss_scale 32768.0000 (34586.5016) mem 16138MB
[2024-07-27 03:33:56 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2820/2859] eta 0:00:17 lr 0.000872 wd 0.0500 time 0.4282 (0.4475) data time 0.0007 (0.0116) model time 0.4275 (0.4359) loss 1.4698 (1.9970) grad_norm 1.8755 (inf) loss_scale 32768.0000 (34580.0553) mem 16138MB
[2024-07-27 03:34:00 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2830/2859] eta 0:00:12 lr 0.000872 wd 0.0500 time 0.4284 (0.4475) data time 0.0006 (0.0116) model time 0.4278 (0.4359) loss 1.7624 (1.9969) grad_norm 3.7602 (inf) loss_scale 32768.0000 (34573.6545) mem 16138MB
[2024-07-27 03:34:05 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2840/2859] eta 0:00:08 lr 0.000872 wd 0.0500 time 0.4287 (0.4474) data time 0.0007 (0.0115) model time 0.4280 (0.4359) loss 2.1205 (1.9972) grad_norm 3.1047 (inf) loss_scale 32768.0000 (34567.2988) mem 16138MB
[2024-07-27 03:34:09 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2850/2859] eta 0:00:04 lr 0.000872 wd 0.0500 time 0.4269 (0.4474) data time 0.0004 (0.0115) model time 0.4265 (0.4359) loss 2.1876 (1.9977) grad_norm 1.9521 (inf) loss_scale 32768.0000 (34560.9877) mem 16138MB
[2024-07-27 03:34:14 vssm1_high_tiny_gradacc2] (main.py 306): INFO EPOCH 29 training takes 0:21:20
[2024-07-27 03:34:14 vssm1_high_tiny_gradacc2] (utils.py 99): INFO /usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth saving......
[2024-07-27 03:34:14 vssm1_high_tiny_gradacc2] (utils.py 101): INFO /usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth saved !!!
[2024-07-27 03:34:37 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [0/112] Time 22.606 (22.606) Loss 1.0176 (1.0176) Acc@1 79.018 (79.018) Acc@5 92.857 (92.857) Mem 16138MB
[2024-07-27 03:34:38 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [10/112] Time 0.139 (2.182) Loss 0.5269 (0.9903) Acc@1 90.625 (77.638) Acc@5 95.982 (93.628) Mem 16138MB
[2024-07-27 03:34:39 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [20/112] Time 0.139 (1.209) Loss 1.3096 (1.0319) Acc@1 67.634 (76.701) Acc@5 93.304 (93.399) Mem 16138MB
[2024-07-27 03:34:41 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [30/112] Time 0.139 (0.864) Loss 1.0557 (1.0591) Acc@1 76.563 (75.072) Acc@5 93.304 (93.548) Mem 16138MB
[2024-07-27 03:34:42 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [40/112] Time 0.139 (0.687) Loss 0.8892 (1.0369) Acc@1 80.134 (75.626) Acc@5 94.643 (93.881) Mem 16138MB
[2024-07-27 03:34:44 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [50/112] Time 0.139 (0.580) Loss 1.5205 (1.1105) Acc@1 68.080 (74.090) Acc@5 86.607 (92.918) Mem 16138MB
[2024-07-27 03:34:45 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [60/112] Time 0.139 (0.507) Loss 1.6621 (1.2098) Acc@1 59.598 (71.981) Acc@5 83.705 (91.522) Mem 16138MB
[2024-07-27 03:34:46 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [70/112] Time 0.139 (0.456) Loss 1.8613 (1.2623) Acc@1 59.821 (71.004) Acc@5 81.920 (90.660) Mem 16138MB
[2024-07-27 03:34:48 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [80/112] Time 0.139 (0.417) Loss 1.3174 (1.3078) Acc@1 67.857 (69.899) Acc@5 88.170 (90.090) Mem 16138MB
[2024-07-27 03:34:49 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [90/112] Time 0.139 (0.386) Loss 1.8135 (1.3504) Acc@1 55.134 (69.223) Acc@5 85.045 (89.453) Mem 16138MB
[2024-07-27 03:34:51 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [100/112] Time 0.139 (0.362) Loss 1.7207 (1.3857) Acc@1 65.402 (68.487) Acc@5 82.143 (88.886) Mem 16138MB
[2024-07-27 03:34:52 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [110/112] Time 0.139 (0.342) Loss 0.5435 (1.3779) Acc@1 87.500 (68.627) Acc@5 97.098 (89.058) Mem 16138MB
[2024-07-27 03:34:53 vssm1_high_tiny_gradacc2] (main.py 353): INFO Acc@1 68.626 Acc@5 89.066
[2024-07-27 03:34:53 vssm1_high_tiny_gradacc2] (main.py 222): INFO Accuracy of the network on the 50000 test images: 68.6%
[2024-07-27 03:34:53 vssm1_high_tiny_gradacc2] (main.py 224): INFO Max accuracy: 68.63%
[2024-07-27 03:35:16 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [0/112] Time 23.057 (23.057) Loss 1.0684 (1.0684) Acc@1 75.223 (75.223) Acc@5 94.643 (94.643) Mem 16138MB
[2024-07-27 03:35:18 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [10/112] Time 0.138 (2.222) Loss 0.7012 (1.1181) Acc@1 89.062 (75.832) Acc@5 95.982 (93.060) Mem 16138MB
[2024-07-27 03:35:19 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [20/112] Time 0.138 (1.230) Loss 1.5107 (1.1588) Acc@1 59.821 (74.809) Acc@5 91.295 (92.623) Mem 16138MB
[2024-07-27 03:35:20 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [30/112] Time 0.139 (0.878) Loss 1.1465 (1.1626) Acc@1 76.562 (74.150) Acc@5 94.866 (93.059) Mem 16138MB
[2024-07-27 03:35:22 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [40/112] Time 0.138 (0.697) Loss 1.0566 (1.1352) Acc@1 79.911 (74.848) Acc@5 93.080 (93.429) Mem 16138MB
[2024-07-27 03:35:23 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [50/112] Time 0.139 (0.588) Loss 1.6562 (1.2065) Acc@1 63.839 (73.210) Acc@5 84.821 (92.314) Mem 16138MB
[2024-07-27 03:35:24 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [60/112] Time 0.138 (0.514) Loss 1.7354 (1.3222) Acc@1 57.366 (70.693) Acc@5 81.920 (90.640) Mem 16138MB
[2024-07-27 03:35:26 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [70/112] Time 0.138 (0.464) Loss 1.8867 (1.3710) Acc@1 62.054 (69.841) Acc@5 80.804 (89.798) Mem 16138MB
[2024-07-27 03:35:27 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [80/112] Time 0.140 (0.424) Loss 1.5879 (1.4187) Acc@1 65.848 (68.684) Acc@5 86.384 (89.173) Mem 16138MB
[2024-07-27 03:35:29 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [90/112] Time 0.139 (0.393) Loss 2.0332 (1.4653) Acc@1 54.464 (67.874) Acc@5 80.804 (88.430) Mem 16138MB
[2024-07-27 03:35:30 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [100/112] Time 0.139 (0.368) Loss 1.8418 (1.5056) Acc@1 61.607 (67.006) Acc@5 83.259 (87.860) Mem 16138MB
[2024-07-27 03:35:32 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [110/112] Time 0.139 (0.347) Loss 0.6914 (1.4968) Acc@1 85.714 (67.117) Acc@5 96.205 (88.049) Mem 16138MB
[2024-07-27 03:35:33 vssm1_high_tiny_gradacc2] (main.py 353): INFO Acc@1 67.120 Acc@5 88.058
[2024-07-27 03:35:33 vssm1_high_tiny_gradacc2] (main.py 227): INFO Accuracy of the network on the 50000 test images: 67.1%
[2024-07-27 03:35:33 vssm1_high_tiny_gradacc2] (main.py 229): INFO Max accuracy ema: 67.12%
[2024-07-27 03:36:04 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][0/2859] eta 1 day, 0:25:17 lr 0.000872 wd 0.0500 time 30.7511 (30.7511) data time 30.2318 (30.2318) model time 0.0000 (0.0000) loss 2.1571 (2.1571) grad_norm 0.0000 (0.0000) loss_scale 32768.0000 (32768.0000) mem 16138MB
[2024-07-27 03:36:08 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][10/2859] eta 2:31:10 lr 0.000872 wd 0.0500 time 0.4175 (3.1838) data time 0.0007 (2.7493) model time 0.0000 (0.0000) loss 1.6635 (1.9808) grad_norm 2.3338 (2.0338) loss_scale 32768.0000 (32768.0000) mem 16138MB
[2024-07-27 03:36:12 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][20/2859] eta 1:28:53 lr 0.000872 wd 0.0500 time 0.4177 (1.8785) data time 0.0007 (1.4406) model time 0.0000 (0.0000) loss 1.9455 (1.9647) grad_norm 2.2848 (2.1245) loss_scale 32768.0000 (32768.0000) mem 16138MB
[2024-07-27 03:36:16 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][30/2859] eta 1:06:30 lr 0.000872 wd 0.0500 time 0.4183 (1.4104) data time 0.0006 (0.9762) model time 0.0000 (0.0000) loss 1.5447 (1.9503) grad_norm 1.9447 (2.1679) loss_scale 32768.0000 (32768.0000) mem 16138MB
[2024-07-27 03:36:21 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][40/2859] eta 0:55:00 lr 0.000872 wd 0.0500 time 0.4194 (1.1708) data time 0.0007 (0.7383) model time 0.0000 (0.0000) loss 1.8946 (1.9332) grad_norm 2.0142 (2.2084) loss_scale 32768.0000 (32768.0000) mem 16138MB
[2024-07-27 03:36:25 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][50/2859] eta 0:48:01 lr 0.000872 wd 0.0500 time 0.4157 (1.0258) data time 0.0008 (0.5938) model time 0.0000 (0.0000) loss 1.7480 (1.9751) grad_norm 4.0086 (2.3248) loss_scale 32768.0000 (32768.0000) mem 16138MB
Hi, the first log is from the resumed run and the second is from the initial run; I resumed from epoch 29. As you can see, the test accuracy without EMA is 68.626, while the EMA test accuracy differs between the two runs (67.12 vs. 67.242).
I have encountered this problem more than once. The difference is small, but it is consistently there.
When resuming from the last checkpoint, the EMA test results differ, but the normal test results are identical. Is this expected behavior?
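One way to narrow this down: check whether the EMA tensors are bit-identical before saving and after resuming (e.g. compare the two state dicts element-wise). If they are identical, the gap must come from evaluation-time nondeterminism; if not, something in the save/load path is altering the EMA state. As one hypothetical example of the latter class (I am not claiming this codebase does this), a lower-precision cast applied only on the checkpoint path is enough to make the resumed EMA drift from the uninterrupted one. A stdlib-only sketch, using a float32 round-trip as a stand-in for such a cast:

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (f64) to float32 and back, as a stand-in for a
    lower-precision cast that a checkpoint writer might apply."""
    return struct.unpack("f", struct.pack("f", x))[0]

decay = 0.9999
ema = 0.123456789123456789   # hypothetical EMA value at the checkpoint
model_w = 0.987654321        # hypothetical model weight after the next step

# next EMA update WITHOUT a save/load round trip
ema_direct = decay * ema + (1 - decay) * model_w

# same update, but the EMA value went through a float32 cast on save
ema_resumed = decay * to_f32(ema) + (1 - decay) * model_w

assert ema_direct != ema_resumed              # drift appears immediately...
assert abs(ema_direct - ema_resumed) < 1e-7   # ...but it is tiny, matching a ~0.1% Acc@1 gap in spirit
```

Since the non-EMA results match exactly, the model weights themselves evidently round-trip cleanly, which points the investigation at the EMA-specific save/load and evaluation path.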