MzeroMiko / VMamba

VMamba: Visual State Space Models, code is based on mamba
MIT License

EMA resume #273

Open ydhongHIT opened 1 month ago

ydhongHIT commented 1 month ago

When resuming from the last checkpoint, the EMA test results are different, but the normal (non-EMA) test results are the same. Is this a normal phenomenon?

ydhongHIT commented 1 month ago

The EMA test accuracy after the last epoch of the first run differs from the EMA test accuracy measured at the start of the second (resumed) run.
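
For context, this is roughly how EMA weights are saved and restored in Swin-style training code of the kind this repo appears to follow. The dictionary keys, the timm-style `ModelEma` wrapper with a `.module` attribute, and the helper names below are assumptions for illustration, not VMamba's actual implementation:

```python
# Sketch of a typical EMA save/resume cycle (assumption: a timm-style ModelEma wrapper
# whose weights live in `.module`, and a checkpoint dict with "model"/"model_ema" keys).
import torch

def save_checkpoint(path, model, model_ema, epoch):
    torch.save({
        "model": model.state_dict(),
        "model_ema": model_ema.module.state_dict(),  # the EMA copy, not the raw weights
        "epoch": epoch,
    }, path)

def resume_checkpoint(path, model, model_ema):
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model"])
    # If this load is skipped, done with strict=False, or applied to the wrong module,
    # the EMA copy silently keeps (part of) the raw weights, so the non-EMA accuracy
    # still matches after resume while the EMA accuracy shifts slightly.
    model_ema.module.load_state_dict(ckpt["model_ema"])
    return ckpt["epoch"]
```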

MzeroMiko commented 1 month ago

Do you have any logs?

ydhongHIT commented 1 month ago

[2024-07-27 10:18:11 vssm1_high_tiny_gradacc2] (main.py 440): INFO Full config saved to /usr/classification/vssm1_high_tiny_gradacc2/20240727101802/config.json [2024-07-27 10:18:11 vssm1_high_tiny_gradacc2] (main.py 443): INFO AMP_ENABLE: true AMP_OPT_LEVEL: '' AUG: AUTO_AUGMENT: rand-m9-mstd0.5-inc1 COLOR_JITTER: 0.4 CUTMIX: 1.0 CUTMIX_MINMAX: null MIXUP: 0.8 MIXUP_MODE: batch MIXUP_PROB: 1.0 MIXUP_SWITCH_PROB: 0.5 RECOUNT: 1 REMODE: pixel REPROB: 0.25 BASE:

[2024-07-27 10:18:11 vssm1_high_tiny_gradacc2] (main.py 444): INFO {"cfg": "configs/vssm/vmambav2_tiny_224_high.yaml", "opts": null, "batch_size": 112, "data_path": "/home/hyd/data/ImageNet/", "zip": false, "cache_mode": "part", "pretrained": null, "resume": "/usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth", "accumulation_steps": 2, "use_checkpoint": false, "disable_amp": false, "output": "/usr/classification", "tag": "20240727101802", "eval": false, "throughput": false, "fused_layernorm": false, "optim": null, "model_ema": true, "model_ema_decay": 0.9999, "model_ema_force_cpu": false, "memory_limit_rate": -1} [2024-07-27 10:18:14 vssm1_high_tiny_gradacc2] (main.py 121): INFO Creating model:vssm/vssm1_high_tiny_gradacc2 [2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (optimizer.py 18): INFO ==============> building optimizer adamw.................... [2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (main.py 178): INFO no checkpoint found in /usr/classification/vssm1_high_tiny_gradacc2/20240727101802, ignoring auto resume [2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (utils.py 18): INFO ==============> Resuming form /usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth.................... [2024-07-27 10:18:15 vssm1_high_tiny_gradacc2] (utils.py 27): INFO resuming model: [2024-07-27 10:18:16 vssm1_high_tiny_gradacc2] (utils.py 34): INFO resuming model_ema: [2024-07-27 10:18:16 vssm1_high_tiny_gradacc2] (utils.py 48): INFO => loaded successfully '/usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth' (epoch 29) [2024-07-27 10:18:40 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [0/112] Time 23.992 (23.992) Loss 1.0176 (1.0176) Acc@1 79.018 (79.018) Acc@5 92.857 (92.857) Mem 1952MB [2024-07-27 10:18:41 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [10/112] Time 0.137 (2.306) Loss 0.5269 (0.9903) Acc@1 90.625 (77.638) Acc@5 95.982 (93.628) Mem 1953MB [2024-07-27 10:18:42 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [20/112] Time 0.137 (1.273) Loss 1.3096 (1.0319) Acc@1 67.634 (76.690) Acc@5 93.304 (93.399) Mem 1953MB [2024-07-27 10:18:44 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [30/112] Time 0.138 (0.907) Loss 1.0557 (1.0591) Acc@1 76.563 (75.065) Acc@5 93.304 (93.548) Mem 1953MB [2024-07-27 10:18:45 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [40/112] Time 0.138 (0.719) Loss 0.8892 (1.0369) Acc@1 80.134 (75.621) Acc@5 94.643 (93.881) Mem 1953MB [2024-07-27 10:18:47 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [50/112] Time 0.138 (0.605) Loss 1.5205 (1.1105) Acc@1 68.080 (74.085) Acc@5 86.607 (92.918) Mem 1953MB [2024-07-27 10:18:48 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [60/112] Time 0.138 (0.529) Loss 1.6621 (1.2099) Acc@1 59.598 (71.977) Acc@5 83.705 (91.522) Mem 1953MB [2024-07-27 10:18:49 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [70/112] Time 0.138 (0.474) Loss 1.8613 (1.2623) Acc@1 59.821 (71.001) Acc@5 81.920 (90.660) Mem 1953MB [2024-07-27 10:18:51 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [80/112] Time 0.138 (0.432) Loss 1.3174 (1.3078) Acc@1 67.857 (69.899) Acc@5 88.170 (90.090) Mem 1953MB [2024-07-27 10:18:52 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [90/112] Time 0.138 (0.400) Loss 1.8135 (1.3504) Acc@1 55.134 (69.223) Acc@5 85.045 (89.453) Mem 1953MB [2024-07-27 10:18:53 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [100/112] Time 0.138 (0.374) Loss 1.7207 (1.3857) Acc@1 65.402 (68.487) Acc@5 82.143 (88.886) Mem 
1953MB [2024-07-27 10:18:55 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [110/112] Time 0.138 (0.353) Loss 0.5435 (1.3779) Acc@1 87.500 (68.627) Acc@5 97.098 (89.058) Mem 1953MB [2024-07-27 10:18:56 vssm1_high_tiny_gradacc2] (main.py 353): INFO Acc@1 68.626 Acc@5 89.066 [2024-07-27 10:18:56 vssm1_high_tiny_gradacc2] (main.py 183): INFO Accuracy of the network on the 50000 test images: 68.6% [2024-07-27 10:19:18 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [0/112] Time 21.982 (21.982) Loss 1.0635 (1.0635) Acc@1 75.000 (75.000) Acc@5 94.643 (94.643) Mem 1953MB [2024-07-27 10:19:20 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [10/112] Time 0.137 (2.134) Loss 0.6890 (1.1091) Acc@1 89.062 (76.157) Acc@5 95.982 (93.101) Mem 1953MB [2024-07-27 10:19:21 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [20/112] Time 0.138 (1.183) Loss 1.5088 (1.1491) Acc@1 59.152 (74.915) Acc@5 91.071 (92.687) Mem 1953MB [2024-07-27 10:19:22 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [30/112] Time 0.138 (0.846) Loss 1.1348 (1.1509) Acc@1 76.562 (74.215) Acc@5 94.420 (93.109) Mem 1953MB [2024-07-27 10:19:24 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [40/112] Time 0.138 (0.674) Loss 1.0391 (1.1234) Acc@1 79.688 (74.842) Acc@5 93.304 (93.472) Mem 1953MB [2024-07-27 10:19:25 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [50/112] Time 0.137 (0.569) Loss 1.6367 (1.1939) Acc@1 64.062 (73.245) Acc@5 85.045 (92.358) Mem 1953MB [2024-07-27 10:19:27 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [60/112] Time 0.138 (0.498) Loss 1.7275 (1.3092) Acc@1 57.143 (70.733) Acc@5 81.027 (90.662) Mem 1953MB [2024-07-27 10:19:28 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [70/112] Time 0.137 (0.453) Loss 1.8652 (1.3570) Acc@1 62.500 (69.913) Acc@5 81.027 (89.845) Mem 1953MB [2024-07-27 10:19:30 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [80/112] Time 0.139 (0.414) Loss 1.5752 (1.4048) Acc@1 65.848 (68.791) Acc@5 85.938 (89.222) Mem 1953MB [2024-07-27 10:19:31 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [90/112] Time 0.138 (0.383) Loss 2.0156 (1.4514) Acc@1 54.464 (67.980) Acc@5 81.250 (88.479) Mem 1953MB [2024-07-27 10:19:33 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [100/112] Time 0.137 (0.359) Loss 1.8301 (1.4913) Acc@1 62.500 (67.130) Acc@5 83.036 (87.904) Mem 1953MB [2024-07-27 10:19:34 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [110/112] Time 0.138 (0.339) Loss 0.6787 (1.4825) Acc@1 86.161 (67.238) Acc@5 96.205 (88.085) Mem 1953MB [2024-07-27 10:19:35 vssm1_high_tiny_gradacc2] (main.py 353): INFO Acc@1 67.242 Acc@5 88.094 [2024-07-27 10:19:35 vssm1_high_tiny_gradacc2] (main.py 186): INFO Accuracy of the network ema on the 50000 test images: 67.2% [2024-07-27 10:19:35 vssm1_high_tiny_gradacc2] (main.py 212): INFO Start training [2024-07-27 10:20:06 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][0/2859] eta 1 day, 0:47:26 lr 0.000872 wd 0.0500 time 31.2161 (31.2161) data time 29.6682 (29.6682) model time 0.0000 (0.0000) loss 2.3442 (2.3442) grad_norm 0.0000 (0.0000) loss_scale 32768.0000 (32768.0000) mem 16145MB [2024-07-27 10:20:11 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][10/2859] eta 2:33:31 lr 0.000872 wd 0.0500 time 0.4179 (3.2332) data time 0.0008 (2.6983) model time 0.0000 (0.0000) loss 1.7202 (2.1515) grad_norm 2.0512 (2.3141) loss_scale 32768.0000 (32768.0000) mem 16145MB [2024-07-27 10:20:15 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][20/2859] eta 1:29:44 lr 0.000872 wd 0.0500 
time 0.4156 (1.8966) data time 0.0010 (1.4139) model time 0.0000 (0.0000) loss 1.9401 (2.0966) grad_norm 2.0685 (2.3332) loss_scale 32768.0000 (32768.0000) mem 16145MB [2024-07-27 10:20:19 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][30/2859] eta 1:07:05 lr 0.000872 wd 0.0500 time 0.4174 (1.4229) data time 0.0006 (0.9582) model time 0.0000 (0.0000) loss 1.5812 (2.1116) grad_norm 2.2284 (2.2566) loss_scale 32768.0000 (32768.0000) mem 16145MB [2024-07-27 10:20:23 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][40/2859] eta 0:55:26 lr 0.000872 wd 0.0500 time 0.4226 (1.1802) data time 0.0009 (0.7247) model time 0.0000 (0.0000) loss 2.0292 (2.0791) grad_norm 2.2113 (2.2803) loss_scale 32768.0000 (32768.0000) mem 16145MB

ydhongHIT commented 1 month ago

[2024-07-27 03:31:23 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2470/2859] eta 0:02:54 lr 0.000872 wd 0.0500 time 0.4278 (0.4490) data time 0.0008 (0.0131) model time 0.4271 (0.4359) loss 2.2814 (1.9959) grad_norm 2.0578 (inf) loss_scale 32768.0000 (34836.7204) mem 16138MB [2024-07-27 03:31:27 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2480/2859] eta 0:02:50 lr 0.000872 wd 0.0500 time 0.4289 (0.4490) data time 0.0006 (0.0131) model time 0.4282 (0.4359) loss 1.9998 (1.9961) grad_norm 2.7279 (inf) loss_scale 32768.0000 (34828.3821) mem 16138MB [2024-07-27 03:31:32 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2490/2859] eta 0:02:45 lr 0.000872 wd 0.0500 time 0.4265 (0.4489) data time 0.0007 (0.0130) model time 0.4258 (0.4359) loss 2.1105 (1.9958) grad_norm 1.6062 (inf) loss_scale 32768.0000 (34820.1108) mem 16138MB [2024-07-27 03:31:36 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2500/2859] eta 0:02:41 lr 0.000872 wd 0.0500 time 0.4280 (0.4489) data time 0.0008 (0.0130) model time 0.4272 (0.4359) loss 2.1443 (1.9960) grad_norm 2.3716 (inf) loss_scale 32768.0000 (34811.9056) mem 16138MB [2024-07-27 03:31:40 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2510/2859] eta 0:02:36 lr 0.000872 wd 0.0500 time 0.4288 (0.4488) data time 0.0008 (0.0129) model time 0.4280 (0.4359) loss 2.5049 (1.9953) grad_norm 1.9457 (inf) loss_scale 32768.0000 (34803.7658) mem 16138MB [2024-07-27 03:31:45 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2520/2859] eta 0:02:32 lr 0.000872 wd 0.0500 time 0.4283 (0.4488) data time 0.0007 (0.0129) model time 0.4276 (0.4359) loss 2.3857 (1.9954) grad_norm 2.2037 (inf) loss_scale 32768.0000 (34795.6906) mem 16138MB [2024-07-27 03:31:49 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2530/2859] eta 0:02:27 lr 0.000872 wd 0.0500 time 0.4266 (0.4487) data time 0.0008 (0.0128) model time 0.4258 (0.4359) loss 1.9954 (1.9962) grad_norm 1.9898 (inf) loss_scale 32768.0000 (34787.6792) mem 16138MB [2024-07-27 03:31:54 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2540/2859] eta 0:02:23 lr 0.000872 wd 0.0500 time 0.4266 (0.4487) data time 0.0006 (0.0128) model time 0.4260 (0.4359) loss 2.2547 (1.9963) grad_norm 2.9234 (inf) loss_scale 32768.0000 (34779.7308) mem 16138MB [2024-07-27 03:31:58 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2550/2859] eta 0:02:18 lr 0.000872 wd 0.0500 time 0.4263 (0.4487) data time 0.0008 (0.0127) model time 0.4255 (0.4359) loss 1.6255 (1.9959) grad_norm 2.0983 (inf) loss_scale 32768.0000 (34771.8448) mem 16138MB [2024-07-27 03:32:02 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2560/2859] eta 0:02:14 lr 0.000872 wd 0.0500 time 0.4264 (0.4486) data time 0.0008 (0.0127) model time 0.4256 (0.4359) loss 1.9472 (1.9962) grad_norm 2.4020 (inf) loss_scale 32768.0000 (34764.0203) mem 16138MB [2024-07-27 03:32:07 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2570/2859] eta 0:02:09 lr 0.000872 wd 0.0500 time 0.4272 (0.4486) data time 0.0007 (0.0126) model time 0.4266 (0.4359) loss 2.0075 (1.9958) grad_norm 3.1504 (inf) loss_scale 32768.0000 (34756.2567) mem 16138MB [2024-07-27 03:32:11 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2580/2859] eta 0:02:05 lr 0.000872 wd 0.0500 time 0.4280 (0.4485) data time 0.0008 (0.0126) model time 0.4272 (0.4359) loss 2.1553 (1.9962) grad_norm 2.5283 (inf) loss_scale 32768.0000 (34748.5533) mem 16138MB [2024-07-27 03:32:15 
vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2590/2859] eta 0:02:00 lr 0.000872 wd 0.0500 time 0.4268 (0.4485) data time 0.0009 (0.0125) model time 0.4259 (0.4359) loss 1.4707 (1.9957) grad_norm 3.0709 (inf) loss_scale 32768.0000 (34740.9093) mem 16138MB [2024-07-27 03:32:20 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2600/2859] eta 0:01:56 lr 0.000872 wd 0.0500 time 0.4269 (0.4484) data time 0.0007 (0.0125) model time 0.4263 (0.4359) loss 2.3621 (1.9957) grad_norm 1.5164 (inf) loss_scale 32768.0000 (34733.3241) mem 16138MB [2024-07-27 03:32:24 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2610/2859] eta 0:01:51 lr 0.000872 wd 0.0500 time 0.4265 (0.4484) data time 0.0007 (0.0125) model time 0.4258 (0.4359) loss 2.0586 (1.9955) grad_norm 4.5267 (inf) loss_scale 32768.0000 (34725.7970) mem 16138MB [2024-07-27 03:32:28 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2620/2859] eta 0:01:47 lr 0.000872 wd 0.0500 time 0.4277 (0.4483) data time 0.0007 (0.0124) model time 0.4270 (0.4359) loss 2.2128 (1.9959) grad_norm 3.8236 (inf) loss_scale 32768.0000 (34718.3274) mem 16138MB [2024-07-27 03:32:33 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2630/2859] eta 0:01:42 lr 0.000872 wd 0.0500 time 0.4266 (0.4483) data time 0.0008 (0.0124) model time 0.4258 (0.4359) loss 2.3189 (1.9959) grad_norm 1.8963 (inf) loss_scale 32768.0000 (34710.9145) mem 16138MB [2024-07-27 03:32:37 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2640/2859] eta 0:01:38 lr 0.000872 wd 0.0500 time 0.4282 (0.4482) data time 0.0009 (0.0123) model time 0.4273 (0.4359) loss 2.3030 (1.9958) grad_norm 3.3634 (inf) loss_scale 32768.0000 (34703.5577) mem 16138MB [2024-07-27 03:32:42 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2650/2859] eta 0:01:33 lr 0.000872 wd 0.0500 time 0.4261 (0.4482) data time 0.0007 (0.0123) model time 0.4254 (0.4359) loss 2.2200 (1.9965) grad_norm 4.9659 (inf) loss_scale 32768.0000 (34696.2565) mem 16138MB [2024-07-27 03:32:46 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2660/2859] eta 0:01:29 lr 0.000872 wd 0.0500 time 0.4283 (0.4482) data time 0.0007 (0.0122) model time 0.4276 (0.4359) loss 2.2473 (1.9971) grad_norm 2.1202 (inf) loss_scale 32768.0000 (34689.0101) mem 16138MB [2024-07-27 03:32:50 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2670/2859] eta 0:01:24 lr 0.000872 wd 0.0500 time 0.4283 (0.4481) data time 0.0008 (0.0122) model time 0.4275 (0.4359) loss 2.0148 (1.9972) grad_norm 4.1605 (inf) loss_scale 32768.0000 (34681.8180) mem 16138MB [2024-07-27 03:32:55 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2680/2859] eta 0:01:20 lr 0.000872 wd 0.0500 time 0.4272 (0.4481) data time 0.0015 (0.0122) model time 0.4256 (0.4359) loss 2.1019 (1.9973) grad_norm 2.5604 (inf) loss_scale 32768.0000 (34674.6796) mem 16138MB [2024-07-27 03:32:59 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2690/2859] eta 0:01:15 lr 0.000872 wd 0.0500 time 0.4282 (0.4480) data time 0.0008 (0.0121) model time 0.4274 (0.4359) loss 2.1770 (1.9975) grad_norm 2.4649 (inf) loss_scale 32768.0000 (34667.5942) mem 16138MB [2024-07-27 03:33:03 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2700/2859] eta 0:01:11 lr 0.000872 wd 0.0500 time 0.4281 (0.4480) data time 0.0008 (0.0121) model time 0.4273 (0.4359) loss 2.2774 (1.9971) grad_norm 1.9458 (inf) loss_scale 32768.0000 (34660.5613) mem 16138MB [2024-07-27 03:33:08 vssm1_high_tiny_gradacc2] (main.py 
295): INFO Train: [29/300][2710/2859] eta 0:01:06 lr 0.000872 wd 0.0500 time 0.4279 (0.4479) data time 0.0007 (0.0120) model time 0.4272 (0.4359) loss 2.1965 (1.9969) grad_norm 1.4778 (inf) loss_scale 32768.0000 (34653.5802) mem 16138MB [2024-07-27 03:33:12 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2720/2859] eta 0:01:02 lr 0.000872 wd 0.0500 time 0.4266 (0.4479) data time 0.0008 (0.0120) model time 0.4258 (0.4359) loss 2.1484 (1.9970) grad_norm 1.9913 (inf) loss_scale 32768.0000 (34646.6505) mem 16138MB [2024-07-27 03:33:16 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2730/2859] eta 0:00:57 lr 0.000872 wd 0.0500 time 0.4249 (0.4479) data time 0.0008 (0.0120) model time 0.4242 (0.4359) loss 2.3283 (1.9969) grad_norm 1.6794 (inf) loss_scale 32768.0000 (34639.7715) mem 16138MB [2024-07-27 03:33:21 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2740/2859] eta 0:00:53 lr 0.000872 wd 0.0500 time 0.4240 (0.4478) data time 0.0008 (0.0119) model time 0.4233 (0.4359) loss 2.2862 (1.9970) grad_norm 1.7257 (inf) loss_scale 32768.0000 (34632.9427) mem 16138MB [2024-07-27 03:33:25 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2750/2859] eta 0:00:48 lr 0.000872 wd 0.0500 time 0.4231 (0.4478) data time 0.0006 (0.0119) model time 0.4225 (0.4359) loss 1.5106 (1.9970) grad_norm 2.8539 (inf) loss_scale 32768.0000 (34626.1636) mem 16138MB [2024-07-27 03:33:30 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2760/2859] eta 0:00:44 lr 0.000872 wd 0.0500 time 0.4239 (0.4477) data time 0.0008 (0.0118) model time 0.4231 (0.4359) loss 1.8417 (1.9966) grad_norm 4.5991 (inf) loss_scale 32768.0000 (34619.4335) mem 16138MB [2024-07-27 03:33:34 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2770/2859] eta 0:00:39 lr 0.000872 wd 0.0500 time 0.4238 (0.4477) data time 0.0007 (0.0118) model time 0.4231 (0.4359) loss 1.5233 (1.9963) grad_norm 4.2230 (inf) loss_scale 32768.0000 (34612.7521) mem 16138MB [2024-07-27 03:33:38 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2780/2859] eta 0:00:35 lr 0.000872 wd 0.0500 time 0.4236 (0.4477) data time 0.0007 (0.0118) model time 0.4229 (0.4359) loss 2.3142 (1.9965) grad_norm 3.2259 (inf) loss_scale 32768.0000 (34606.1187) mem 16138MB [2024-07-27 03:33:43 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2790/2859] eta 0:00:30 lr 0.000872 wd 0.0500 time 0.4271 (0.4476) data time 0.0007 (0.0117) model time 0.4264 (0.4359) loss 2.1459 (1.9968) grad_norm 2.4108 (inf) loss_scale 32768.0000 (34599.5328) mem 16138MB [2024-07-27 03:33:47 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2800/2859] eta 0:00:26 lr 0.000872 wd 0.0500 time 0.4294 (0.4476) data time 0.0007 (0.0117) model time 0.4286 (0.4359) loss 1.7322 (1.9964) grad_norm 1.8060 (inf) loss_scale 32768.0000 (34592.9939) mem 16138MB [2024-07-27 03:33:51 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2810/2859] eta 0:00:21 lr 0.000872 wd 0.0500 time 0.4269 (0.4475) data time 0.0008 (0.0116) model time 0.4261 (0.4359) loss 1.9938 (1.9969) grad_norm 4.6479 (inf) loss_scale 32768.0000 (34586.5016) mem 16138MB [2024-07-27 03:33:56 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2820/2859] eta 0:00:17 lr 0.000872 wd 0.0500 time 0.4282 (0.4475) data time 0.0007 (0.0116) model time 0.4275 (0.4359) loss 1.4698 (1.9970) grad_norm 1.8755 (inf) loss_scale 32768.0000 (34580.0553) mem 16138MB [2024-07-27 03:34:00 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2830/2859] 
eta 0:00:12 lr 0.000872 wd 0.0500 time 0.4284 (0.4475) data time 0.0006 (0.0116) model time 0.4278 (0.4359) loss 1.7624 (1.9969) grad_norm 3.7602 (inf) loss_scale 32768.0000 (34573.6545) mem 16138MB [2024-07-27 03:34:05 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2840/2859] eta 0:00:08 lr 0.000872 wd 0.0500 time 0.4287 (0.4474) data time 0.0007 (0.0115) model time 0.4280 (0.4359) loss 2.1205 (1.9972) grad_norm 3.1047 (inf) loss_scale 32768.0000 (34567.2988) mem 16138MB [2024-07-27 03:34:09 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [29/300][2850/2859] eta 0:00:04 lr 0.000872 wd 0.0500 time 0.4269 (0.4474) data time 0.0004 (0.0115) model time 0.4265 (0.4359) loss 2.1876 (1.9977) grad_norm 1.9521 (inf) loss_scale 32768.0000 (34560.9877) mem 16138MB [2024-07-27 03:34:14 vssm1_high_tiny_gradacc2] (main.py 306): INFO EPOCH 29 training takes 0:21:20 [2024-07-27 03:34:14 vssm1_high_tiny_gradacc2] (utils.py 99): INFO /usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth saving...... [2024-07-27 03:34:14 vssm1_high_tiny_gradacc2] (utils.py 101): INFO /usr/classification/vssm1_high_tiny_gradacc2/20240726160635/ckpt_epoch_29.pth saved !!! [2024-07-27 03:34:37 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [0/112] Time 22.606 (22.606) Loss 1.0176 (1.0176) Acc@1 79.018 (79.018) Acc@5 92.857 (92.857) Mem 16138MB [2024-07-27 03:34:38 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [10/112] Time 0.139 (2.182) Loss 0.5269 (0.9903) Acc@1 90.625 (77.638) Acc@5 95.982 (93.628) Mem 16138MB [2024-07-27 03:34:39 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [20/112] Time 0.139 (1.209) Loss 1.3096 (1.0319) Acc@1 67.634 (76.701) Acc@5 93.304 (93.399) Mem 16138MB [2024-07-27 03:34:41 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [30/112] Time 0.139 (0.864) Loss 1.0557 (1.0591) Acc@1 76.563 (75.072) Acc@5 93.304 (93.548) Mem 16138MB [2024-07-27 03:34:42 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [40/112] Time 0.139 (0.687) Loss 0.8892 (1.0369) Acc@1 80.134 (75.626) Acc@5 94.643 (93.881) Mem 16138MB [2024-07-27 03:34:44 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [50/112] Time 0.139 (0.580) Loss 1.5205 (1.1105) Acc@1 68.080 (74.090) Acc@5 86.607 (92.918) Mem 16138MB [2024-07-27 03:34:45 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [60/112] Time 0.139 (0.507) Loss 1.6621 (1.2098) Acc@1 59.598 (71.981) Acc@5 83.705 (91.522) Mem 16138MB [2024-07-27 03:34:46 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [70/112] Time 0.139 (0.456) Loss 1.8613 (1.2623) Acc@1 59.821 (71.004) Acc@5 81.920 (90.660) Mem 16138MB [2024-07-27 03:34:48 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [80/112] Time 0.139 (0.417) Loss 1.3174 (1.3078) Acc@1 67.857 (69.899) Acc@5 88.170 (90.090) Mem 16138MB [2024-07-27 03:34:49 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [90/112] Time 0.139 (0.386) Loss 1.8135 (1.3504) Acc@1 55.134 (69.223) Acc@5 85.045 (89.453) Mem 16138MB [2024-07-27 03:34:51 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [100/112] Time 0.139 (0.362) Loss 1.7207 (1.3857) Acc@1 65.402 (68.487) Acc@5 82.143 (88.886) Mem 16138MB [2024-07-27 03:34:52 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [110/112] Time 0.139 (0.342) Loss 0.5435 (1.3779) Acc@1 87.500 (68.627) Acc@5 97.098 (89.058) Mem 16138MB [2024-07-27 03:34:53 vssm1_high_tiny_gradacc2] (main.py 353): INFO Acc@1 68.626 Acc@5 89.066 [2024-07-27 03:34:53 vssm1_high_tiny_gradacc2] (main.py 222): INFO Accuracy of the network on the 50000 test images: 68.6% 
[2024-07-27 03:34:53 vssm1_high_tiny_gradacc2] (main.py 224): INFO Max accuracy: 68.63% [2024-07-27 03:35:16 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [0/112] Time 23.057 (23.057) Loss 1.0684 (1.0684) Acc@1 75.223 (75.223) Acc@5 94.643 (94.643) Mem 16138MB [2024-07-27 03:35:18 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [10/112] Time 0.138 (2.222) Loss 0.7012 (1.1181) Acc@1 89.062 (75.832) Acc@5 95.982 (93.060) Mem 16138MB [2024-07-27 03:35:19 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [20/112] Time 0.138 (1.230) Loss 1.5107 (1.1588) Acc@1 59.821 (74.809) Acc@5 91.295 (92.623) Mem 16138MB [2024-07-27 03:35:20 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [30/112] Time 0.139 (0.878) Loss 1.1465 (1.1626) Acc@1 76.562 (74.150) Acc@5 94.866 (93.059) Mem 16138MB [2024-07-27 03:35:22 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [40/112] Time 0.138 (0.697) Loss 1.0566 (1.1352) Acc@1 79.911 (74.848) Acc@5 93.080 (93.429) Mem 16138MB [2024-07-27 03:35:23 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [50/112] Time 0.139 (0.588) Loss 1.6562 (1.2065) Acc@1 63.839 (73.210) Acc@5 84.821 (92.314) Mem 16138MB [2024-07-27 03:35:24 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [60/112] Time 0.138 (0.514) Loss 1.7354 (1.3222) Acc@1 57.366 (70.693) Acc@5 81.920 (90.640) Mem 16138MB [2024-07-27 03:35:26 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [70/112] Time 0.138 (0.464) Loss 1.8867 (1.3710) Acc@1 62.054 (69.841) Acc@5 80.804 (89.798) Mem 16138MB [2024-07-27 03:35:27 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [80/112] Time 0.140 (0.424) Loss 1.5879 (1.4187) Acc@1 65.848 (68.684) Acc@5 86.384 (89.173) Mem 16138MB [2024-07-27 03:35:29 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [90/112] Time 0.139 (0.393) Loss 2.0332 (1.4653) Acc@1 54.464 (67.874) Acc@5 80.804 (88.430) Mem 16138MB [2024-07-27 03:35:30 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [100/112] Time 0.139 (0.368) Loss 1.8418 (1.5056) Acc@1 61.607 (67.006) Acc@5 83.259 (87.860) Mem 16138MB [2024-07-27 03:35:32 vssm1_high_tiny_gradacc2] (main.py 346): INFO Test: [110/112] Time 0.139 (0.347) Loss 0.6914 (1.4968) Acc@1 85.714 (67.117) Acc@5 96.205 (88.049) Mem 16138MB [2024-07-27 03:35:33 vssm1_high_tiny_gradacc2] (main.py 353): INFO Acc@1 67.120 Acc@5 88.058 [2024-07-27 03:35:33 vssm1_high_tiny_gradacc2] (main.py 227): INFO Accuracy of the network on the 50000 test images: 67.1% [2024-07-27 03:35:33 vssm1_high_tiny_gradacc2] (main.py 229): INFO Max accuracy ema: 67.12% [2024-07-27 03:36:04 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][0/2859] eta 1 day, 0:25:17 lr 0.000872 wd 0.0500 time 30.7511 (30.7511) data time 30.2318 (30.2318) model time 0.0000 (0.0000) loss 2.1571 (2.1571) grad_norm 0.0000 (0.0000) loss_scale 32768.0000 (32768.0000) mem 16138MB [2024-07-27 03:36:08 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][10/2859] eta 2:31:10 lr 0.000872 wd 0.0500 time 0.4175 (3.1838) data time 0.0007 (2.7493) model time 0.0000 (0.0000) loss 1.6635 (1.9808) grad_norm 2.3338 (2.0338) loss_scale 32768.0000 (32768.0000) mem 16138MB [2024-07-27 03:36:12 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][20/2859] eta 1:28:53 lr 0.000872 wd 0.0500 time 0.4177 (1.8785) data time 0.0007 (1.4406) model time 0.0000 (0.0000) loss 1.9455 (1.9647) grad_norm 2.2848 (2.1245) loss_scale 32768.0000 (32768.0000) mem 16138MB [2024-07-27 03:36:16 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][30/2859] eta 1:06:30 lr 0.000872 wd 0.0500 
time 0.4183 (1.4104) data time 0.0006 (0.9762) model time 0.0000 (0.0000) loss 1.5447 (1.9503) grad_norm 1.9447 (2.1679) loss_scale 32768.0000 (32768.0000) mem 16138MB [2024-07-27 03:36:21 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][40/2859] eta 0:55:00 lr 0.000872 wd 0.0500 time 0.4194 (1.1708) data time 0.0007 (0.7383) model time 0.0000 (0.0000) loss 1.8946 (1.9332) grad_norm 2.0142 (2.2084) loss_scale 32768.0000 (32768.0000) mem 16138MB [2024-07-27 03:36:25 vssm1_high_tiny_gradacc2] (main.py 295): INFO Train: [30/300][50/2859] eta 0:48:01 lr 0.000872 wd 0.0500 time 0.4157 (1.0258) data time 0.0008 (0.5938) model time 0.0000 (0.0000) loss 1.7480 (1.9751) grad_norm 4.0086 (2.3248) loss_scale 32768.0000 (32768.0000) mem 16138MB

ydhongHIT commented 1 month ago

Hi, the first log is from the resumed run and the second is from the original run; the resume starts from epoch 29. As shown, the test accuracy without EMA is 68.626 in both cases, while the EMA test accuracy differs (67.242 after resume vs. 67.12 in the original run).
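
One way to narrow this down is to check whether the EMA tensors stored in `ckpt_epoch_29.pth` and the EMA weights in memory right after resume are bit-identical; if they are, the gap comes from the evaluation pass rather than from loading. A debugging sketch along those lines (the `"model_ema"` checkpoint key and the `model_ema.module` attribute are assumptions matching a timm-style wrapper, not necessarily this repo's exact names):

```python
# Debugging sketch: compare the EMA weights saved in the checkpoint against the
# EMA model after resume, parameter by parameter (keys/attributes are assumptions).
import torch

def compare_ema(ckpt_path, model_ema):
    saved = torch.load(ckpt_path, map_location="cpu")["model_ema"]
    resumed = model_ema.module.state_dict()
    mismatches = []
    for name, saved_t in saved.items():
        resumed_t = resumed[name].detach().cpu().to(saved_t.dtype)
        if not torch.equal(saved_t, resumed_t):
            max_diff = (saved_t.float() - resumed_t.float()).abs().max().item()
            mismatches.append((name, max_diff))
    if not mismatches:
        print("EMA state identical after resume; the gap must come from evaluation itself.")
    for name, diff in mismatches:
        print(f"{name}: max abs diff {diff:.3e}")

# Hypothetical call with the checkpoint path from the log above:
# compare_ema("/usr/classification/.../ckpt_epoch_29.pth", model_ema)
```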

ydhongHIT commented 1 month ago

I have encountered this problem more than once. The difference is small, but it does exist.
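
If this needs to be tracked down further, a cheap option is to log a fingerprint of the EMA weights right before the checkpoint is written and right after it is loaded, so the two runs can be compared from the logs alone. A sketch, again assuming a timm-style `model_ema.module`:

```python
# Sketch: a cheap fingerprint of the EMA weights, logged before saving and after
# resuming; identical values mean the EMA state itself survives the round trip.
import torch

def ema_fingerprint(model_ema):
    total = torch.zeros((), dtype=torch.float64)
    for tensor in model_ema.module.state_dict().values():
        if torch.is_floating_point(tensor):
            total += tensor.detach().double().abs().sum().cpu()
    return total.item()
```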