[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Hello! I have another question for you. When I freeze some modules so that their parameters are not updated and run the V2 script vit_b_k400_ft.sh, one epoch takes about 1 hour 20 minutes with batch size 4, about 1 hour with batch size 8, and also about 1 hour with batch size 32. Is this normal? In other words, when the batch size grows 4x, the time per step also grows roughly 4x, so the total time per epoch barely changes; yet GPU utilization appears to be maxed out (the GPU-Util / Compute M. column) whether the batch size is 4, 8, or 32. Is it normal that increasing the batch size by some factor does not reduce the training time by the same factor? Currently, batch sizes 4 and 8 can both complete the full ten epochs of training, but batch size 32 fails with RuntimeError: DataLoader worker (pid 34621) is killed by signal: Killed. #15
Closed
DragonWang-cell closed this issue 1 year ago
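For reference, regarding the error at batch size 32 mentioned above: "DataLoader worker ... is killed by signal: Killed" usually means a worker process was terminated by the operating system's OOM killer because host RAM (or shared memory) ran out, which becomes more likely as the per-batch payload grows. The sketch below is a generic, minimal PyTorch illustration of the standard DataLoader knobs that control worker memory pressure; the RandomClips dataset is a hypothetical stand-in and nothing here is taken from this repository's actual data pipeline.

```python
import torch
from torch.utils.data import DataLoader, Dataset

class RandomClips(Dataset):
    """Hypothetical stand-in for a video dataset: returns a random clip tensor."""
    def __len__(self):
        return 256

    def __getitem__(self, idx):
        # 3 channels x 16 frames x 112 x 112, generated on the fly in the worker
        return torch.randn(3, 16, 112, 112)

if __name__ == "__main__":
    # Each worker keeps roughly prefetch_factor batches of decoded clips in host
    # RAM, so batch_size * num_workers * prefetch_factor drives memory pressure;
    # if the total exceeds available RAM, the OS OOM killer terminates a worker,
    # which PyTorch surfaces as "DataLoader worker ... is killed by signal: Killed".
    loader = DataLoader(
        RandomClips(),
        batch_size=32,
        num_workers=2,          # fewer workers -> fewer prefetched batches in RAM
        prefetch_factor=2,      # batches prefetched per worker (needs num_workers > 0)
        pin_memory=True,        # faster host-to-GPU copies, but extra pinned RAM
        persistent_workers=False,
    )
    for clips in loader:
        pass  # the training step would consume `clips` here
```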
With batch size set to 4:
Epoch: [0] [ 520/14651] eta: 1:15:14 lr: 0.000000 min_lr: 0.000000 loss: 5.9574 (5.9875) loss_scale: 1024.0000 (507.0864) weight_decay: 0.0500 (0.0500) grad_norm: 3.9344 (3.8902) time: 0.3269 (0.2472 -- 0.6107) data: 0.0189 (0.0001 -- 0.1967) max mem: 1399
Epoch: [0] [ 540/14651] eta: 1:15:06 lr: 0.000000 min_lr: 0.000000 loss: 6.2149 (5.9955) loss_scale: 2048.0000 (564.0518) weight_decay: 0.0500 (0.0500) grad_norm: 3.9445 (3.8901) time: 0.3168 (0.2267 -- 1.0072) data: 0.0463 (0.0001 -- 0.7580) max mem: 1399
Epoch: [0] [ 560/14651] eta: 1:15:02 lr: 0.000000 min_lr: 0.000000 loss: 6.1452 (6.0008) loss_scale: 2048.0000 (616.9554) weight_decay: 0.0500 (0.0500) grad_norm: 4.0181 (3.8955) time: 0.3250 (0.2347 -- 1.2508) data: 0.0027 (0.0001 -- 0.0454) max mem: 1399
Epoch: [0] [ 580/14651] eta: 1:14:44 lr: 0.000000 min_lr: 0.000000 loss: 5.7761 (5.9944) loss_scale: 2048.0000 (666.2169) weight_decay: 0.0500 (0.0500) grad_norm: 3.9082 (3.8942) time: 0.2955 (0.2435 -- 0.7480) data: 0.0323 (0.0001 -- 0.5108) max mem: 1399
Epoch: [0] [ 600/14651] eta: 1:14:26 lr: 0.000000 min_lr: 0.000000 loss: 5.8527 (5.9894) loss_scale: 2048.0000 (712.1997) weight_decay: 0.0500 (0.0500) grad_norm: 4.0570 (3.8975) time: 0.2939 (0.2322 -- 0.6804) data: 0.0239 (0.0003 -- 0.4686) max mem: 1399
Epoch: [0] [ 620/14651] eta: 1:14:29 lr: 0.000000 min_lr: 0.000000 loss: 6.1052 (5.9927) loss_scale: 2048.0000 (755.2206) weight_decay: 0.0500 (0.0500) grad_norm: 3.7728 (3.8941) time: 0.3377 (0.2268 -- 0.9082) data: 0.0112 (0.0001 -- 0.1255) max mem: 1399
[2023-05-18 23:48:49,839] [INFO] [fused_optimizer.py:370:_update_scale] No Grad overflow for 128 iterations
[2023-05-18 23:48:49,839] [INFO] [fused_optimizer.py:371:_update_scale] Increasing dynamic loss scale from 2048 to 4096
With batch size set to 8:
Epoch: [0] [ 520/7325] eta: 1:07:53 lr: 0.000001 min_lr: 0.000001 loss: 5.5630 (5.9820) loss_scale: 1024.0000 (507.0864) weight_decay: 0.0500 (0.0500) grad_norm: 3.5644 (3.7585) time: 0.6204 (0.4038 -- 2.1102) data: 0.0023 (0.0004 -- 0.0050) max mem: 2446
Epoch: [0] [ 540/7325] eta: 1:07:22 lr: 0.000001 min_lr: 0.000001 loss: 6.0109 (5.9832) loss_scale: 2048.0000 (564.0518) weight_decay: 0.0500 (0.0500) grad_norm: 3.7292 (3.7575) time: 0.5229 (0.3955 -- 1.2074) data: 0.0026 (0.0003 -- 0.0143) max mem: 2446
Epoch: [0] [ 560/7325] eta: 1:06:38 lr: 0.000001 min_lr: 0.000001 loss: 5.7411 (5.9757) loss_scale: 2048.0000 (616.9554) weight_decay: 0.0500 (0.0500) grad_norm: 3.9810 (3.7646) time: 0.4626 (0.3977 -- 0.6022) data: 0.0028 (0.0002 -- 0.0100) max mem: 2446
Epoch: [0] [ 580/7325] eta: 1:06:15 lr: 0.000001 min_lr: 0.000001 loss: 6.0568 (5.9757) loss_scale: 2048.0000 (666.2169) weight_decay: 0.0500 (0.0500) grad_norm: 3.8520 (3.7678) time: 0.5432 (0.3913 -- 1.0046) data: 0.0483 (0.0012 -- 0.5670) max mem: 2446
Epoch: [0] [ 600/7325] eta: 1:05:51 lr: 0.000001 min_lr: 0.000001 loss: 6.0425 (5.9778) loss_scale: 2048.0000 (712.1997) weight_decay: 0.0500 (0.0500) grad_norm: 3.8539 (3.7686) time: 0.5360 (0.3927 -- 0.8390) data: 0.0418 (0.0010 -- 0.4310) max mem: 2446
Epoch: [0] [ 620/7325] eta: 1:05:36 lr: 0.000001 min_lr: 0.000001 loss: 6.0304 (5.9792) loss_scale: 2048.0000 (755.2206) weight_decay: 0.0500 (0.0500) grad_norm: 3.8550 (3.7696) time: 0.5713 (0.4063 -- 1.3603) data: 0.0031 (0.0008 -- 0.0113) max mem: 2446
[2023-05-17 19:13:54,854] [INFO] [fused_optimizer.py:370:_update_scale] No Grad overflow for 128 iterations
[2023-05-17 19:13:54,855] [INFO] [fused_optimizer.py:371:_update_scale] Increasing dynamic loss scale from 2048 to 4096
With batch size set to 32:
Epoch: [0] [ 520/1831] eta: 0:51:08 lr: 0.000014 min_lr: 0.000014 loss: 6.0383 (5.9943) loss_scale: 1024.0000 (507.0864) weight_decay: 0.0500 (0.0500) grad_norm: 3.6213 (3.5492) time: 2.2246 (1.5183 -- 6.5994) data: 0.0011 (0.0003 -- 0.0022) max mem: 8732
Epoch: [0] [ 540/1831] eta: 0:50:09 lr: 0.000015 min_lr: 0.000015 loss: 5.9155 (5.9919) loss_scale: 2048.0000 (564.0518) weight_decay: 0.0500 (0.0500) grad_norm: 3.6299 (3.5474) time: 2.0826 (1.3866 -- 4.9960) data: 0.0013 (0.0004 -- 0.0036) max mem: 8732
Epoch: [0] [ 560/1831] eta: 0:49:23 lr: 0.000015 min_lr: 0.000015 loss: 5.8051 (5.9882) loss_scale: 2048.0000 (616.9554) weight_decay: 0.0500 (0.0500) grad_norm: 3.5953 (3.5460) time: 2.3389 (1.4880 -- 8.4142) data: 0.0011 (0.0006 -- 0.0026) max mem: 8732
Epoch: [0] [ 580/1831] eta: 0:48:25 lr: 0.000016 min_lr: 0.000016 loss: 5.8914 (5.9849) loss_scale: 2048.0000 (666.2169) weight_decay: 0.0500 (0.0500) grad_norm: 3.7235 (3.5488) time: 2.0733 (1.5494 -- 6.0287) data: 0.0010 (0.0005 -- 0.0022) max mem: 8732
Epoch: [0] [ 600/1831] eta: 0:47:43 lr: 0.000016 min_lr: 0.000016 loss: 6.0225 (5.9861) loss_scale: 2048.0000 (712.1997) weight_decay: 0.0500 (0.0500) grad_norm: 3.7547 (3.5498) time: 2.4241 (1.5126 -- 9.1236) data: 0.0012 (0.0007 -- 0.0032) max mem: 8732
Epoch: [0] [ 620/1831] eta: 0:46:56 lr: 0.000017 min_lr: 0.000017 loss: 5.8877 (5.9853) loss_scale: 2048.0000 (755.2206) weight_decay: 0.0500 (0.0500) grad_norm: 3.5791 (3.5478) time: 2.3172 (1.4694 -- 9.0961) data: 0.0011 (0.0002 -- 0.0023) max mem: 8732
[2023-05-18 21:03:26,806] [INFO] [fused_optimizer.py:370:_update_scale] No Grad overflow for 128 iterations
[2023-05-18 21:03:26,806] [INFO] [fused_optimizer.py:370:_update_scale] No Grad overflow for 128 iterations
[2023-05-18 21:03:26,806] [INFO] [fused_optimizer.py:371:_update_scale] Increasing dynamic loss scale from 2048 to 4096
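A quick sanity check on the numbers already posted above (this is just arithmetic on the logged iteration counts and typical per-step times, not a new measurement): at batch size 4 there are 14651 iterations at roughly 0.31 s each, at batch size 8 there are 7325 at roughly 0.55 s, and at batch size 32 there are 1831 at roughly 2.2 s, so the estimated wall time per epoch stays around 65-75 minutes in all three cases.

```python
# Back-of-the-envelope epoch-time estimate from the iteration counts and rough
# per-step times shown in the logs above (step times eyeballed from the "time:"
# column, so the results are approximate).
runs = {
    4:  (14651, 0.31),   # (iterations per epoch, ~seconds per iteration)
    8:  (7325,  0.55),
    32: (1831,  2.20),
}

for batch_size, (iters, sec_per_iter) in runs.items():
    minutes = iters * sec_per_iter / 60
    print(f"batch size {batch_size:>2}: ~{minutes:.0f} min per epoch")

# Prints roughly 76, 67, and 67 minutes: the per-step time grows almost in
# proportion to the batch size, so the total epoch time barely changes, which
# matches the behaviour described in the question.
```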