Closed · Kongbaikb closed this issue 11 months ago
Do you have a training log you could send me? I need the one for v299. My training does not converge. Email: liwenx@whu.edu.cn
Show me your log.
Can anyone provide a log like this one that can be analyzed, i.e. config file + log?

```
09/24 23:53:37 - mmengine - INFO - Epoch(train) [1][16650/30895] lr: 2.5000e-04 eta: 6 days, 22:56:27 time: 1.0044 data_time: 0.0229 memory: 11919 grad_norm: 43.1900 loss: 17.8979 loss_cls_dn: 0.0279 loss_bbox_dn: 0.7917 d0.loss_cls_dn: 0.0333 d0.loss_bbox_dn: 0.7555 d1.loss_cls_dn: 0.0274 d1.loss_bbox_dn: 0.7555 d2.loss_cls_dn: 0.0262 d2.loss_bbox_dn: 0.7612 d3.loss_cls_dn: 0.0263 d3.loss_bbox_dn: 0.7695 d4.loss_cls_dn: 0.0269 d4.loss_bbox_dn: 0.7798 loss_cls: 0.5362 loss_bbox: 1.6351 d0.loss_cls: 0.5407 d0.loss_bbox: 1.7038 d1.loss_cls: 0.5342 d1.loss_bbox: 1.6533 d2.loss_cls: 0.5336 d2.loss_bbox: 1.6430 d3.loss_cls: 0.5342 d3.loss_bbox: 1.6354 d4.loss_cls: 0.5353 d4.loss_bbox: 1.6319
09/24 23:54:27 - mmengine - INFO - Epoch(train) [1][16700/30895] lr: 2.5000e-04 eta: 6 days, 22:55:34 time: 0.9978 data_time: 0.0217 memory: 11739 grad_norm: 46.9522 loss: 18.0262 loss_cls_dn: 0.0267 loss_bbox_dn: 0.8171 d0.loss_cls_dn: 0.0317 d0.loss_bbox_dn: 0.7704 d1.loss_cls_dn: 0.0263 d1.loss_bbox_dn: 0.7716 d2.loss_cls_dn: 0.0254 d2.loss_bbox_dn: 0.7789 d3.loss_cls_dn: 0.0254 d3.loss_bbox_dn: 0.7892 d4.loss_cls_dn: 0.0259 d4.loss_bbox_dn: 0.8021 loss_cls: 0.5478 loss_bbox: 1.6278 d0.loss_cls: 0.5502 d0.loss_bbox: 1.6916 d1.loss_cls: 0.5452 d1.loss_bbox: 1.6450 d2.loss_cls: 0.5446 d2.loss_bbox: 1.6335 d3.loss_cls: 0.5464 d3.loss_bbox: 1.6287 d4.loss_cls: 0.5470 d4.loss_bbox: 1.6276
09/24 23:55:17 - mmengine - INFO - Epoch(train) [1][16750/30895] lr: 2.5000e-04 eta: 6 days, 22:54:40 time: 0.9972 data_time: 0.0233 memory: 11693 grad_norm: 60.7276 loss: 17.9187 loss_cls_dn: 0.0319 loss_bbox_dn: 0.8148 d0.loss_cls_dn: 0.0348 d0.loss_bbox_dn: 0.7541 d1.loss_cls_dn: 0.0302 d1.loss_bbox_dn: 0.7574 d2.loss_cls_dn: 0.0295 d2.loss_bbox_dn: 0.7670 d3.loss_cls_dn: 0.0299 d3.loss_bbox_dn: 0.7801 d4.loss_cls_dn: 0.0307 d4.loss_bbox_dn: 0.7963 loss_cls: 0.5488 loss_bbox: 1.6161 d0.loss_cls: 0.5491 d0.loss_bbox: 1.6706 d1.loss_cls: 0.5449 d1.loss_bbox: 1.6291 d2.loss_cls: 0.5456 d2.loss_bbox: 1.6236 d3.loss_cls: 0.5463 d3.loss_bbox: 1.6204 d4.loss_cls: 0.5475 d4.loss_bbox: 1.6200
09/24 23:56:07 - mmengine - INFO - Epoch(train) [1][16800/30895] lr: 2.5000e-04 eta: 6 days, 22:53:57 time: 1.0037 data_time: 0.0228 memory: 11673 grad_norm: 42.1954 loss: 17.8680 loss_cls_dn: 0.0289 loss_bbox_dn: 0.7903 d0.loss_cls_dn: 0.0322 d0.loss_bbox_dn: 0.7568 d1.loss_cls_dn: 0.0269 d1.loss_bbox_dn: 0.7560 d2.loss_cls_dn: 0.0265 d2.loss_bbox_dn: 0.7612 d3.loss_cls_dn: 0.0268 d3.loss_bbox_dn: 0.7690 d4.loss_cls_dn: 0.0276 d4.loss_bbox_dn: 0.7789 loss_cls: 0.5411 loss_bbox: 1.6281 d0.loss_cls: 0.5435 d0.loss_bbox: 1.6863 d1.loss_cls: 0.5399 d1.loss_bbox: 1.6420 d2.loss_cls: 0.5388 d2.loss_bbox: 1.6340 d3.loss_cls: 0.5395 d3.loss_bbox: 1.6272 d4.loss_cls: 0.5403 d4.loss_bbox: 1.6262
09/24 23:56:57 - mmengine - INFO - Epoch(train) [1][16850/30895] lr: 2.5000e-04 eta: 6 days, 22:53:10 time: 1.0011 data_time: 0.0229 memory: 11898 grad_norm: 51.8652 loss: 17.9138 loss_cls_dn: 0.0297 loss_bbox_dn: 0.8142 d0.loss_cls_dn: 0.0321 d0.loss_bbox_dn: 0.7471 d1.loss_cls_dn: 0.0271 d1.loss_bbox_dn: 0.7517 d2.loss_cls_dn: 0.0267 d2.loss_bbox_dn: 0.7617 d3.loss_cls_dn: 0.0273 d3.loss_bbox_dn: 0.7764 d4.loss_cls_dn: 0.0284 d4.loss_bbox_dn: 0.7941 loss_cls: 0.5487 loss_bbox: 1.6199 d0.loss_cls: 0.5547 d0.loss_bbox: 1.6886 d1.loss_cls: 0.5472 d1.loss_bbox: 1.6374 d2.loss_cls: 0.5468 d2.loss_bbox: 1.6242 d3.loss_cls: 0.5473 d3.loss_bbox: 1.6180 d4.loss_cls: 0.5482 d4.loss_bbox: 1.6162
09/24 23:57:47 - mmengine - INFO - Epoch(train) [1][16900/30895] lr: 2.5000e-04 eta: 6 days, 22:52:08 time: 0.9930 data_time: 0.0233 memory: 11974 grad_norm: 44.4593 loss: 17.9305 loss_cls_dn: 0.0325 loss_bbox_dn: 0.8126 d0.loss_cls_dn: 0.0337 d0.loss_bbox_dn: 0.7596 d1.loss_cls_dn: 0.0285 d1.loss_bbox_dn: 0.7629 d2.loss_cls_dn: 0.0284 d2.loss_bbox_dn: 0.7705 d3.loss_cls_dn: 0.0293 d3.loss_bbox_dn: 0.7820 d4.loss_cls_dn: 0.0306 d4.loss_bbox_dn: 0.7962 loss_cls: 0.5400 loss_bbox: 1.6286 d0.loss_cls: 0.5462 d0.loss_bbox: 1.6762 d1.loss_cls: 0.5381 d1.loss_bbox: 1.6338 d2.loss_cls: 0.5387 d2.loss_bbox: 1.6296 d3.loss_cls: 0.5385 d3.loss_bbox: 1.6270 d4.loss_cls: 0.5392 d4.loss_bbox: 1.6278
```
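To check convergence from a log like the one above, one option is to extract the total loss and gradient norm per iteration and plot them over time. Below is a minimal sketch of such a parser; the regex assumes exactly the mmengine `Epoch(train)` line format shown above, and `parse_train_log` is a hypothetical helper, not part of the repo.

```python
import re

# Matches mmengine train-log records like:
# "Epoch(train) [1][16650/30895] ... grad_norm: 43.1900 loss: 17.8979 ..."
LOG_RE = re.compile(
    r"Epoch\(train\) \[(?P<epoch>\d+)\]\[(?P<step>\d+)/\d+\].*?"
    r"grad_norm: (?P<grad_norm>[\d.]+) loss: (?P<loss>[\d.]+)"
)

def parse_train_log(text):
    """Extract (epoch, step, grad_norm, loss) tuples from mmengine log text."""
    return [
        (int(m["epoch"]), int(m["step"]), float(m["grad_norm"]), float(m["loss"]))
        for m in LOG_RE.finditer(text)
    ]

sample = ("09/24 23:53:37 - mmengine - INFO - Epoch(train) [1][16650/30895] "
          "lr: 2.5000e-04 time: 1.0044 grad_norm: 43.1900 loss: 17.8979")
print(parse_train_log(sample))  # [(1, 16650, 43.19, 17.8979)]
```

The resulting tuples can be fed into any plotting tool to see whether the loss trends downward or oscillates.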
@Kongbaikb You modified my setting. An epoch should contain 3517 steps, see https://github.com/MCG-NJU/SparseBEV/issues/15#issuecomment-1731012623
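The 3517-step figure follows directly from the dataset size and the total batch size. As a sketch (assuming the nuScenes train split of 28130 keyframe samples and a sampler that rounds the last partial batch up; the 8×1 batch configuration is inferred from this thread, not verified against the config):

```python
import math

def steps_per_epoch(num_samples, num_gpus, samples_per_gpu):
    """Iterations per epoch when the last partial batch is rounded up."""
    total_batch = num_gpus * samples_per_gpu
    return math.ceil(num_samples / total_batch)

# 28130 samples / (8 GPUs x 1 sample per GPU) -> the 3517 steps cited above.
print(steps_per_epoch(28130, 8, 1))  # 3517
```

A per-epoch step count other than 3517 (such as the 30895 in the log above) therefore indicates a different effective batch size or dataset than the reference setting.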
I increased the batch size; will that have a significant impact?
Of course. DETRs are sensitive to batch size and learning rate.
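When the batch size changes, a common heuristic is the linear scaling rule: scale the learning rate in proportion to the total batch size. This is a rule of thumb (Goyal et al.'s large-minibatch recipe), not something the authors prescribe for this repo, and DETR-style models may still need per-case tuning. A minimal sketch:

```python
def scale_lr(base_lr, base_batch_size, new_batch_size):
    """Linear scaling rule: LR grows proportionally with total batch size."""
    return base_lr * new_batch_size / base_batch_size

# Reference setting from this thread: lr 2.5e-4 at total batch size 8.
# Doubling the batch size to 16 would suggest:
print(scale_lr(2.5e-4, 8, 16))  # 0.0005
```

Even with scaled learning rates, reverting to the reference batch size is the safest way to reproduce the published numbers.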
I used 8 V100 cards; does the number of cards affect it?
We use 8 GPUs too.
Thank you. I'll change the batch size back to 8 and see how it goes.
Why is the NDS of the model I trained with the "r50_nuimg_704x256" config only 53.5, while the number you report on GitHub is 55.6? Is this caused by unstable training?