[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
When I was training BEVFormer-base with batch_size=2, I got the following error:
Traceback (most recent call last):
File "./tools/train.py", line 259, in <module>
main()
File "./tools/train.py", line 248, in main
custom_train_model(
File "/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
custom_train_detector(
File "/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 199, in custom_train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
self.call_hook('after_train_iter')
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
getattr(hook, fn_name)(self)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/hooks/optimizer.py", line 35, in after_train_iter
runner.outputs['loss'].backward()
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
Variable._execution_engine.run_backward(
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
_error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 450887) is killed by signal: Killed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2532319) of binary: /home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/bin/python
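The last line of the traceback means a DataLoader worker process received SIGKILL ("Killed" is how Linux describes that signal). On Linux a common sender is the kernel OOM killer when host RAM (not GPU memory) runs out. That is an assumption here, since the log alone does not show who sent the signal. One way to test it is to log host memory alongside training. Below is a minimal sketch, assuming Linux's `/proc/meminfo`. `parse_meminfo` and `available_fraction` are hypothetical helper names, not part of mmcv or PyTorch.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; values are in kB where a unit is shown."""
    info = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def available_fraction(info):
    """Share of total RAM that the kernel reports as available for new allocations."""
    return info["MemAvailable"] / info["MemTotal"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Linux only: read the live counters once and print the available share.
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    if "MemAvailable" in info:
        print(f"MemAvailable: {available_fraction(info):.1%} of MemTotal")
```

Call this periodically, for example from a custom hook or a separate shell loop. If the available share trends toward zero before the crash, that supports the OOM hypothesis. A kernel log (`dmesg`) entry that names the worker pid as killed would typically confirm it.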
Training ran normally through the first epoch, and the evaluation at the end of that epoch gave the following results:
2022-07-03 17:19:56,577 - mmdet - INFO - Epoch [1][6850/7033] lr: 2.000e-04, eta: 18 days, 2:20:33, time: 9.657, data_time: 2.484, memory: 49347, loss_cls: 0.3669, loss_bbox: 0.6684, d0.loss_cls: 0.3542, d0.loss_bbox: 0.7710, d1.loss_cls: 0.3529, d1.loss_bbox: 0.6866, d2.loss_cls: 0.3549, d2.loss_bbox: 0.6724, d3.loss_cls: 0.3598, d3.loss_bbox: 0.6688, d4.loss_cls: 0.3602, d4.loss_bbox: 0.6663, loss: 6.2823, grad_norm: 50.7038
2022-07-03 17:28:38,627 - mmdet - INFO - Epoch [1][6900/7033] lr: 2.000e-04, eta: 18 days, 2:27:51, time: 10.440, data_time: 2.962, memory: 49347, loss_cls: 0.3638, loss_bbox: 0.6734, d0.loss_cls: 0.3585, d0.loss_bbox: 0.7751, d1.loss_cls: 0.3556, d1.loss_bbox: 0.6912, d2.loss_cls: 0.3550, d2.loss_bbox: 0.6796, d3.loss_cls: 0.3580, d3.loss_bbox: 0.6759, d4.loss_cls: 0.3581, d4.loss_bbox: 0.6746, loss: 6.3187, grad_norm: 45.8548
2022-07-03 17:36:48,350 - mmdet - INFO - Epoch [1][6950/7033] lr: 2.000e-04, eta: 18 days, 2:22:22, time: 9.794, data_time: 2.617, memory: 49347, loss_cls: 0.3452, loss_bbox: 0.6631, d0.loss_cls: 0.3387, d0.loss_bbox: 0.7695, d1.loss_cls: 0.3399, d1.loss_bbox: 0.6722, d2.loss_cls: 0.3386, d2.loss_bbox: 0.6633, d3.loss_cls: 0.3408, d3.loss_bbox: 0.6579, d4.loss_cls: 0.3407, d4.loss_bbox: 0.6581, loss: 6.1281, grad_norm: 48.2113
2022-07-03 17:45:12,259 - mmdet - INFO - Exp name: bevformer_base.py
2022-07-03 17:45:12,261 - mmdet - INFO - Epoch [1][7000/7033] lr: 2.000e-04, eta: 18 days, 2:22:21, time: 10.079, data_time: 2.743, memory: 49347, loss_cls: 0.3570, loss_bbox: 0.6747, d0.loss_cls: 0.3517, d0.loss_bbox: 0.7780, d1.loss_cls: 0.3466, d1.loss_bbox: 0.6869, d2.loss_cls: 0.3441, d2.loss_bbox: 0.6775, d3.loss_cls: 0.3488, d3.loss_bbox: 0.6760, d4.loss_cls: 0.3510, d4.loss_bbox: 0.6730, loss: 6.2653, grad_norm: 54.0465
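The `time` and `data_time` fields in these lines suggest that waiting for data takes roughly a quarter of each ~10 s iteration (for example 2.743 s of 10.079 s at iteration 7000). This assumes, as I understand mmcv's iteration timer, that `data_time` is counted inside `time`. Below is a minimal sketch for extracting those fields from an mmdet log line, assuming the `key: value` layout shown above. `parse_log_metrics` and `data_wait_share` are hypothetical helpers, not mmcv APIs.

```python
import re

# One "key: number" pair, e.g. "time: 10.079", "d0.loss_cls: 0.3517", "lr: 2.000e-04".
_PAIR = re.compile(r"([\w.]+): ([-+]?\d+(?:\.\d+)?(?:e[-+]?\d+)?)")


def parse_log_metrics(line):
    """Return the numeric key/value pairs of one mmdet training log line."""
    metrics = {key: float(value) for key, value in _PAIR.findall(line)}
    # "eta: 18 days, 2:22:21" is a duration string; the regex only captures "18", so drop it.
    metrics.pop("eta", None)
    return metrics


def data_wait_share(line):
    """Fraction of the logged iteration time that was spent in data_time."""
    metrics = parse_log_metrics(line)
    return metrics["data_time"] / metrics["time"]
```

For the iteration-7000 line, `data_wait_share` gives about 0.27. A share that grows over time would point at the data pipeline, the same place the crash came from.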
2022-07-03 17:50:16,340 - mmdet - INFO - Saving checkpoint at 1 epochs
[ ] 0/6019, elapsed: 0s, ETA:
/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/core/bbox/coders/nms_free_coder.py:76: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.post_center_range = torch.tensor(
/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
[ ] 2/6019, 0.1 task/s, elapsed: 16s, ETA: 47074s
/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/core/bbox/coders/nms_free_coder.py:76: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.post_center_range = torch.tensor(
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6020/6019, 3.0 task/s, elapsed: 2031s, ETA: 0s
Formating bboxes of pts_bbox
Start to convert detection format...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6019/6019, 8.9 task/s, elapsed: 680s, ETA: 0s
Results writes to val/./work_dirs/bevformer_base_bs_4/Sat_Jul__2_22_57_29_2022/pts_bbox/results_nusc.json
Evaluating bboxes of pts_bbox
======
Loading NuScenes tables for version v1.0-trainval...
23 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
Done loading in 165.923 seconds.
======
Reverse indexing ...
Done reverse indexing in 25.8 seconds.
======
Initializing nuScenes detection evaluation
Loaded results from val/./work_dirs/bevformer_base_bs_4/Sat_Jul__2_22_57_29_2022/pts_bbox/results_nusc.json. Found detections for 6019 samples.
Loading annotations for val split from nuScenes version: v1.0-trainval
[00:17<00:00, 338.76it/s]
Loaded ground truth annotations for 6019 samples.
Filtering predictions
=> Original number of boxes: 1368478
=> After distance based filtering: 1368005
=> After LIDAR and RADAR points based filtering: 1368005
=> After bike rack filtering: 1367610
Filtering ground truth annotations
=> Original number of boxes: 187528
=> After distance based filtering: 134565
=> After LIDAR and RADAR points based filtering: 121871
=> After bike rack filtering: 121861
Accumulating metric data...
Calculating metrics...
Saving metrics to: val/./work_dirs/bevformer_base_bs_4/Sat_Jul__2_22_57_29_2022/pts_bbox
mAP: 0.2559
mATE: 0.8938
mASE: 0.3263
mAOE: 0.6558
mAVE: 1.1642
mAAE: 0.3630
NDS: 0.3041
Eval time: 538.7s
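The NDS line can be reproduced from the six numbers above it using the nuScenes definition, NDS = (5 · mAP + Σ(1 − min(1, mTP))) / 10, where the sum runs over the five mean TP errors. Below is a minimal sketch. `nuscenes_nds` is a hypothetical helper, not the nuscenes-devkit API.

```python
def nuscenes_nds(m_ap, tp_errors):
    """nuScenes Detection Score: (5 * mAP + sum(1 - min(1, mTP))) / 10."""
    # Each mean TP error is clipped at 1, so an error of 1 or more contributes 0.
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Summary values copied from the evaluation output above.
reported_tp = {"mATE": 0.8938, "mASE": 0.3263, "mAOE": 0.6558, "mAVE": 1.1642, "mAAE": 0.3630}
print(f"NDS: {nuscenes_nds(0.2559, reported_tp):.4f}")  # NDS: 0.3041
```

Because mAVE is above 1, the velocity term is clipped to zero and adds nothing to NDS after this epoch. mATE (0.8938) contributes only about 0.01.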
Per-class results:
Object Class           AP     ATE    ASE    AOE    AVE    AAE
car                    0.436  0.671  0.176  0.149  1.922  0.458
truck                  0.214  0.828  0.255  0.226  1.109  0.346
bus                    0.251  0.914  0.286  0.255  2.448  0.644
trailer                0.064  1.255  0.350  0.982  0.645  0.165
construction_vehicle   0.059  1.085  0.510  1.329  0.124  0.332
pedestrian             0.346  0.866  0.328  0.735  0.765  0.398
motorcycle             0.248  0.879  0.344  0.817  1.602  0.379
bicycle                0.213  0.919  0.319  1.120  0.698  0.181
traffic_cone           0.386  0.728  0.380  nan    nan    nan
barrier                0.343  0.793  0.314  0.290  nan    nan
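The summary metrics (mAP, mATE, ...) are class averages of this table. The `nan` entries mark metrics that nuScenes does not define for a class: orientation, velocity and attribute for traffic cones, and velocity and attribute for barriers. Those entries are skipped rather than counted as zero. Below is a minimal sketch of that NaN-ignoring average. `nan_mean` and `class_means` are hypothetical names.

```python
import math

NAN = float("nan")
COLUMNS = ("AP", "ATE", "ASE", "AOE", "AVE", "AAE")

# Per-class values copied from the table above.
PER_CLASS = {
    "car": (0.436, 0.671, 0.176, 0.149, 1.922, 0.458),
    "truck": (0.214, 0.828, 0.255, 0.226, 1.109, 0.346),
    "bus": (0.251, 0.914, 0.286, 0.255, 2.448, 0.644),
    "trailer": (0.064, 1.255, 0.350, 0.982, 0.645, 0.165),
    "construction_vehicle": (0.059, 1.085, 0.510, 1.329, 0.124, 0.332),
    "pedestrian": (0.346, 0.866, 0.328, 0.735, 0.765, 0.398),
    "motorcycle": (0.248, 0.879, 0.344, 0.817, 1.602, 0.379),
    "bicycle": (0.213, 0.919, 0.319, 1.120, 0.698, 0.181),
    "traffic_cone": (0.386, 0.728, 0.380, NAN, NAN, NAN),
    "barrier": (0.343, 0.793, 0.314, 0.290, NAN, NAN),
}


def nan_mean(values):
    """Mean over the entries that are not NaN."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept)


def class_means(per_class):
    """Average each column over classes, skipping undefined (NaN) entries."""
    return {
        col: nan_mean([row[i] for row in per_class.values()])
        for i, col in enumerate(COLUMNS)
    }


for name, value in class_means(PER_CLASS).items():
    print(f"m{name}: {value:.4f}")
```

The recomputed values match the printed summary to within about 1e-4, which is the rounding of the three-decimal per-class table. For example, mAOE is the mean of 9 classes and mAVE the mean of 8.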
2022-07-03 18:53:57,216 - mmdet - INFO - Exp name: bevformer_base.py