[ECCV 2022] This is the official implementation of BEVFormer, a camera-only framework for autonomous driving perception, e.g., 3D object detection and semantic map segmentation.
When training bevformer-base with batch_size = 2, I hit the following error:
Traceback (most recent call last):
File "./tools/train.py", line 259, in <module>
File "./tools/train.py", line 248, in main
File "/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/bevformer/apis/train.py", line 27, in custom_train_model
File "/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/bevformer/apis/mmdet_train.py", line 199, in custom_train_detector
runner.run(data_loaders, cfg.workflow)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
epoch_runner(data_loaders[i], **kwargs)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/epoch_based_runner.py", line 51, in train
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/base_runner.py", line 307, in call_hook
getattr(hook, fn_name)(self)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/mmcv/runner/hooks/optimizer.py", line 35, in after_train_iter
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/autograd/__init__.py", line 147, in backward
File "/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
RuntimeError: DataLoader worker (pid 450887) is killed by signal: Killed.
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 2532319) of binary: /home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/bin/python
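A note on the message above: `killed by signal: Killed` means a DataLoader worker received SIGKILL. On Linux that is often the kernel's out-of-memory killer reacting to host RAM (not GPU memory) running out, but the log does not confirm this, so treat it as an assumption. If that is the cause, one thing to try is lowering the number of loader workers in the MMDetection-style `data` config. The sketch below uses a hypothetical helper, `with_fewer_workers`, and illustrative values; they are not taken from the actual `bevformer_base.py`.

```python
import copy


def with_fewer_workers(data_cfg, workers_per_gpu=2):
    """Return a copy of an MMDetection-style `data` config with fewer loader workers.

    Hypothetical helper for illustration; the original dict is left unchanged.
    """
    new_cfg = copy.deepcopy(data_cfg)
    new_cfg["workers_per_gpu"] = workers_per_gpu
    return new_cfg


# Illustrative values only (assumed, not read from the real config file).
data = dict(samples_per_gpu=2, workers_per_gpu=4)
data_low_mem = with_fewer_workers(data, workers_per_gpu=2)
print(data_low_mem)
```

Each worker process holds its own copy of the dataset object and prefetched batches, so fewer workers usually means lower host memory use at the cost of a higher `data_time`.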
Training ran normally through the first epoch and produced the evaluation results below.
2022-07-03 17:19:56,577 - mmdet - INFO - Epoch [1][6850/7033] lr: 2.000e-04, eta: 18 days, 2:20:33, time: 9.657, data_time: 2.484, memory: 49347, loss_cls: 0.3669, loss_bbox: 0.6684, d0.loss_cls: 0.3542, d0.loss_bbox: 0.7710, d1.loss_cls: 0.3529, d1.loss_bbox: 0.6866, d2.loss_cls: 0.3549, d2.loss_bbox: 0.6724, d3.loss_cls: 0.3598, d3.loss_bbox: 0.6688, d4.loss_cls: 0.3602, d4.loss_bbox: 0.6663, loss: 6.2823, grad_norm: 50.7038
2022-07-03 17:28:38,627 - mmdet - INFO - Epoch [1][6900/7033] lr: 2.000e-04, eta: 18 days, 2:27:51, time: 10.440, data_time: 2.962, memory: 49347, loss_cls: 0.3638, loss_bbox: 0.6734, d0.loss_cls: 0.3585, d0.loss_bbox: 0.7751, d1.loss_cls: 0.3556, d1.loss_bbox: 0.6912, d2.loss_cls: 0.3550, d2.loss_bbox: 0.6796, d3.loss_cls: 0.3580, d3.loss_bbox: 0.6759, d4.loss_cls: 0.3581, d4.loss_bbox: 0.6746, loss: 6.3187, grad_norm: 45.8548
2022-07-03 17:36:48,350 - mmdet - INFO - Epoch [1][6950/7033] lr: 2.000e-04, eta: 18 days, 2:22:22, time: 9.794, data_time: 2.617, memory: 49347, loss_cls: 0.3452, loss_bbox: 0.6631, d0.loss_cls: 0.3387, d0.loss_bbox: 0.7695, d1.loss_cls: 0.3399, d1.loss_bbox: 0.6722, d2.loss_cls: 0.3386, d2.loss_bbox: 0.6633, d3.loss_cls: 0.3408, d3.loss_bbox: 0.6579, d4.loss_cls: 0.3407, d4.loss_bbox: 0.6581, loss: 6.1281, grad_norm: 48.2113
2022-07-03 17:45:12,259 - mmdet - INFO - Exp name: bevformer_base.py
2022-07-03 17:45:12,261 - mmdet - INFO - Epoch [1][7000/7033] lr: 2.000e-04, eta: 18 days, 2:22:21, time: 10.079, data_time: 2.743, memory: 49347, loss_cls: 0.3570, loss_bbox: 0.6747, d0.loss_cls: 0.3517, d0.loss_bbox: 0.7780, d1.loss_cls: 0.3466, d1.loss_bbox: 0.6869, d2.loss_cls: 0.3441, d2.loss_bbox: 0.6775, d3.loss_cls: 0.3488, d3.loss_bbox: 0.6760, d4.loss_cls: 0.3510, d4.loss_bbox: 0.6730, loss: 6.2653, grad_norm: 54.0465
2022-07-03 17:50:16,340 - mmdet - INFO - Saving checkpoint at 1 epochs
[ ] 0/6019, elapsed: 0s, ETA:
/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/core/bbox/coders/nms_free_coder.py:76: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.post_center_range = torch.tensor(
/home/JJ_Group/wangz/.conda/envs/mmdet3d_v0.17.1/lib/python3.8/site-packages/torch/_tensor.py:575: UserWarning: floor_divide is deprecated, and will be removed in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values.
To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor'). (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:467.)
return torch.floor_divide(self, other)
[ ] 2/6019, 0.1 task/s, elapsed: 16s, ETA: 47074s
/home/JJ_Group/wangz/wangzhe21/BEVFormer_wzh/projects/mmdet3d_plugin/core/bbox/coders/nms_free_coder.py:76: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
self.post_center_range = torch.tensor(
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6020/6019, 3.0 task/s, elapsed: 2031s, ETA: 0s
Formating bboxes of pts_bbox
Start to convert detection format...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6019/6019, 8.9 task/s, elapsed: 680s, ETA: 0s
Results writes to val/./work_dirs/bevformer_base_bs_4/Sat_Jul__2_22_57_29_2022/pts_bbox/results_nusc.json
Evaluating bboxes of pts_bbox
Loading NuScenes tables for version v1.0-trainval...
23 category,
8 attribute,
4 visibility,
64386 instance,
12 sensor,
10200 calibrated_sensor,
2631083 ego_pose,
68 log,
850 scene,
34149 sample,
2631083 sample_data,
1166187 sample_annotation,
4 map,
Done loading in 165.923 seconds.
Reverse indexing ...
Done reverse indexing in 25.8 seconds.
Initializing nuScenes detection evaluation
Loaded results from val/./work_dirs/bevformer_base_bs_4/Sat_Jul__2_22_57_29_2022/pts_bbox/results_nusc.json. Found detections for 6019 samples.
Loading annotations for val split from nuScenes version: v1.0-trainval
[00:17<00:00, 338.76it/s]
Loaded ground truth annotations for 6019 samples.
Filtering predictions
=> Original number of boxes: 1368478
=> After distance based filtering: 1368005
=> After LIDAR and RADAR points based filtering: 1368005
=> After bike rack filtering: 1367610
Filtering ground truth annotations
=> Original number of boxes: 187528
=> After distance based filtering: 134565
=> After LIDAR and RADAR points based filtering: 121871
=> After bike rack filtering: 121861
Accumulating metric data...
Calculating metrics...
Saving metrics to: val/./work_dirs/bevformer_base_bs_4/Sat_Jul__2_22_57_29_2022/pts_bbox
mAP: 0.2559
mATE: 0.8938
mASE: 0.3263
mAOE: 0.6558
mAVE: 1.1642
mAAE: 0.3630
NDS: 0.3041
Eval time: 538.7s
Per-class results:
Object Class          AP     ATE    ASE    AOE    AVE    AAE
car                   0.436  0.671  0.176  0.149  1.922  0.458
truck                 0.214  0.828  0.255  0.226  1.109  0.346
bus                   0.251  0.914  0.286  0.255  2.448  0.644
trailer               0.064  1.255  0.350  0.982  0.645  0.165
construction_vehicle  0.059  1.085  0.510  1.329  0.124  0.332
pedestrian            0.346  0.866  0.328  0.735  0.765  0.398
motorcycle            0.248  0.879  0.344  0.817  1.602  0.379
bicycle               0.213  0.919  0.319  1.120  0.698  0.181
traffic_cone          0.386  0.728  0.380  nan    nan    nan
barrier               0.343  0.793  0.314  0.290  nan    nan
2022-07-03 18:53:57,216 - mmdet - INFO - Exp name: bevformer_base.py
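For readers checking the numbers in the evaluation log: the reported NDS follows from the mAP and the five mean true-positive errors via the nuScenes detection score, NDS = (5 * mAP + sum(1 - min(1, mTP))) / 10. The sketch below recomputes it from the values printed in the log; the function name `nds` is my own and not part of the nuScenes devkit.

```python
def nds(mean_ap, tp_errors):
    """nuScenes detection score from mAP and the five mean TP errors."""
    # Each error is clipped at 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0


# Values copied from the evaluation log above.
tp_errors = {
    "mATE": 0.8938,
    "mASE": 0.3263,
    "mAOE": 0.6558,
    "mAVE": 1.1642,  # above 1, so it contributes a score of 0
    "mAAE": 0.3630,
}
score = nds(0.2559, tp_errors.values())
print(round(score, 4))  # 0.3041, matching the logged NDS
```

Because mAVE is above 1 after the first epoch, the velocity term contributes nothing to the score at this point.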