XuyangBai / TransFusion

[PyTorch] Official implementation of CVPR2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers". https://arxiv.org/abs/2203.11496
Apache License 2.0
613 stars 76 forks

Training on the nuScenes dataset raises "Exception: Error: Invalid box type: None" #93

Open yu8ri1 opened 1 year ago

yu8ri1 commented 1 year ago

environment

sys.platform: linux
Python: 3.6.9 (default, Jan 26 2021, 15:33:00) [GCC 8.4.0]
CUDA available: True
GPU 0,1,2,3: NVIDIA A100-PCIE-80GB
CUDA_HOME: /usr/local/cuda
NVCC: Build cuda_11.1.TC455_06.29190527_0
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.8.1+cu111
PyTorch compiling details: PyTorch built with:

TorchVision: 0.9.1+cu111
OpenCV: 4.5.2
MMCV: 1.3.10
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.11.0
MMDetection3D: 0.11.0+

The following error occurred:

2023-04-17 03:23:03,050 - mmdet - INFO - workflow: [('train', 2)], max: 20 epochs
2023-04-17 03:23:04.516097: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-04-17 03:23:04.516235: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/nvidia/lib:/usr/local/nvidia/lib64
2023-04-17 03:23:04.516248: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
[W reducer.cpp:1050] Warning: find_unused_parameters=True was specified in DDP constructor, but did not find any unused parameters. This flag results in an extra traversal of the autograd graph every iteration, which can adversely affect performance. If your model indeed never has any unused parameters, consider turning this flag off. Note that this warning may be a false positive your model has flow control causing later iterations to have unused parameters. (function operator())
2023-04-17 03:25:20,186 - mmdet - INFO - Epoch [1][50/85] lr: 1.115e-04, eta: 1:14:05, time: 2.694, data_time: 0.135, memory: 12569, loss_heatmap: 215.9703, layer_-1_loss_cls: 4.6548, layer_-1_loss_bbox: 13.0959, matched_ious: 0.0027, loss: 233.7210, grad_norm: 1336.1663
2023-04-17 03:26:50,708 - mmdet - INFO - Saving checkpoint at 1 epochs
[ ] 0/81, elapsed: 0s, ETA:
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
[>> ] 4/81, 1.8 task/s, elapsed: 2s, ETA: 42s
/root/work/TransFusion/mmdet3d/core/bbox/coders/transfusion_bbox_coder.py:99: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range, device=heatmap.device)
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 84/81, 11.9 task/s, elapsed: 7s, ETA: 0s

Formating bboxes of pts_bbox
Start to convert detection format...
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 81/81, 34.6 task/s, elapsed: 2s, ETA: 0s
Results writes to /tmp/tmps7pn7cvi/results/pts_bbox/results_nusc.json
Evaluating bboxes of pts_bbox
aaaaaaaaaaaaaaaa mini_val /tmp/tmps7pn7cvi/results/pts_bbox
Traceback (most recent call last):
  File "tools/train.py", line 253, in <module>
    main()
  File "tools/train.py", line 249, in main
    meta=meta)
  File "/usr/local/lib/python3.6/dist-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/epoch_based_runner.py", line 54, in train
    self.call_hook('after_train_epoch')
  File "/usr/local/lib/python3.6/dist-packages/mmcv/runner/base_runner.py", line 307, in call_hook
    getattr(hook, fn_name)(self)
  File "/usr/local/lib/python3.6/dist-packages/mmdet/core/evaluation/eval_hooks.py", line 279, in after_train_epoch
    key_score = self.evaluate(runner, results)
  File "/usr/local/lib/python3.6/dist-packages/mmdet/core/evaluation/eval_hooks.py", line 177, in evaluate
    results, logger=runner.logger, **self.eval_kwargs)
  File "/root/work/TransFusion/mmdet3d/datasets/nuscenes_dataset.py", line 489, in evaluate
    ret_dict = self._evaluate_single(result_files[name])
  File "/root/work/TransFusion/mmdet3d/datasets/nuscenes_dataset.py", line 400, in _evaluate_single
    verbose=False)
  File "/usr/local/lib/python3.6/dist-packages/nuscenes/eval/detection/evaluate.py", line 94, in __init__
    self.pred_boxes = filter_eval_boxes(nusc, self.pred_boxes, self.cfg.class_range, verbose=verbose)
  File "/usr/local/lib/python3.6/dist-packages/nuscenes/eval/common/loaders.py", line 219, in filter_eval_boxes
    class_field = _get_box_class_field(eval_boxes)
  File "/usr/local/lib/python3.6/dist-packages/nuscenes/eval/common/loaders.py", line 283, in _get_box_class_field
    raise Exception('Error: Invalid box type: %s' % box)
Exception: Error: Invalid box type: None
Killing subprocess 88
Killing subprocess 89
Killing subprocess 90
Killing subprocess 91
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/usr/local/lib/python3.6/dist-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python', '-u', 'tools/train.py', '--local_rank=3', 'configs/transfusion_nusc_voxel_L.py', '--launcher', 'pytorch']' returned non-zero exit status 1.

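One possible reading of the traceback, offered as an assumption rather than a confirmed diagnosis: the exception is raised inside `_get_box_class_field` with `box` equal to `None`, which would be consistent with no sample in the results file containing any predicted box (plausible after a single epoch on the mini split, when scores are still very low). A minimal sketch for checking this is below. It assumes the results file follows the nuScenes submission layout, with a top-level `results` mapping from sample token to a list of boxes; the helper names `summarize_predictions` and `load_and_summarize` and the tokens in the example are hypothetical.

```python
import json
from typing import Dict, List, Tuple


def summarize_predictions(submission: Dict) -> Tuple[int, int]:
    """Return (total samples, samples that have at least one predicted box)."""
    # Assumed layout: {"meta": {...}, "results": {sample_token: [box, ...]}}
    results: Dict[str, List[dict]] = submission.get("results", {})
    non_empty = sum(1 for boxes in results.values() if len(boxes) > 0)
    return len(results), non_empty


def load_and_summarize(path: str) -> Tuple[int, int]:
    """Load a results JSON file from disk and summarize it."""
    with open(path) as f:
        return summarize_predictions(json.load(f))


# Tiny in-memory example with hypothetical sample tokens and no boxes at all.
example = {"meta": {"use_lidar": True}, "results": {"token_a": [], "token_b": []}}
total, non_empty = summarize_predictions(example)
print(f"{non_empty}/{total} samples have predictions")
```

To check a real run, `load_and_summarize` could be pointed at the `results_nusc.json` path printed in the log, before the temporary directory is cleaned up. If the count of non-empty samples is zero, evaluating after more training, or only at later epochs, may avoid the exception.
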
Karim-Akmal commented 9 months ago

@yu8ri1 where are the steps for training on nuScenes?