fudan-zvg / DeepInteraction

[NeurIPS 2022] DeepInteraction: 3D Object Detection via Modality Interaction
MIT License
KeyError: 'cam_intrinsic' #14

Closed fanxlin closed 1 year ago

fanxlin commented 1 year ago

Thanks for your great work. I'm a newbie and want to run this test code on my single 2080 Ti GPU for learning. When I ran it, I hit the error below and couldn't proceed, so I'm hoping for your help. Thanks in advance. My environment, the error message, and my nuScenes data directory follow. I used this same nuScenes directory when learning BEVFusion (https://github.com/mit-han-lab/bevfusion), where the test ran successfully.

I set samples_per_gpu=1 and workers_per_gpu=1 everywhere. The KeyError appears when I run either of the following two commands:

 tools/dist_train.sh projects/configs/nuscenes/Fusion_0075_refactor.py 1
 tools/dist_test.sh projects/configs/nuscenes/Fusion_0075_refactor.py ./pretrained/Fusion_0075_refactor_.pth 1 --eval=bbox

My environment in docker:

TorchVision: 0.10.1+cu111
OpenCV: 4.6.0
MMCV: 1.3.18
MMCV Compiler: GCC 7.3
MMCV CUDA Compiler: 11.1
MMDetection: 2.14.0
MMSegmentation: 0.14.1
MMDetection3D: 0.17.1+3b3147b

My error message:

2022-12-21 02:56:19,919 - mmdet - INFO - workflow: [('train', 1)], max: 6 epochs
2022-12-21 02:56:19,919 - mmdet - INFO - Checkpoints will be saved to /home/fanxl/arepo/DeepInteraction/work_dirs/Fusion_0075_refactor by HardDiskBackend.
2022-12-21 02:56:21.563015: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer.so.6'; dlerror: libnvinfer.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-12-21 02:56:21.563107: W tensorflow/stream_executor/platform/default/dso_loader.cc:55] Could not load dynamic library 'libnvinfer_plugin.so.6'; dlerror: libnvinfer_plugin.so.6: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /opt/conda/lib/python3.7/site-packages/cv2/../../lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2022-12-21 02:56:21.563119: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:30] Cannot dlopen some TensorRT libraries. If you would like to use Nvidia GPU with TensorRT, please make sure the missing libraries mentioned above are installed properly.
Traceback (most recent call last):
  File "tools/train.py", line 248, in <module>
    main()
  File "tools/train.py", line 244, in main
    meta=meta)
  File "/home/fanxl/arepo/DeepInteraction/mmdetection3d/mmdet3d/apis/train.py", line 35, in train_model
    meta=meta)
  File "/opt/conda/lib/python3.7/site-packages/mmdet/apis/train.py", line 170, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 47, in train
    for i, data_batch in enumerate(self.data_loader):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
KeyError: Caught KeyError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/fanxl/arepo/DeepInteraction/mmdetection3d/mmdet3d/datasets/dataset_wrappers.py", line 68, in __getitem__
    return self.dataset[ori_idx]
  File "/home/fanxl/arepo/DeepInteraction/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 357, in __getitem__
    data = self.prepare_train_data(idx)
  File "/home/fanxl/arepo/DeepInteraction/mmdetection3d/mmdet3d/datasets/custom_3d.py", line 150, in prepare_train_data
    input_dict = self.get_data_info(index)
  File "/home/fanxl/arepo/DeepInteraction/mmdetection3d/mmdet3d/datasets/nuscenes_dataset.py", line 232, in get_data_info
    intrinsic = cam_info['cam_intrinsic']
KeyError: 'cam_intrinsic'

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1355) of binary: /opt/conda/bin/python
Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/conda/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 193, in <module>
    main()
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 189, in main
    launch(args)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launch.py", line 174, in launch
    run(args)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/run.py", line 692, in run
    )(*cmd_args)
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 116, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/lib/python3.7/site-packages/torch/distributed/launcher/api.py", line 246, in launch_agent
    failures=result.failures,
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
***************************************
         tools/train.py FAILED
=======================================
Root Cause:
[0]:
  time: 2022-12-21_02:56:29
  rank: 0 (local_rank: 0)
  exitcode: 1 (pid: 1355)
  error_file: <N/A>
  msg: "Process failed with exitcode 1"
=======================================
Other Failures:
  <NO_OTHER_FAILURES>
***************************************

This is the data directory of nuscenes:

`-- data
    `-- nuScenes
        |-- maps
        |-- nuScenes_bak -> /dataset/nuScenes_bak/
        |-- nuScenes_test -> /dataset/nuScenes_test
        |-- nuscenes_database -> nuscenes_gt_database
        |-- nuscenes_dbinfos_train.pkl
        |-- nuscenes_gt_database
        |-- nuscenes_infos_test.pkl
        |-- nuscenes_infos_train.pkl
        |-- nuscenes_infos_val.pkl
        |-- samples
        |-- sweeps
        |-- v1.0-test
        `-- v1.0-trainval

Alexander0Yang commented 1 year ago

In the official mmdetection3d, the intrinsic parameters of each camera are recorded under the key cam_intrinsic when the info files are generated. In BEVFusion, this item is renamed to camera_intrinsics, so info files generated there do not contain the key that nuscenes_dataset.py reads. You can fix it by running the data preparation again following the official mmdetection3d.
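
Before regenerating, it may help to confirm which key an existing info file actually uses. The following is only a rough sketch, not code from this repository. It assumes the info file is a pickle holding either a list of per-sample dicts or a dict with an "infos" list, and that each sample has a "cams" dict mapping camera names to per-camera dicts. The helper names `count_intrinsic_keys` and `load_infos` are made up for illustration.

```python
import pickle
from collections import Counter

# Key read by mmdet3d's nuscenes_dataset.py (see the traceback above).
EXPECTED_KEY = "cam_intrinsic"
# Key name reported for BEVFusion-generated info files.
BEVFUSION_KEY = "camera_intrinsics"


def count_intrinsic_keys(infos):
    """Count, over all camera entries, which intrinsics key is present.

    `infos` is assumed to be a list of per-sample dicts, each with a
    "cams" dict (camera name -> per-camera dict). Hypothetical helper.
    """
    counts = Counter()
    for info in infos:
        for cam_info in info.get("cams", {}).values():
            if EXPECTED_KEY in cam_info:
                counts[EXPECTED_KEY] += 1
            elif BEVFUSION_KEY in cam_info:
                counts[BEVFUSION_KEY] += 1
            else:
                counts["<neither>"] += 1
    return counts


def load_infos(pkl_path):
    """Load the per-sample list from an info file (layout is an assumption)."""
    with open(pkl_path, "rb") as f:
        data = pickle.load(f)
    # Accept both a dict wrapping an "infos" list and a bare list.
    return data["infos"] if isinstance(data, dict) else data
```

For example, `count_intrinsic_keys(load_infos("data/nuScenes/nuscenes_infos_train.pkl"))` would show whether the entries carry cam_intrinsic or camera_intrinsics. If only the latter shows up, the file most likely came from the BEVFusion data preparation and should be regenerated.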

fanxlin commented 1 year ago

Thank you very much for your help. I managed to run the test on my poor 2080 Ti. Thanks again.

[ ] 8/6019, 0.5 task/s, elapsed: 18s, ETA: 13341s