Segmentation fault (core dumped)

YiHuang108 commented 2 months ago

Hi team, I can run run_evaluation_debug.sh successfully, but when I launch run_evaluation_multi.sh it fails with:

(base) Bench2Drive$ bash leaderboard/scripts/run_evaluation_multi.sh
INDEX: 0
PORT: 30000
TM_PORT: 50000
CHECKPOINT_ENDPOINT: mydata/eval_bench2drive220_0.json
GPU_RANK: 0
bash leaderboard/scripts/run_evaluation.sh 30000 50000 True mydata/bench2drive220_0.xml leaderboard/team_code/vad_b2d_agent.py /Bench2DriveZoo/adzoo/vad/configs/VAD/VAD_base_e2e_b2d.py+/Bench2DriveZoo/ckpts/model_4.pth mydata/eval_bench2drive220_0.json ./eval_bench2drive220/ only_traj 0
leaderboard/leaderboard/leaderboard_evaluator.py:21: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
INDEX: 1
PORT: 30150
TM_PORT: 50150
CHECKPOINT_ENDPOINT: mydata/eval_bench2drive220_1.json
GPU_RANK: 1
bash leaderboard/scripts/run_evaluation.sh 30150 50150 True mydata/bench2drive220_1.xml leaderboard/team_code/vad_b2d_agent.py /Bench2DriveZoo/adzoo/vad/configs/VAD/VAD_base_e2e_b2d.py+/Bench2DriveZoo/ckpts/model_4.pth mydata/eval_bench2drive220_1.json ./eval_bench2drive220/ only_traj 1
leaderboard/leaderboard/leaderboard_evaluator.py:21: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
leaderboard/leaderboard/leaderboard_evaluator.py:115: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
  if LooseVersion(dist.version) < LooseVersion('0.9.10'):
/Bench2DriveZoo/mmcv/models/modules/custom_base_transformer_layer.py:77: UserWarning: The arguments `feedforward_channels` in BaseTransformerLayer has been deprecated, now you should set `feedforward_channels` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/Bench2DriveZoo/mmcv/models/modules/custom_base_transformer_layer.py:77: UserWarning: The arguments `ffn_dropout` in BaseTransformerLayer has been deprecated, now you should set `ffn_drop` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/Bench2DriveZoo/mmcv/models/modules/custom_base_transformer_layer.py:77: UserWarning: The arguments `ffn_num_fcs` in BaseTransformerLayer has been deprecated, now you should set `num_fcs` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/Bench2DriveZoo/mmcv/models/bricks/transformer.py:353: UserWarning: The arguments `feedforward_channels` in BaseTransformerLayer has been deprecated, now you should set `feedforward_channels` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/Bench2DriveZoo/mmcv/models/bricks/transformer.py:353: UserWarning: The arguments `ffn_dropout` in BaseTransformerLayer has been deprecated, now you should set `ffn_drop` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/Bench2DriveZoo/mmcv/models/bricks/transformer.py:353: UserWarning: The arguments `ffn_num_fcs` in BaseTransformerLayer has been deprecated, now you should set `num_fcs` and other FFN related arguments to a dict named `ffn_cfgs`.
  warnings.warn(
/Bench2DriveZoo/mmcv/models/bricks/transformer.py:96: UserWarning: The arguments `dropout` in MultiheadAttention has been deprecated, now you can separately set `attn_drop`(float), proj_drop(float), and `dropout_layer`(dict)
  warnings.warn('The arguments `dropout` in MultiheadAttention '
/Bench2DriveZoo/mmcv/models/detectors/mvx_two_stage.py:90: UserWarning: DeprecationWarning: pretrained is a deprecated                     key, please consider using init_cfg
  warnings.warn('DeprecationWarning: pretrained is a deprecated \
/home/miniconda3/envs/base/lib/python3.8/site-packages/scipy/optimize/_minpack_py.py:178: RuntimeWarning: The iteration is not making good progress, as measured by the
  improvement from the last ten iterations.
  warnings.warn(msg, RuntimeWarning)
/Bench2DriveZoo/mmcv/models/modules/VAD_transformer.py:248: UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow. Please consider converting the list to a single numpy.ndarray with numpy.array() before converting to a tensor. (Triggered internally at  ../torch/csrc/utils/tensor_new.cpp:201.)
  shift = bev_queries.new_tensor(
/home/miniconda3/envs/base/lib/python3.8/site-packages/torch/functional.py:445: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at  ../aten/src/ATen/native/TensorShape.cpp:2157.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
/Bench2DriveZoo/mmcv/core/bbox/coder/fut_nms_free_coder.py:57: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  bbox_index = indexs // self.num_classes
/Bench2DriveZoo/mmcv/core/bbox/coder/map_nms_free_coder.py:60: UserWarning: __floordiv__ is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
  bbox_index = indexs // self.num_classes
/Bench2DriveZoo/mmcv/core/bbox/coder/fut_nms_free_coder.py:78: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range = torch.tensor(
/Bench2DriveZoo/mmcv/core/bbox/coder/map_nms_free_coder.py:82: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  self.post_center_range = torch.tensor(
LowLevelFatalError [File:Unknown] [Line: 1214]
GameThread timed out waiting for RenderThread after 60.00 secs
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
Malloc Size=131160 LargeMemoryPoolOffset=196744
Malloc Size=131160 LargeMemoryPoolOffset=327928
Segmentation fault (core dumped)

Looking forward to your reply!

jayyoung0802 commented 2 months ago

This is caused by running out of host memory or GPU memory. Please reduce the number of tasks per GPU.
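
One way to sanity-check this before launching is to look at how much GPU memory is free on each device. The sketch below is only an illustration: it assumes `nvidia-smi` is on the PATH, the helper names are hypothetical, and the per-task footprint passed to `tasks_that_fit` is a placeholder you would need to measure yourself (it is not a Bench2Drive figure). It covers GPU memory only, not host RAM.

```python
import subprocess

# nvidia-smi query that prints one "index, free_MiB" row per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"]


def parse_free_memory(csv_text):
    """Parse 'index, free_MiB' rows into {gpu_index: free_MiB}."""
    free = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, mib = (field.strip() for field in line.split(","))
        free[int(index)] = int(mib)
    return free


def query_free_memory():
    """Run nvidia-smi and return {gpu_index: free_MiB} (requires an NVIDIA driver)."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_free_memory(result.stdout)


def tasks_that_fit(free_mib, mem_per_task_mib):
    """Upper bound on concurrent tasks per GPU for an assumed per-task footprint."""
    return {gpu: mib // mem_per_task_mib for gpu, mib in free_mib.items()}
```

Calling `tasks_that_fit(query_free_memory(), mem_per_task_mib=...)` with your own measured footprint for one CARLA server plus one agent gives a rough ceiling for the number of tasks to assign to each GPU.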

YiHuang108 commented 2 months ago

Thank you very much for your help.

YiHuang108 commented 2 months ago

Sorry to reopen the issue, but I can only run the CARLA closed-loop test on GPU 0.

Every attempt to run on a different GPU has failed, and from what I found when searching, CARLA 0.9.5 cannot be told which GPU to run on, so I'm quite confused. The output is below:

(base) Bench2Drive$ bash leaderboard/scripts/run_evaluation_debug.sh
leaderboard/leaderboard/leaderboard_evaluator.py:21: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
/home/carla/CarlaUE4.sh -RenderOffScreen -nosound -carla-rpc-port=30003 -graphicsadapter=1 None
4.26.2-0+++UE4+Release-4.26 522 0
Disabling core dumps.
LowLevelFatalError [File:Unknown] [Line: 1214]
GameThread timed out waiting for RenderThread after 60.00 secs
Signal 11 caught.
Malloc Size=65538 LargeMemoryPoolOffset=65554
CommonUnixCrashHandler: Signal=11
Malloc Size=131160 LargeMemoryPoolOffset=196744
Malloc Size=131160 LargeMemoryPoolOffset=327928
Engine crash handling finished; re-raising signal 11 for the default handler. Good bye.
Segmentation fault (core dumped)

I hope to get your answer!

jiaxiaosong1002 commented 2 months ago

@YiHuang108 It should be CARLA 0.9.15

YiHuang108 commented 2 months ago

Yes, I am using CARLA 0.9.15 as the guideline specifies, but it fails to run on any GPU except GPU 0.
