JeffWang987 / OpenOccupancy

[ICCV 2023] OpenOccupancy: A Large Scale Benchmark for Surrounding Semantic Occupancy Perception
Apache License 2.0

Issue: RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #54

Open YuxiChen-GL opened 1 month ago

YuxiChen-GL commented 1 month ago

Traceback (most recent call last):
  File "tools/train.py", line 227, in <module>
    main()
  File "tools/train.py", line 224, in main
    meta=meta)
  File "/root/autodl-tmp/OpenOccupancy/projects/occ_plugin/occupancy/apis/train.py", line 34, in custom_train_model
    meta=meta)
  File "/root/autodl-tmp/OpenOccupancy/projects/occ_plugin/occupancy/apis/mmdet_train.py", line 147, in custom_train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 75, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 98, in new_func
    return old_func(*args, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmdet3d-0.17.1-py3.7-linux-x86_64.egg/mmdet3d/models/detectors/base.py", line 59, in forward
    return self.forward_train(**kwargs)
  File "/root/autodl-tmp/OpenOccupancy/projects/occ_plugin/occupancy/detectors/occnet.py", line 203, in forward_train
    points, img=img_inputs, img_metas=img_metas)
  File "/root/autodl-tmp/OpenOccupancy/projects/occ_plugin/occupancy/detectors/occnet.py", line 113, in extract_feat
    img_voxel_feats, depth, img_feats = self.extract_img_feat(img, img_metas)
  File "/root/autodl-tmp/OpenOccupancy/projects/occ_plugin/occupancy/detectors/occnet.py", line 68, in extract_img_feat
    img_enc_feats = self.image_encoder(img[0])
  File "/root/autodl-tmp/OpenOccupancy/projects/occ_plugin/occupancy/detectors/occnet.py", line 44, in image_encoder
    backbone_feats = self.img_backbone(imgs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmdet/models/backbones/resnet.py", line 642, in forward
    x = res_layer(x)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
    input = module(input)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmdet/models/backbones/resnet.py", line 297, in forward
    out = _inner_forward(x)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/mmdet/models/backbones/resnet.py", line 268, in _inner_forward
    out = self.norm1(out)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/nn/modules/batchnorm.py", line 732, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 845, in get_world_size
    return _get_group_size(group)
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 306, in _get_group_size
    default_pg = _get_default_group()
  File "/root/miniconda3/envs/OpenOccupancy/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 411, in _get_default_group
    "Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

Does anyone know how to solve this? I tried training with one GPU and with two GPUs, and both runs fail with the same error. I then tried replacing the SyncBatchNorm layers in the model with BatchNorm, but that produces a dimension-related error.
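For context, the traceback passes through mmcv/parallel/data_parallel.py (the non-distributed wrapper) and ends in the SyncBatchNorm forward pass, which calls torch.distributed.get_world_size and therefore needs an initialized process group. One possible workaround, sketched below, is to rewrite the norm type in the model config before the model is built. This is only a sketch under assumptions: the helper `replace_norm_type` and the config keys in `model_cfg` are hypothetical and not taken from the repository. In mmcv's norm registry, `'BN'` maps to a 2D batch norm, so layers that process 3D voxel features would likely need `'BN3d'` instead, which may be the source of the dimension error mentioned above.

```python
import copy


def replace_norm_type(cfg, old="SyncBN", new="BN"):
    """Return a deep copy of cfg where every dict with type == old uses new.

    Hypothetical helper for illustration; it is not part of mmcv or OpenOccupancy.
    """
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        # Recurse through nested dicts and lists/tuples of an mmcv-style config.
        if isinstance(node, dict):
            if node.get("type") == old:
                node["type"] = new
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Hypothetical config fragment; the real key names may differ.
model_cfg = dict(
    img_backbone=dict(
        type="ResNet",
        norm_cfg=dict(type="SyncBN", requires_grad=True),
    ),
    occupancy_encoder=dict(
        norm_cfg=dict(type="SyncBN", requires_grad=True),
    ),
)

# 2D image branch: plain 2D batch norm.
single_gpu_cfg = replace_norm_type(model_cfg, new="BN")
# 3D voxel branch (assumed to use 3D convolutions): 3D batch norm.
single_gpu_cfg["occupancy_encoder"] = replace_norm_type(
    model_cfg["occupancy_encoder"], new="BN3d"
)

print(single_gpu_cfg["img_backbone"]["norm_cfg"]["type"])       # BN
print(single_gpu_cfg["occupancy_encoder"]["norm_cfg"]["type"])  # BN3d
```

The other route is to keep SyncBN and start training through a distributed launcher so that the process group exists even with a single GPU; whether the repository's launch script supports that is not stated in this issue.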