VDIGPKU / CBNetV2

[TIP 2022] CBNetV2: A Composite Backbone Network Architecture for Object Detection
Apache License 2.0
371 stars 67 forks

Single GPU training #37

Open Ali-Abolfathi opened 3 years ago

Ali-Abolfathi commented 3 years ago

Hi, thanks for sharing your model. Is it possible to train this model on a custom dataset with a single GPU? Whenever I try to do that, I get this error (I'm using the tools/train.py script):

Traceback (most recent call last):
  File "CBNetV2/tools/train.py", line 188, in <module>
    main()
  File "CBNetV2/tools/train.py", line 184, in main
    meta=meta)
  File "/content/CBNetV2/mmdet/apis/train.py", line 185, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 127, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 50, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/epoch_based_runner.py", line 30, in run_iter
    **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/content/CBNetV2/mmdet/models/detectors/base.py", line 237, in train_step
    losses = self(**data)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/runner/fp16_utils.py", line 128, in new_func
    output = old_func(*new_args, **new_kwargs)
  File "/content/CBNetV2/mmdet/models/detectors/base.py", line 171, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/content/CBNetV2/mmdet/models/detectors/two_stage.py", line 266, in forward_train
    **kwargs)
  File "/content/CBNetV2/mmdet/models/roi_heads/cascade_roi_head.py", line 248, in forward_train
    rcnn_train_cfg)
  File "/content/CBNetV2/mmdet/models/roi_heads/cascade_roi_head.py", line 146, in _bbox_forward_train
    bbox_results = self._bbox_forward(stage, x, rois)
  File "/content/CBNetV2/mmdet/models/roi_heads/cascade_roi_head.py", line 136, in _bbox_forward
    cls_score, bbox_pred = bbox_head(bbox_feats)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/CBNetV2/mmdet/models/roi_heads/bbox_heads/convfc_bbox_head.py", line 155, in forward
    x = conv(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/mmcv/cnn/bricks/conv_module.py", line 201, in forward
    x = self.norm(x)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/batchnorm.py", line 731, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 748, in get_world_size
    return _get_group_size(group)
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 274, in _get_group_size
    default_pg = _get_default_group()
  File "/usr/local/lib/python3.7/dist-packages/torch/distributed/distributed_c10d.py", line 358, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
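
For anyone reading the traceback: the last frames are in torch/nn/modules/batchnorm.py calling torch.distributed.get_world_size, which is the forward pass of a synchronized batch norm layer. That layer needs an initialized process group, and a plain non-distributed run of tools/train.py does not create one. One possible workaround, not confirmed in this thread, is to swap `SyncBN` for plain `BN` in the config before the model is built. Below is a minimal sketch, assuming the config is a nested dict/list structure in the usual mmdet style with entries like `norm_cfg=dict(type='SyncBN', ...)`. The helper name `replace_syncbn` and the example fragment are hypothetical, not taken from the repository.

```python
import copy


def replace_syncbn(cfg):
    """Return a copy of a nested dict/list config with SyncBN swapped for BN.

    Hypothetical helper: it only rewrites dicts whose "type" is "SyncBN".
    """
    cfg = copy.deepcopy(cfg)

    def _walk(node):
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


# Illustrative fragment only, not copied from a CBNetV2 config file.
example = dict(
    roi_head=dict(
        bbox_head=[
            dict(
                type="ConvFCBBoxHead",
                norm_cfg=dict(type="SyncBN", requires_grad=True),
            ),
        ]
    )
)

patched = replace_syncbn(example)
print(patched["roi_head"]["bbox_head"][0]["norm_cfg"])
```

The original dict is left untouched, so the distributed config can still be used as-is on multi-GPU machines. Note that batch statistics with plain BN are computed per GPU, so results with small batch sizes may differ from the SyncBN setup.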

saidineshpola commented 2 years ago

I removed automatic mixed precision by changing the runner to EpochBasedRunner, and then it worked fine for me.
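
A rough sketch of what that change could look like as a config override. The thread does not show the actual config, so the AMP runner name `EpochBasedRunnerAmp` and the `use_fp16` key in `optimizer_config` are assumptions about how the config enables mixed precision, and the helper `disable_amp` is hypothetical. Check the names in your own config file before relying on it.

```python
import copy


def disable_amp(cfg):
    """Return a copy of a dict-style config with mixed precision turned off.

    Assumed names (not confirmed in this thread): an AMP runner called
    "EpochBasedRunnerAmp" and a "use_fp16" flag in optimizer_config.
    """
    cfg = copy.deepcopy(cfg)

    runner = cfg.get("runner", {})
    if runner.get("type") == "EpochBasedRunnerAmp":  # assumed AMP runner name
        runner["type"] = "EpochBasedRunner"  # standard mmcv epoch-based runner

    optimizer_config = cfg.get("optimizer_config", {})
    if "use_fp16" in optimizer_config:  # assumed flag name
        optimizer_config["use_fp16"] = False

    return cfg


# Illustrative values only, not copied from a CBNetV2 config file.
example_cfg = dict(
    runner=dict(type="EpochBasedRunnerAmp", max_epochs=12),
    optimizer_config=dict(grad_clip=None, use_fp16=True),
)

single_gpu_cfg = disable_amp(example_cfg)
print(single_gpu_cfg["runner"], single_gpu_cfg["optimizer_config"])
```

Configs that have neither key pass through unchanged, so the helper is safe to apply to a config that already uses the standard runner.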