Megvii-BaseDetection / OTA

Official implementation of our CVPR 2021 paper "OTA: Optimal Transport Assignment for Object Detection" in PyTorch.
Apache License 2.0

how to train with one GPU? #7

SidneyRey closed this issue 2 years ago

SidneyRey commented 2 years ago

I ran pods_train --num-gpus 1, but I got this error:

Traceback (most recent call last):
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home//.vscode/extensions/ms-python.python-2021.5.926500501/pythonFiles/lib/python/debugpy/__main__.py", line 45, in <module>
    cli.main()
  File "/home//.vscode/extensions/ms-python.python-2021.5.926500501/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 444, in main
    run()
  File "/home//.vscode/extensions/ms-python.python-2021.5.926500501/pythonFiles/lib/python/debugpy/../debugpy/server/cli.py", line 285, in run_file
    runpy.run_path(target_as_str, run_name=compat.force_str("__main__"))
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/media//D/05_source_code/01_torch/detection/cvpods/tools/train_net.py", line 124, in <module>
    args=(args,),
  File "/media//D/05_source_code/01_torch/detection/cvpods/cvpods/engine/launch.py", line 56, in launch
    main_func(*args)
  File "/media//D/05_source_code/01_torch/detection/cvpods/tools/train_net.py", line 110, in main
    runner.train()
  File "/media//D/05_source_code/01_torch/detection/cvpods/cvpods/engine/runner.py", line 271, in train
    super().train(self.start_iter, self.start_epoch, self.max_iter)
  File "/media//D/05_source_code/01_torch/detection/cvpods/cvpods/engine/base_runner.py", line 84, in train
    self.run_step()
  File "/media//D/05_source_code/01_torch/detection/cvpods/cvpods/engine/base_runner.py", line 185, in run_step
    loss_dict = self.model(data)
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/media//D/05_source_code/01_torch/detection/cvpods/playground/detection/coco/ota.res50.fpn.coco.800size.1x/fcos.py", line 218, in forward
    gt_ious, box_cls, box_delta, box_iou)
  File "/media//D/05_source_code/01_torch/detection/cvpods/playground/detection/coco/ota.res50.fpn.coco.800size.1x/fcos.py", line 383, in losses
    dist.all_reduce(num_foreground)
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1168, in all_reduce
    default_pg = _get_default_group()
  File "/home//anaconda3/envs/torch1.8/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 347, in _get_default_group
    raise RuntimeError("Default process group has not been initialized, "
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.

How can I train with one GPU? Thank you.

SidneyRey commented 2 years ago

Training on one GPU doesn't need norm_sync.
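
For anyone hitting the same RuntimeError: the traceback ends at dist.all_reduce(num_foreground) in fcos.py, and that call needs an initialized process group, which a single-GPU run here apparently does not have. Below is a minimal sketch of one possible guard, not the repository's own fix. The helper name all_reduce_if_initialized is hypothetical and is not part of cvpods or OTA; it only uses torch.distributed.is_available, is_initialized, and all_reduce. Any normalization the loss applies after the reduction is not shown.

```python
import torch
import torch.distributed as dist


def all_reduce_if_initialized(tensor: torch.Tensor) -> torch.Tensor:
    """Sum `tensor` across processes only when a process group exists.

    Hypothetical helper (not from cvpods/OTA). In a single-process run
    without init_process_group, the tensor is returned unchanged instead
    of raising "Default process group has not been initialized".
    """
    if dist.is_available() and dist.is_initialized():
        # In-place sum across all ranks (the default reduce op is SUM).
        dist.all_reduce(tensor)
    return tensor


if __name__ == "__main__":
    # Single-process usage: no process group, so the value passes through.
    num_foreground = torch.tensor(5.0)
    num_foreground = all_reduce_if_initialized(num_foreground)
    print(num_foreground.item())
```

Under this assumption, the line dist.all_reduce(num_foreground) in the losses method would be replaced by a call to the helper, so multi-GPU runs still synchronize while single-GPU runs skip the collective.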