Haochen-Wang409 / U2PL

[CVPR'22] Semi-Supervised Semantic Segmentation Using Unreliable Pseudo-Labels
Apache License 2.0

ZeroDivisionError: integer division or modulo by zero #131

Closed Mantee0810 closed 1 year ago

Mantee0810 commented 1 year ago

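Hello, and thank you for your work. I am in the U2PL environment and have cd'ed into the correct directory, but running the code fails with: ZeroDivisionError: integer division or modulo by zero. What could be causing this? My full error output is below. I look forward to your reply.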

(u2pl) yemengting@sdu-MS-7D31:~/Projects/U2PL-main/experiments/pascal/1464/ours$ sh train.sh 2 28433
Traceback (most recent call last):
  File "../../../../train_semi.py", line 658, in <module>
    main()
  File "../../../../train_semi.py", line 62, in main
    rank, word_size = setup_distributed(port=args.port)
  File "/home/yemengting/Projects/U2PL-main/u2pl/utils/dist_helper.py", line 39, in setup_distributed
    torch.cuda.set_device(rank % num_gpus)
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "../../../../train_semi.py", line 658, in <module>
    main()
  File "../../../../train_semi.py", line 62, in main
    rank, word_size = setup_distributed(port=args.port)
  File "/home/yemengting/Projects/U2PL-main/u2pl/utils/dist_helper.py", line 39, in setup_distributed
    torch.cuda.set_device(rank % num_gpus)
ZeroDivisionError: integer division or modulo by zero
Traceback (most recent call last):
  File "/home/yemengting/anaconda3/envs/u2pl/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/yemengting/anaconda3/envs/u2pl/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/yemengting/anaconda3/envs/u2pl/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/yemengting/anaconda3/envs/u2pl/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/yemengting/anaconda3/envs/u2pl/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/yemengting/anaconda3/envs/u2pl/bin/python', '-u', '../../../../train_semi.py', '--local_rank=1', '--config=config.yaml', '--seed', '2', '--port', '28433']' returned non-zero exit status 1.


Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.


Killing subprocess 28454
Killing subprocess 28455
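
Editor's note: the traceback ends at `torch.cuda.set_device(rank % num_gpus)`, and a modulo only raises ZeroDivisionError when its right-hand side is 0. So `num_gpus` was 0 in each worker, which most likely means PyTorch could not see any CUDA device. The sketch below shows one way to turn that into a clearer error. The helper name `pick_device_index` is hypothetical and is not part of U2PL; it also assumes `num_gpus` comes from something like `torch.cuda.device_count()`, which the traceback does not show.

```python
def pick_device_index(rank: int, num_gpus: int) -> int:
    """Map a process rank to a local GPU index.

    Hypothetical helper (not from U2PL): it performs the same
    `rank % num_gpus` mapping seen in the traceback, but fails with an
    explicit message when no GPU is visible instead of dividing by zero.
    """
    if num_gpus <= 0:
        raise RuntimeError(
            "No CUDA devices are visible to PyTorch; check the GPU driver, "
            "the CUDA build of PyTorch, and CUDA_VISIBLE_DEVICES."
        )
    return rank % num_gpus


if __name__ == "__main__":
    # With 2 GPUs, ranks 0..3 are assigned to devices 0, 1, 0, 1.
    print([pick_device_index(rank, 2) for rank in range(4)])
```

In a setup function, the result could be passed to `torch.cuda.set_device`, so that a machine with no visible GPU reports the real problem rather than an arithmetic error.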

Haochen-Wang409 commented 1 year ago

This seems to be caused by DDP not being launched correctly. Are you able to run ordinary DDP programs successfully?

Mantee0810 commented 1 year ago

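> This seems to be caused by DDP not being launched correctly. Are you able to run ordinary DDP programs successfully?

I found that my CUDA version was incompatible. It is fixed now. Thank you very much for your reply, and good luck with your studies!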
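
Editor's note: the thread does not say which versions were mismatched or how the fix was made. For readers who hit the same error, a quick check like the one below may help confirm whether PyTorch can see a GPU before launching training. This is a minimal sketch, not part of U2PL; the function name `cuda_report` is hypothetical, and it uses only standard PyTorch attributes.

```python
def cuda_report():
    """Collect the facts that decide whether `rank % num_gpus` can work."""
    try:
        import torch  # imported lazily so the check degrades gracefully
    except ImportError:
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version PyTorch was built against; None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
        # A count of 0 here would make `rank % num_gpus` divide by zero.
        "visible_gpus": torch.cuda.device_count(),
    }


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If `visible_gpus` is 0 or `cuda_available` is False, comparing `built_with_cuda` against the CUDA version supported by the installed driver is a reasonable next step.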