TencentARC / T2I-Adapter


ZeroDivisionError: integer division or modulo by zero #5

Closed imperator-maximus closed 1 year ago

imperator-maximus commented 1 year ago

Hello,

it is not working for me:

(base) D:\sd\T2I-Adapter>python -m torch.distributed.launch --nproc_per_node=1 test_sketch.py --plms --auto_resume --prompt "A car with flying wings" --path_cond examples/sketch/car.png --ckpt models/sd-v1-4.ckpt --type sketch
NOTE: Redirects are currently not supported in Windows or MacOs.
C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\launch.py:178: FutureWarning: The module torch.distributed.launch is deprecated and will be removed in future. Use torchrun. Note that --use_env is set by default in torchrun. If your script expects --local_rank argument to be set, please change it to read from os.environ['LOCAL_RANK'] instead. See https://pytorch.org/docs/stable/distributed.html#launch-utility for further instructions
  warnings.warn(
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).
[W C:\cb\pytorch_1000000000000\work\torch\csrc\distributed\c10d\socket.cpp:558] [c10d] The client socket has failed to connect to [kubernetes.docker.internal]:29500 (system error: 10049 - The requested address is not valid in its context.).

Traceback (most recent call last):
  File "D:\sd\T2I-Adapter\test_sketch.py", line 238, in <module>
    init_dist(opt.launcher)
  File "D:\sd\T2I-Adapter\dist_util.py", line 15, in init_dist
    _init_dist_pytorch(backend, **kwargs)
  File "D:\sd\T2I-Adapter\dist_util.py", line 25, in _init_dist_pytorch
    torch.cuda.set_device(rank % num_gpus)
ZeroDivisionError: integer division or modulo by zero
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 33136) of binary: C:\ProgramData\Miniconda3\python.exe
Traceback (most recent call last):
  File "C:\ProgramData\Miniconda3\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\ProgramData\Miniconda3\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\launch.py", line 193, in <module>
    main()
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\launch.py", line 189, in main
    launch(args)
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\launch.py", line 174, in launch
    run(args)
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\run.py", line 752, in run
    elastic_launch(
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\launcher\api.py", line 131, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "C:\ProgramData\Miniconda3\lib\site-packages\torch\distributed\launcher\api.py", line 245, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

test_sketch.py FAILED

I do not have a "C:\cb..." path btw.
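If I read the traceback right, the crash comes from `torch.cuda.set_device(rank % num_gpus)` in `dist_util.py`, which raises ZeroDivisionError when `torch.cuda.device_count()` returns 0 (no CUDA device visible to the process). A minimal sketch of the failing modulo and a guard against it (my own illustration with a hypothetical helper, not code from this repo):

```python
def pick_device_index(rank, num_gpus):
    """Hypothetical guard mirroring torch.cuda.set_device(rank % num_gpus).

    The original expression divides by zero when num_gpus == 0, i.e. when
    torch.cuda.device_count() reports no visible GPU.
    """
    if num_gpus == 0:
        return None  # no GPU visible: fall back to CPU, skip set_device
    return rank % num_gpus


# With no GPUs, the guard returns None instead of raising ZeroDivisionError.
print(pick_device_index(0, 0))  # → None
# With two GPUs, rank 3 maps to device 1, as in the original expression.
print(pick_device_index(3, 2))  # → 1
```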

MC-E commented 1 year ago

We have fixed this problem by removing the distributed launch during testing.
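In the same spirit, a script can decide at runtime whether it was started by a distributed launcher and only initialize `torch.distributed` in that case. A sketch of such a check (a hypothetical helper illustrating the idea, not the actual patch; it relies on the `WORLD_SIZE` environment variable that `torchrun`/`torch.distributed.launch` set for child processes):

```python
import os


def should_init_distributed():
    """Return True only when the process was spawned by a distributed
    launcher, which exports WORLD_SIZE for each worker.

    A plain `python test_sketch.py` run leaves WORLD_SIZE unset, so the
    script can skip torch.distributed setup entirely and run single-process.
    """
    world_size = int(os.environ.get("WORLD_SIZE", "1"))
    return world_size > 1
```

With this guard in place, single-GPU or CPU testing never touches `torch.cuda.set_device` or the process-group setup, so the ZeroDivisionError above cannot occur.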

imperator-maximus commented 1 year ago

Thanks, this is fixed now.